BibTeX to Jekyll Markdown Converter
A Python tool that converts BibTeX entries into Jekyll collection markdown files with duplicate detection and customizable formatting.
Features
- Duplicate Detection: Automatically detects and handles duplicate publications based on DOI, title similarity, and author+year combinations
- Clean Formatting: Removes HTML tags and LaTeX artifacts from abstracts
- Jekyll-Ready: Creates properly formatted markdown files with YAML frontmatter for Jekyll collections
- URL-Friendly Slugs: Generates clean filenames based on author, title keywords, and publication year
- Flexible Output: Supports dry-run mode and custom output directories
Installation
This tool requires Python 3.6+ and uses the bibtexparser library.
# Initialize project (if starting fresh)
uv init bib_converter
cd bib_converter
# Install dependencies
uv add bibtexparser
Usage
Basic Usage
# Convert BibTeX file to markdown files
uv run python bib_to_markdown.py input.bib
# Specify custom output directory
uv run python bib_to_markdown.py input.bib _my_publications
Options
- --dry-run: Preview what would be created without actually creating files
- --force, -f: Overwrite existing files without asking
- --help, -h: Show help message
Examples
# Preview conversion
uv run python bib_to_markdown.py publications.bib --dry-run
# Convert with automatic overwrite
uv run python bib_to_markdown.py publications.bib _publications --force
# Convert to custom directory
uv run python bib_to_markdown.py my_pubs.bib ../my_site/_publications
Output Format
Each publication is converted to a markdown file with:
YAML Frontmatter
- title: Publication title (cleaned of BibTeX formatting)
- authors: Formatted author list
- year: Publication year
- journal: Journal name (if available)
- volume, issue, pages: Citation details (if available)
- doi: DOI identifier (if available)
- url: Additional URLs (if available)
- keywords: Keywords as YAML list (if available)
- layout: Set to "publication" for Jekyll
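As an illustration of how the fields above could be turned into a frontmatter block, here is a minimal sketch using only the standard library. The function names render_frontmatter and yaml_scalar are hypothetical and not taken from bib_to_markdown.py; the sketch quotes every value as a string (including the year), which the actual tool may not do.

```python
import json


def yaml_scalar(value):
    """Quote a value so colons and quotes inside it stay valid YAML.

    JSON strings are valid YAML double-quoted scalars, so json.dumps
    is used here as a simple, dependency-free quoting helper.
    """
    return json.dumps(str(value), ensure_ascii=False)


def render_frontmatter(entry):
    """Build a YAML frontmatter block from a dict of publication fields.

    `entry` is assumed to be a plain dict; optional fields that are
    missing or empty are skipped.
    """
    lines = ["---"]
    for key in ("title", "authors", "year", "journal",
                "volume", "issue", "pages", "doi", "url"):
        value = entry.get(key)
        if value:
            lines.append(f"{key}: {yaml_scalar(value)}")
    keywords = entry.get("keywords")
    if keywords:
        # Keywords are emitted as a YAML block list.
        lines.append("keywords:")
        lines.extend(f"  - {yaml_scalar(k)}" for k in keywords)
    lines.append("layout: publication")
    lines.append("---")
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    print(render_frontmatter({
        "title": "Rivers: a review",
        "year": "2020",
        "keywords": ["climate", "detection"],
    }))
```

Quoting every scalar keeps titles that contain colons from breaking the YAML parser, at the cost of slightly noisier output.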
Content Sections
- Abstract: Cleaned abstract text (if available)
- Citation: Properly formatted citation
- Links: DOI and additional URLs
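The abstract cleaning mentioned above (removing HTML tags and LaTeX artifacts) could look roughly like the following. This is a sketch under assumptions, not the tool's actual implementation: the function name clean_abstract and the specific patterns handled are illustrative choices.

```python
import html
import re


def clean_abstract(text):
    """Strip HTML/XML tags and common LaTeX markup from an abstract."""
    # Remove tags such as <p> or <jats:p> before decoding entities,
    # so a decoded "&lt;" is never mistaken for the start of a tag.
    text = re.sub(r"<[^>]+>", "", text)
    text = html.unescape(text)
    # Unwrap simple LaTeX commands: \emph{word} -> word
    text = re.sub(r"\\[a-zA-Z]+\{([^{}]*)\}", r"\1", text)
    # Drop leftover grouping braces used to protect capitalization.
    text = re.sub(r"[{}]", "", text)
    # Unescape a few common LaTeX-escaped characters.
    text = text.replace("\\%", "%").replace("\\&", "&")
    # Collapse runs of whitespace, including newlines.
    return re.sub(r"\s+", " ", text).strip()


if __name__ == "__main__":
    print(clean_abstract("<p>We study \\emph{atmospheric} rivers &amp; {ARs}.</p>"))
```

Nested LaTeX commands and math mode are not handled here; a real converter would likely need more rules.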
Duplicate Detection
The tool detects duplicates using:
- Exact DOI match: Publications with identical DOI values
- Title similarity: Publications with identical normalized titles
- Author+Year+Title: Same authors, year, and similar titles (>80% word overlap)
When duplicates are found, the first entry in each group is kept, and others are filtered out.
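The three rules above can be sketched as follows. This is a hedged illustration rather than the tool's code: the function names are hypothetical, entries are assumed to be plain dicts with BibTeX field names, and "word overlap" is interpreted here as Jaccard similarity of the title word sets, which may differ from the measure the tool uses.

```python
import re


def normalize_title(title):
    """Lowercase, drop punctuation, and collapse whitespace."""
    return " ".join(re.sub(r"[^a-z0-9\s]", "", title.lower()).split())


def word_overlap(a, b):
    """Jaccard similarity of the word sets of two titles (assumed metric)."""
    wa, wb = set(normalize_title(a).split()), set(normalize_title(b).split())
    if not wa or not wb:
        return 0.0
    return len(wa & wb) / len(wa | wb)


def is_duplicate(a, b, threshold=0.8):
    """Apply the three rules: DOI, normalized title, author+year+title."""
    doi_a = a.get("doi", "").strip().lower()
    doi_b = b.get("doi", "").strip().lower()
    if doi_a and doi_a == doi_b:
        return True
    title_a = normalize_title(a.get("title", ""))
    title_b = normalize_title(b.get("title", ""))
    if title_a and title_a == title_b:
        return True
    author_a = a.get("author", "").strip().lower()
    author_b = b.get("author", "").strip().lower()
    same_authors = bool(author_a) and author_a == author_b
    same_year = bool(a.get("year")) and a.get("year") == b.get("year")
    return same_authors and same_year and word_overlap(title_a, title_b) > threshold


def deduplicate(entries):
    """Keep the first entry of each duplicate group, preserving order."""
    kept = []
    for entry in entries:
        if not any(is_duplicate(entry, k) for k in kept):
            kept.append(entry)
    return kept


if __name__ == "__main__":
    # Placeholder entries, not real publications.
    entries = [
        {"title": "Atmospheric River Detection", "author": "A", "year": "2020", "doi": "10.1/abc"},
        {"title": "A different title", "author": "B", "year": "2020", "doi": "10.1/ABC"},
        {"title": "Something new entirely", "author": "C", "year": "2021"},
    ]
    print(len(deduplicate(entries)))
```

Each new entry is compared only against entries already kept, so the first occurrence always wins. The comparison is quadratic in the number of entries, which is usually acceptable for a personal bibliography.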
File Naming
Files are named using the pattern: {author}-{title-keywords}-{year}.md
Where:
- {author}: Last name of first author (cleaned)
- {title-keywords}: First 4 words of title (URL-friendly)
- {year}: Publication year
Example: obrien-atmospheric-river-detection-2020.md
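A filename following this pattern could be generated along these lines. The helper names (slugify, first_author_last_name, make_filename) are hypothetical, and the sketch assumes the BibTeX author field uses " and " as the separator between authors. It does not filter stop words, which the actual tool may do.

```python
import re
import unicodedata


def slugify(text):
    """Lowercase ASCII slug: accents stripped, words joined by hyphens."""
    text = unicodedata.normalize("NFKD", text).encode("ascii", "ignore").decode("ascii")
    text = re.sub(r"[^a-z0-9\s-]", "", text.lower())
    return re.sub(r"[\s-]+", "-", text).strip("-")


def first_author_last_name(author_field):
    """Extract the first author's last name from a BibTeX author string."""
    first = author_field.split(" and ")[0].strip()
    if "," in first:
        # "Last, First" form
        return first.split(",")[0].strip()
    # "First Last" form, or empty
    return first.split()[-1] if first else "unknown"


def make_filename(entry, n_words=4):
    """Build {author}-{title-keywords}-{year}.md from an entry dict."""
    author = slugify(first_author_last_name(entry.get("author", ""))) or "unknown"
    words = [w for w in slugify(entry.get("title", "")).split("-") if w][:n_words]
    year = str(entry.get("year", "")).strip()
    parts = [author, *words, year]
    return "-".join(p for p in parts if p) + ".md"


if __name__ == "__main__":
    # Placeholder entry for illustration only.
    print(make_filename({
        "author": "O'Brien, T. and Doe, J.",
        "title": "Atmospheric River Detection",
        "year": "2020",
    }))
```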
Jekyll Integration
To use with Jekyll:
- Add a publications collection to _config.yml:

      collections:
        publications:
          output: true
          permalink: /:collection/:title/

- Create a publication layout (_layouts/publication.html)
- Create a publications listing page that iterates through site.publications
Supported BibTeX Fields
The converter recognizes and processes these BibTeX fields:
- title, author, year, journal
- volume, issue/number, pages
- doi, url, abstract, keywords
Error Handling
- Missing input files are detected and reported
- BibTeX parsing errors are caught and reported
- File writing errors are handled gracefully
- Users are prompted before existing files are overwritten (unless --force is used)
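The write-time behavior described above (dry-run preview, overwrite prompt, graceful handling of write errors) might be structured like this. It is only a sketch: write_markdown and its parameters are hypothetical names, and the prompt wording is an assumption.

```python
from pathlib import Path


def write_markdown(path, content, force=False, dry_run=False, confirm=input):
    """Write one markdown file; return True only if it was written.

    `confirm` is injectable so the overwrite prompt can be tested
    without real user input.
    """
    path = Path(path)
    if dry_run:
        # Preview only: report the target without touching the disk.
        print(f"[dry-run] would write {path}")
        return False
    if path.exists() and not force:
        answer = confirm(f"{path} exists. Overwrite? [y/N] ")
        if answer.strip().lower() not in ("y", "yes"):
            return False
    try:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(content, encoding="utf-8")
    except OSError as exc:
        # Report the failure and let the caller continue with other files.
        print(f"Error writing {path}: {exc}")
        return False
    return True
```

Returning a boolean instead of raising lets a conversion loop count written and skipped files and keep going after a single failure.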
License
This tool was created for academic use. Feel free to modify and distribute as needed.