Install: `claude install-skill fmschulz/omics-skills`
# pdf-to-md
Turn a PDF into Markdown. The right path depends on **what the document is** and
**whether an OCR API key is available**:
- **Scientific paper** → produce the canonical `paper-to-md` bundle (Markdown +
`section_audit.json` + `article.json`) so it can feed `csag-extraction`.
Convert with the **OCR API** when a key is set, else **LiteParse v2** locally.
- **Any other PDF** (reports, slides, letters, forms) → just convert to Markdown
with **LiteParse v2** for a fast, local, no-key result. Stop there.
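
The routing described in the two bullets above can be sketched as a small decision function. This is only an illustration of the rule, not code from the skill: `plan_conversion`, the `Engine` enum, and the `OCR_API_KEY` environment variable name are hypothetical, and deciding whether a PDF is a scientific paper is left to the caller.

```python
import os
from enum import Enum
from typing import Optional


class Engine(Enum):
    OCR_API = "ocr-api"
    LITEPARSE_V2 = "liteparse-v2"


def plan_conversion(is_scientific_paper: bool, ocr_key: Optional[str]) -> dict:
    """Pick the engine and expected outputs for one PDF (hypothetical helper)."""
    if is_scientific_paper:
        # Papers get the full bundle; the OCR API is used only when a key is set.
        engine = Engine.OCR_API if ocr_key else Engine.LITEPARSE_V2
        outputs = ["Markdown", "section_audit.json", "article.json"]
    else:
        # Everything else is a plain local conversion, regardless of any key.
        engine = Engine.LITEPARSE_V2
        outputs = ["Markdown"]
    return {"engine": engine, "outputs": outputs}


if __name__ == "__main__":
    # OCR_API_KEY is an assumed variable name, used here for illustration only.
    print(plan_conversion(True, os.environ.get("OCR_API_KEY")))
```

An empty key string is treated the same as a missing key, so an unset or blank variable falls back to the local LiteParse v2 path.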
**LiteParse must be v2** ([run-llama/liteparse](https://github.com/run-llama/liteparse),
the Rust rewrite with the `LiteParse` Python API and `lit` CLI). LiteParse v1 is a
different, unsupported API. `liteparse_to_md.py` pins `liteparse>=2,<3` and refuses
to run on anything else. Under `uv run` that pin resolves to the v2 wheel, which ships
the right per-platform binary, so there is nothing to vendor or compile and no API key
is needed. OCR is on by default (bundled Tesseract).
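
A "refuse to run on anything but v2" guard could look roughly like the sketch below. It is not the actual check inside `liteparse_to_md.py`, which is not shown here; `is_liteparse_v2` and `require_liteparse_v2` are hypothetical names, and the sketch only compares the major version, ignoring pre-release subtleties of the `>=2,<3` specifier.

```python
from importlib.metadata import PackageNotFoundError, version


def is_liteparse_v2(version_string: str) -> bool:
    """True when the major version component is exactly 2."""
    major = version_string.split(".", 1)[0]
    return major.isdigit() and int(major) == 2


def require_liteparse_v2() -> str:
    """Exit with a clear message unless an installed liteparse is 2.x."""
    try:
        installed = version("liteparse")
    except PackageNotFoundError:
        raise SystemExit("liteparse is not installed; expected liteparse>=2,<3")
    if not is_liteparse_v2(installed):
        raise SystemExit(f"liteparse {installed} found; expected liteparse>=2,<3")
    return installed
```

Failing early with the expected range in the message makes a v1 install obvious before any parsing starts.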
**LiteParse output is a draft, not the deliverable.** LiteParse is a *mechanical*
parser: it has no native Markdown, infers headings from font size/weight, and
introduces artifacts (split words, broken hyphenation, dropped author blocks, merged
columns). Whenever LiteParse is the engine, the LLM running this skill is responsible
for shaping that draft into the right form — see "Shape the LiteParse output" below.
The OCR API engine needs far less shaping.
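
To make one of these artifact classes concrete, here is a minimal sketch of a mechanical pre-pass that rejoins words hyphenated across a line break. It is an illustration only, not part of the skill: `rejoin_hyphenated_breaks` is a hypothetical helper, and the regex is deliberately conservative.

```python
import re

# A letter run, a hyphen at end of line, then a lowercase continuation.
_HYPHEN_BREAK = re.compile(r"([A-Za-z]+)-\n([a-z]+)")


def rejoin_hyphenated_breaks(text: str) -> str:
    """Join 'extrac-\\ntion' into 'extraction'; leave other text untouched."""
    return _HYPHEN_BREAK.sub(r"\1\2", text)
```

A pass like this cannot tell a line-break hyphen from a real compound (`well-` followed by `known` would be fused too), which is one reason the final shaping stays with the LLM rather than a regex.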
## Instructions
### Step 0 — Classify the document and pick a p