wiki_ingest_ocr

Ingest new academic papers or PDFs into the raw/ folder of your active topic wiki using the local OCR model configured in config.yaml.
Misaka16384/Wikify · ★ 3 · Data & Documents · score 69
Install: claude install-skill Misaka16384/Wikify
# LLM Wiki — Ingest Local OCR Skill (wiki_ingest_ocr)

> **Resolving script paths (read first):** Commands below invoke scripts as `<BIN>/X.py` (and a few as `<SKILLS>/...`). Resolve these to **absolute paths once** before running anything:
>
> - `<SKILL_DIR>` = the directory this `SKILL.md` lives in.
> - `<SKILLS>` = the `skills/` folder containing this skill = `<SKILL_DIR>/..`
> - `<BIN>` = the `bin/` folder beside it = `<SKILL_DIR>/../../bin`
>
> Do **not** hardcode a fixed prefix like `.agents/bin` or `../bin`: shell relative paths resolve against the current working directory (usually the topic root), not this skill's location. Once resolved, `<BIN>` is typically `.agents/bin` when invoked from the hub root, or `.claude/bin` from inside a topic directory.

This skill handles converting external PDF documents (especially academic papers or scanned articles inside `inbox/` or custom local paths) into high-fidelity clean Markdown using the local OCR model configured in `config.yaml` (default: `glm-ocr` at 130 DPI).

> **Figures are handled automatically.** Both the PDF path (`pdf2md-agent`) and the TeX path (`tex2md.py`) extract figures into an `images/` folder beside the output Markdown and embed them inline (`![caption](images/<slug>-...png)`). Figure files are prefixed with the document slug, so multiple papers can share one `raw/<type>/images/` folder without collisions. Vector figures and `.pdf`/`.eps` sources are rasterised to PNG. You do **not** need to handle figure