
vision-ocr

Use whenever a hand-written or scanned answer PDF needs transcription to markdown for /grade. Three tiers — Claude native vision (default, no extra install), local Qwen3-VL 8B via ollama (opt-in privacy mode), pytesseract fallback. The engine is selected via `OCR_ENGINE` in `.course-meta` (written by /paideia:init-course) and can be overridden per-call with `/paideia:grade --ocr=<engine>`.
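The selection order described above (per-call `--ocr` override first, then `OCR_ENGINE` from `.course-meta`, then the `claude` default) can be sketched as follows. This is only an illustration, not the plugin's actual dispatch code: the function names `read_meta_engine` and `resolve_engine` are hypothetical, the `key: value` parsing is assumed from the `OCR_ENGINE: <engine>` line format, and rejecting unknown engine names is an assumption rather than documented behavior.

```python
from typing import Optional

# Engine names as listed in this skill; "claude" is the documented default.
VALID_ENGINES = ("claude", "ollama", "tesseract")
DEFAULT_ENGINE = "claude"


def read_meta_engine(meta_text: str) -> Optional[str]:
    """Return the OCR_ENGINE value found in .course-meta text, or None if absent."""
    for line in meta_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "OCR_ENGINE":
            return value.strip() or None
    return None


def resolve_engine(meta_text: str, cli_override: Optional[str] = None) -> str:
    """Per-call override wins, then .course-meta, then the default engine."""
    engine = cli_override or read_meta_engine(meta_text) or DEFAULT_ENGINE
    if engine not in VALID_ENGINES:
        # Assumption: unknown names are rejected instead of silently defaulted.
        raise ValueError(f"unknown OCR engine: {engine!r}")
    return engine
```

For example, `resolve_engine("OCR_ENGINE: ollama", "tesseract")` picks `tesseract`, because the per-call value takes precedence over the stored one.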
OPTIMETA/PAIDEIA · ★ 81 · Data & Documents · score 86
Install: claude install-skill OPTIMETA/PAIDEIA
# Vision-OCR

## When to load

- `/grade` needs to convert `answers/*.pdf` → `answers/converted/*.md`
- Any hand-written / scanned document whose previous tesseract pass was garbled
- `answer-processing` skill's step-2 conversion

## Engine choice

`.course-meta` holds a single line `OCR_ENGINE: <engine>` written by `/paideia:init-course`. The grade command reads it and dispatches. Users can override per-call with `/paideia:grade --ocr=<engine> [path]`.

| Engine | Default? | How it runs | When to pick it |
|---|---|---|---|
| `claude` | **Yes** | `pdftoppm` → Claude reads each PNG via the Read tool → synthesizes markdown inline. No external model. No subprocess. | The out-of-the-box path. Nothing to install. Highest fidelity on messy handwriting because Claude vision handles mixed-script (English/Korean) prose with LaTeX well. |
| `ollama` | opt-in | `python3 ${CLAUDE_PLUGIN_ROOT}/scripts/vision_ocr.py --engine=ollama <pdf> <md>`: local Qwen3-VL 8B, with an automatic tesseract fall-back if ollama is unreachable. Reads `INTERFACE_LANG` from `.course-meta` to set the prose-language rule. | You want the PDF to never leave the machine *and* you don't want to burn Claude tokens on OCR. Requires one-time `ollama pull qwen3-vl:8b` (~6 GB). |
| `tesseract` | opt-in | `python3 ${CLAUDE_PLUGIN_ROOT}/scripts/vision_ocr.py --engine=tesseract <pdf> <md>`: pytesseract (`eng` for en, `eng+kor` for ko, derived from `.course-meta`). | Zero cloud + no GPU/VRAM budget. Lowest fidelity on han