pdf

Solid

Use whenever the user works with PDF files — reading/extracting text from PDFs (lecture notes, textbook chapters, HW problems, HW solutions, hand-written answers), converting PDFs to markdown for downstream analysis, merging/splitting PDFs, or creating PDFs. For scanned or hand-written PDFs, OCR is required (pytesseract + pdf2image). Based on Anthropic's official PDF skill (github.com/anthropics/skills/tree/main/skills/pdf).

Data & Documents · 81 stars · 2 forks · Updated today · MIT

Quality Score: 89/100

Stars (weight 20%): 64
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# PDF Processing Guide

## When to use this skill

Load this skill whenever the workflow involves PDF input or output. In the paideia context specifically:

- Converting `materials/**/*.pdf` to markdown in `converted/**/*.md` (via `/ingest`)
- Converting hand-written answer PDFs in `answers/*.pdf` to markdown in `answers/converted/*.md` (via `/grade`)
- OCR for scanned lecture notes, textbook chapters, or hand-written work

## Quick decision tree

```
What kind of PDF?
├─ Course material (materials/**/*.pdf) → VISION pipeline (see VISION.md)
│     pdfplumber is unreliable on course
│     content — even "prose-heavy"
│     textbook pages mix in equations,
│     figures, and multi-column layouts
│     that break digital extraction
│     silently. We route everything
│     through vision instead of
│     maintaining a per-category heuristic.
├─ Hand-written answer PDF → vision-ocr skill (see vision-ocr/)
└─ Arbitrary outside-the-plugin PDF → pdfplumber / pypdf / pytesseract
      per the sections below, case-by-case
```

Within this plugin, `/paideia:ingest` routes **all** `materials/**/*.pdf` through the vision pipeline. The `pdfplumbe...
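
As an illustration only, the decision tree and the path conventions listed above could be sketched as a small path-based router. The function names `route_pdf` and `converted_path`, the returned labels, and the assumption that paths are given relative to the project root are all hypothetical; the plugin's actual routing logic is not shown in this excerpt. The sketch also assumes that converted files mirror the source directory layout, which the `converted/**/*.md` pattern suggests but does not confirm.

```python
from pathlib import PurePosixPath
from typing import Optional


def route_pdf(path: str) -> str:
    """Pick a pipeline label for a PDF path (hypothetical helper).

    Assumes `path` is POSIX-style and relative to the project root.
    """
    p = PurePosixPath(path)
    if p.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF: {path}")
    parts = p.parts
    # materials/**/*.pdf: every course-material PDF goes through vision.
    if len(parts) >= 2 and parts[0] == "materials":
        return "vision"
    # answers/*.pdf: hand-written answers, direct children only.
    if len(parts) == 2 and parts[0] == "answers":
        return "vision-ocr"
    # Anything else is handled case-by-case (pdfplumber / pypdf / pytesseract).
    return "case-by-case"


def converted_path(path: str) -> Optional[PurePosixPath]:
    """Return the markdown output path, or None for outside PDFs.

    Assumption: the output tree mirrors the input tree.
    """
    p = PurePosixPath(path)
    route = route_pdf(path)
    if route == "vision":
        # materials/a/b.pdf -> converted/a/b.md
        return PurePosixPath("converted", *p.parts[1:]).with_suffix(".md")
    if route == "vision-ocr":
        # answers/hw1.pdf -> answers/converted/hw1.md
        return PurePosixPath("answers", "converted", p.name).with_suffix(".md")
    return None
```

For example, `route_pdf("materials/week1/lecture.pdf")` returns `"vision"`, and `converted_path("answers/hw1.pdf")` returns `answers/converted/hw1.md`. Keeping the routing purely path-based matches the stated choice to avoid a per-category content heuristic.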

Details

Author: OPTIMETA
Repository: OPTIMETA/PAIDEIA
Created: 1 month ago
Last Updated: today
Language: Python
License: MIT
