pdf

Solid

Use whenever the user works with PDF files — reading/extracting text from PDFs (lecture notes, textbook chapters, HW problems, HW solutions, hand-written answers), converting PDFs to markdown for downstream analysis, merging/splitting PDFs, or creating PDFs. For scanned or hand-written PDFs, OCR is required (pytesseract + pdf2image). Based on Anthropic's official PDF skill (github.com/anthropics/skills/tree/main/skills/pdf).

Data & Documents · 81 stars · 2 forks · Updated today · MIT

Quality Score: 89/100

Stars 20%

Recency 20%

100

Frontmatter 20%

Documentation 15%

100

Issue Health 10%

License 10%

100

Description 5%

100

Skill Content

# PDF Processing Guide

## When to use this skill

Load this skill whenever the workflow involves PDF input or output. In the paideia context specifically:

- Converting `materials/**/*.pdf` to markdown in `converted/**/*.md` (via `/ingest`)
- Converting hand-written answer PDFs in `answers/*.pdf` to markdown in `answers/converted/*.md` (via `/grade`)
- OCR for scanned lecture notes, textbook chapters, or hand-written work

## Quick decision tree

```
What kind of PDF?
├─ Course material (materials/**/*.pdf) → VISION pipeline (see VISION.md)
│     pdfplumber is unreliable on course
│     content: even "prose-heavy"
│     textbook pages mix in equations,
│     figures, and multi-column layouts
│     that break digital extraction
│     silently. We route everything
│     through vision instead of
│     maintaining a per-category heuristic.
├─ Hand-written answer PDF → vision-ocr skill (see vision-ocr/)
└─ Arbitrary outside-the-plugin PDF → pdfplumber / pypdf / pytesseract
      per the sections below, case-by-case
```

Within this plugin, `/paideia:ingest` routes **all** `materials/**/*.pdf` through the vision pipeline. The `pdfplumbe...
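The routing in the decision tree can be sketched as a small path classifier. This is a minimal illustration, not the plugin's actual code: the function name `route_pdf`, the returned labels, and the assumption that paths are given relative to the project root are all hypothetical. The mapping of `answers/*.pdf` to the vision-ocr skill is inferred from the "When to use this skill" list.

```python
from pathlib import PurePosixPath


def route_pdf(path: str) -> str:
    """Return a pipeline label for a PDF path (hypothetical helper).

    Assumes `path` is relative to the project root and uses forward slashes.
    """
    p = PurePosixPath(path)
    if p.suffix.lower() != ".pdf":
        raise ValueError(f"not a PDF: {path}")

    parts = p.parts
    # materials/**/*.pdf: any depth under materials/ goes through vision.
    if len(parts) >= 2 and parts[0] == "materials":
        return "vision"
    # answers/*.pdf: hand-written answers directly under answers/.
    if len(parts) == 2 and parts[0] == "answers":
        return "vision-ocr"
    # Anything else is handled case-by-case (pdfplumber / pypdf / pytesseract).
    return "case-by-case"
```

For example, `route_pdf("materials/week1/notes.pdf")` yields `"vision"`, while a PDF outside the plugin's directories falls through to `"case-by-case"`.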

Details

Author: OPTIMETA
Repository: OPTIMETA/PAIDEIA
Created: 1 month ago
Last Updated: today
Language: Python
License: MIT

Integrates with

Anthropic · AI
Ollama · AI

Similar Skills

Semantically similar based on skill content — not just same category

Data & Documents Listed

pdf

Use when tasks involve reading, creating, or reviewing PDF files where rendering and layout matter; prefer visual checks by rendering pages (Poppler) and use Python tools such as `reportlab`, `pdfplumber`, and `pypdf` for generation and extraction.

1 Updated today

bg-szy

AI & Automation Listed

pdf

Create new PDFs and handle existing `.pdf` files safely with bundled Node/JS tools, including text extraction, page rendering, invoice/document parsing, form filling, and overlays.

113 Updated today

HybridAIOne

Data & Documents Listed

pdf-processing

Extract text and tables from PDF files, fill forms, merge documents. Use when working with PDF files or when the user mentions PDFs, forms, or document extraction.

353 Updated today

aiskillstore