oma-pdf

Featured

Convert PDF files to Markdown using opendataloader-pdf. Extracts text, tables, headings, lists, and images with correct reading order. Use for PDF parsing, PDF to Markdown conversion, document extraction, and AI-ready data preparation.

Data & Documents · 1,195 stars · 136 forks · Updated today · MIT

Quality Score: 92/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# PDF Skill - PDF to Markdown Conversion

## Scheduling

### Goal

Convert PDF files into structured Markdown or another requested extraction format while preserving readable document structure for LLM context, RAG, or downstream review.

### Intent signature

- User asks to convert, parse, read, extract, or transform a PDF.
- User needs PDF text, headings, lists, tables, or images prepared for AI consumption.
- User mentions "PDF to markdown", "parse PDF", "read this PDF", or equivalent wording.

### When to use

- Converting PDF documents to Markdown for LLM context or RAG
- Extracting structured content such as tables, headings, lists, images, footnotes, or hyperlinks
- Preparing PDF data for AI consumption
- Checking whether a PDF has a text layer before choosing OCR

### When NOT to use

- Generating or creating PDFs -> use document-generation tools
- Editing existing PDFs -> out of scope
- Reading an already-text file -> use direct file reading
- Processing HWP, HWPX, DOCX, XLSX, or slide decks -> use the matching document skill

### Expected inputs

- `input_path`: PDF file or folder path
- `output_dir`: optional target directory
- `format`: optional output format, default `markdown`
- `ocr_languages`: optional OCR language list for scanned or image-based PDFs
- `extraction_options`: optional flags for tagged structure, image extraction, or hybrid conversion

### Expected outputs

- Markdown, text, JSON, HTML, or combined extraction output
- Normalized Markdown when Markdown...
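The expected inputs and the "When NOT to use" rules in the excerpt above can be sketched as a small request type plus a scope check. This is only an illustration: `PdfRequest`, `check_request`, the suffix sets, and the returned messages are hypothetical names chosen here, not part of the skill or of opendataloader-pdf. Treating `.pptx` as the slide-deck extension and `.txt`/`.md` as "already-text" files is likewise an assumption.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional

# Hypothetical constants: the excerpt names these formats and document
# types, but does not define how they are represented in code.
ALLOWED_FORMATS = {"markdown", "text", "json", "html"}
OTHER_DOCUMENT_SUFFIXES = {".hwp", ".hwpx", ".docx", ".xlsx", ".pptx"}
PLAIN_TEXT_SUFFIXES = {".txt", ".md"}


@dataclass
class PdfRequest:
    """Mirrors the 'Expected inputs' list; field names follow the excerpt."""
    input_path: str                       # PDF file or folder path
    output_dir: Optional[str] = None      # optional target directory
    format: str = "markdown"              # default stated in the excerpt
    ocr_languages: List[str] = field(default_factory=list)
    extraction_options: Dict[str, bool] = field(default_factory=dict)


def check_request(req: PdfRequest) -> str:
    """Return 'ok', or a short reason the request falls outside the skill."""
    suffix = Path(req.input_path).suffix.lower()
    if suffix in OTHER_DOCUMENT_SUFFIXES:
        return "use the matching document skill"
    if suffix in PLAIN_TEXT_SUFFIXES:
        return "use direct file reading"
    # A path without a suffix is treated as a folder of PDFs (assumption).
    if suffix and suffix != ".pdf":
        return "not a PDF"
    if req.format not in ALLOWED_FORMATS:
        return f"unsupported format: {req.format}"
    return "ok"
```

A caller might run `check_request(PdfRequest("report.pdf"))` before handing the file to the actual converter. The check only looks at the path and the format string, so it never opens the PDF and cannot tell whether OCR is needed.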

Details

Author: first-fluke
Repository: first-fluke/oh-my-agent
Created: 5 months ago
Last Updated: today
Language: TypeScript
License: MIT

Bundled in these plugins

oma

Similar Skills

Semantically similar based on skill content — not just same category

Data & Documents Solid

pdf

Use this skill whenever the user wants to do anything with PDF files. This includes reading or extracting text/tables/formulas from PDFs (including scanned and photographed documents), OCR, combining or merging multiple PDFs, splitting, rotating, watermarking, creating new PDFs, encrypting/decrypting, and extracting images. Routes between pypdf (fast born-digital text), pdfplumber (tables), opendataloader-pdf (born-digital complex layout with optional Docling hybrid), and LightOnOCR-2-1B (vision-LM OCR for scanned/photographed docs; dolphin v2 fallback). If the user mentions a .pdf file or asks to produce one, use this skill.

24 Updated 4 days ago

kennethkhoocy

Data & Documents Listed

pdflux-pdf2markdown

Convert unstructured documents into LLM-ready structured data. Supports PDF, Word, PPT, and images; extracts paragraphs, formulas, tables, charts, and other elements in one step; generates up to 8 levels of headings; and outputs Markdown organized in reading order. Useful for field extraction, comparison and validation, knowledge retrieval, and intelligent Q&A.

16 Updated 3 days ago

PaodingAI

Data & Documents Listed

md-to-pdf

Convert Markdown to PDF via reportlab or weasyprint engines. Triggers - pdf, md to pdf, markdown to pdf, generate pdf.

29 Updated 3 days ago

kochetkov-ma