oma-pdf

Solid

Convert PDF files to Markdown using opendataloader-pdf. Extracts text, tables, headings, lists, and images with correct reading order. Use for PDF parsing, PDF to Markdown conversion, document extraction, and AI-ready data preparation.

Data & Documents | 1,042 stars | 119 forks | Updated today | MIT

Quality Score: 93/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# PDF Skill - PDF to Markdown Conversion

## Scheduling

### Goal

Convert PDF files into structured Markdown or another requested extraction format while preserving readable document structure for LLM context, RAG, or downstream review.

### Intent signature

- User asks to convert, parse, read, extract, or transform a PDF.
- User needs PDF text, headings, lists, tables, or images prepared for AI consumption.
- User mentions "PDF to markdown", "parse PDF", "read this PDF", or equivalent wording.

### When to use

- Converting PDF documents to Markdown for LLM context or RAG
- Extracting structured content such as tables, headings, lists, images, footnotes, or hyperlinks
- Preparing PDF data for AI consumption
- Checking whether a PDF has a text layer before choosing OCR

### When NOT to use

- Generating or creating PDFs -> use document-generation tools
- Editing existing PDFs -> out of scope
- Reading an already-text file -> use direct file reading
- Processing HWP, HWPX, DOCX, XLSX, or slide decks -> use the matching document skill

### Expected inputs

- `input_path`: PDF file or folder path
- `output_dir`: optional target directory
- `format`: optional output format, default `markdown`
- `ocr_languages`: optional OCR language list for scanned or image-based PDFs
- `extraction_options`: optional flags for tagged structure, image extraction, or hybrid conversion

### Expected outputs

- Markdown, text, JSON, HTML, or combined extraction output
- Normalized Markdown when Markdown...
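The "When to use" and "When NOT to use" lists amount to a routing decision made before any conversion starts. A minimal sketch of that decision, assuming routing by file extension alone: `route`, the suffix sets, and the `other-document-skill` label are hypothetical; `oma-pdf` and `oma-hwp` are simply the skill names listed on this page, not an API.

```python
from pathlib import Path

# Hypothetical routing sketch based on the "When to use" / "When NOT to use" lists.
# "oma-pdf" and "oma-hwp" are skill names from this page; "other-document-skill"
# is a placeholder label, not a real skill.
SKILL_BY_SUFFIX = {
    ".pdf": "oma-pdf",
    ".hwp": "oma-hwp",
    ".hwpx": "oma-hwp",
}
OTHER_DOCUMENT_SUFFIXES = {".docx", ".xlsx", ".pptx", ".ppt", ".key"}
PLAIN_TEXT_SUFFIXES = {".txt", ".md", ".csv", ".json"}


def route(path: str) -> str:
    """Pick a handler for an input file from its extension (case-insensitive)."""
    suffix = Path(path).suffix.lower()
    if suffix in PLAIN_TEXT_SUFFIXES:
        return "direct-read"  # already text: read the file directly, no conversion
    if suffix in OTHER_DOCUMENT_SUFFIXES:
        return "other-document-skill"  # DOCX, XLSX, slide decks: matching document skill
    return SKILL_BY_SUFFIX.get(suffix, "unsupported")
```

Extension routing cannot distinguish "read this PDF" from "create a PDF" or "edit this PDF", so in practice it would sit behind the intent checks rather than replace them.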
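The expected inputs can be pictured as a small request object that validates `format` and expands `input_path` into concrete files. This is a sketch for illustration only, not the skill's or opendataloader-pdf's interface: `PdfConversionRequest`, the exact format strings, the comma-separated form for combined output, and recursive folder scanning are all assumptions.

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Dict, List, Optional

# Assumed format names, taken from the "Expected outputs" list.
SUPPORTED_FORMATS = {"markdown", "text", "json", "html"}


@dataclass
class PdfConversionRequest:
    """Hypothetical request object mirroring the skill's expected inputs."""

    input_path: str
    output_dir: Optional[str] = None
    format: str = "markdown"  # default named in the skill
    ocr_languages: List[str] = field(default_factory=list)
    extraction_options: Dict[str, bool] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Assumption: combined output is requested as e.g. "markdown,json".
        requested = {part.strip().lower() for part in self.format.split(",") if part.strip()}
        if not requested or not requested <= SUPPORTED_FORMATS:
            raise ValueError(f"unsupported format: {self.format!r}")

    def pdf_files(self) -> List[Path]:
        """Expand input_path into PDF files; folders are scanned recursively (assumption)."""
        root = Path(self.input_path)
        if root.is_dir():
            return sorted(p for p in root.rglob("*") if p.is_file() and p.suffix.lower() == ".pdf")
        if root.suffix.lower() == ".pdf":
            return [root]
        raise ValueError(f"not a PDF file or folder: {root}")
```

For example, `PdfConversionRequest("reports/", ocr_languages=["en"]).pdf_files()` would list every PDF under `reports/`. Validating up front keeps a bad `format` from surfacing only after a long batch conversion has started.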
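Checking for a text layer before choosing OCR can be approximated by looking for text-drawing operators in the PDF's content streams. The sketch below is a rough, standard-library-only heuristic and an assumption on our part, not the skill's or opendataloader-pdf's actual check: `has_text_layer` inflates Flate-compressed streams and searches for a `BT` text object followed by a `Tj` or `TJ` operator. It will miss text in encrypted files or streams using other filters (LZW, ASCII85), and binary image data could occasionally produce a false positive.

```python
import re
import zlib

# Body of each PDF stream object, between the "stream" and "endstream" keywords.
_STREAM_RE = re.compile(rb"stream\r?\n(.*?)\r?\nendstream", re.DOTALL)
# A text object (BT) followed later by a text-showing operator (Tj or TJ).
_TEXT_OP_RE = re.compile(rb"\bBT\b.*?T[jJ]\b", re.DOTALL)


def _decoded_streams(data: bytes):
    """Yield each stream body, inflating it when it is zlib/Flate-compressed."""
    for match in _STREAM_RE.finditer(data):
        raw = match.group(1)
        try:
            decoded = zlib.decompress(raw)  # FlateDecode streams
        except zlib.error:
            decoded = raw  # uncompressed, or a filter this sketch does not handle
        yield decoded


def has_text_layer(data: bytes) -> bool:
    """Heuristic: True if any stream appears to draw text."""
    return any(_TEXT_OP_RE.search(body) for body in _decoded_streams(data))
```

Usage: `has_text_layer(Path("scan.pdf").read_bytes())`. A `False` result suggests an image-only PDF where `ocr_languages` matters; a `True` result on a scanned document may simply mean an earlier OCR pass already added invisible text.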

Details

Author: first-fluke
Repository: first-fluke/oh-my-agent
Created: 4 months ago
Last Updated: today
Language: TypeScript
License: MIT

Similar Skills

Semantically similar based on skill content, not just the same category.

Data & Documents | Listed

pdf-to-md

Convert any PDF (or DOCX/PPTX/XLSX/image) to clean Markdown. For scientific papers, produce the canonical paper-to-md bundle (Markdown plus section_audit.json and article.json) using the remote OCR API when an OCR key is available, or LiteParse v2 locally when it is not. For any non-paper PDF, defer to a fast, local, no-API-key LiteParse v2 conversion. Use when turning a PDF or manuscript into Markdown, extracting article structure, or preparing input for csag-extraction.

3 stars | Updated yesterday
fmschulz

Data & Documents | Listed

office-to-md

Convert Office documents (Word, Excel, PowerPoint, PDF) to Markdown format. ONLY use this skill when the user explicitly requests to CONVERT, TRANSFORM or PARSE a specific office file into Markdown. Do NOT trigger for general questions, documentation reading, or discussions about files.

0 stars | Updated today
Azistoteles

Data & Documents | Solid

ocr-and-documents

Extract text from PDFs and scanned documents. Use web_extract for remote URLs, pymupdf for local text-based PDFs, marker-pdf for OCR/scanned docs. For DOCX use python-docx, for PPTX see the powerpoint skill.

175,435 stars | Updated today
NousResearch

Data & Documents | Listed

pdf-folder-to-markdown

Bulk-convert every PDF in a folder (and its subfolders) to Markdown using a generated Python script, and optionally generate a companion script to delete the original PDFs afterwards. Triggers on requests like 'convert all PDFs in this folder to markdown', 'bulk PDF to MD', 'extract text from every PDF in <folder>', 'make markdown files from these PDFs', 'turn all PDFs into .md', or follow-ons like 'now delete the PDFs' or 'clean up non-markdown files'. Use whenever the user wants to process a folder full of PDFs (often with per-item subfolders) into Markdown.

0 stars | Updated today
palych65

Data & Documents | Solid

oma-hwp

Convert HWP / HWPX / HWPML files to Markdown using kordoc. Extracts text, headings, tables, lists, images, footnotes, and hyperlinks. Use for Korean word processor files (Hangul), government documents, and AI-ready data preparation.

1,042 stars | Updated today
first-fluke