pdf

Solid

Process PDF files: extract text, create PDFs, and merge documents. Use when the user asks to read a PDF, create a PDF, or work with PDF files.

Data & Documents | 66,350 stars | 10,813 forks | Updated 6 days ago | MIT

Quality Score: 93/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# PDF Processing Skill

You now have expertise in PDF manipulation. Follow these workflows:

## Reading PDFs

**Option 1: Quick text extraction (preferred)**

```bash
# Using pdftotext (poppler-utils)
pdftotext input.pdf -  # Output to stdout
pdftotext input.pdf output.txt  # Output to file

# If pdftotext not available, try:
python3 -c "
import fitz  # PyMuPDF
doc = fitz.open('input.pdf')
for page in doc:
    print(page.get_text())
"
```

**Option 2: Page-by-page with metadata**

```python
import fitz  # pip install pymupdf

doc = fitz.open("input.pdf")
print(f"Pages: {len(doc)}")
print(f"Metadata: {doc.metadata}")

for i, page in enumerate(doc):
    text = page.get_text()
    print(f"--- Page {i+1} ---")
    print(text)
```

## Creating PDFs

**Option 1: From Markdown (recommended)**

```bash
# Using pandoc
pandoc input.md -o output.pdf

# With custom styling
pandoc input.md -o output.pdf --pdf-engine=xelatex -V geometry:margin=1in
```

**Option 2: Programmatically**

```python
from reportlab.lib.pagesizes import letter
from reportlab.pdfgen import canvas

c = canvas.Canvas("output.pdf", pagesize=letter)
c.drawString(100, 750, "Hello, PDF!")
c.save()
```

**Option 3: From HTML**

```bash
# Using wkhtmltopdf
wkhtmltopdf input.html output.pdf

# Or with Python
python3 -c "
import pdfkit
pdfkit.from_file('input.html', 'output.pdf')
"
```

## Merging PDFs

```python
import fitz

result = fitz.open()
for pdf_path in ["file1.pdf", "file2.pdf", "file3.pdf"]:
    doc = fitz.open(pdf_pat...
```
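The Reading PDFs workflow above prefers `pdftotext` and falls back to PyMuPDF when the CLI is not available. As a rough illustration only, that fallback could be folded into one Python helper along these lines. The function names `choose_backend` and `extract_text` are hypothetical and do not come from the skill; the sketch assumes `pdftotext` is found via `PATH` and that `pymupdf` is installed whenever the fallback is needed.

```python
import shutil
import subprocess


def choose_backend(which=shutil.which):
    """Return "pdftotext" if the CLI is on PATH, else "pymupdf".

    `which` is injectable (hypothetical design choice) so the decision
    can be exercised without touching the real PATH.
    """
    return "pdftotext" if which("pdftotext") else "pymupdf"


def extract_text(pdf_path, which=shutil.which):
    """Extract all text from a PDF with whichever backend is available."""
    if choose_backend(which) == "pdftotext":
        # "-" as the output file sends the extracted text to stdout.
        result = subprocess.run(
            ["pdftotext", pdf_path, "-"],
            capture_output=True, text=True, check=True,
        )
        return result.stdout

    # PyMuPDF is imported lazily so the CLI path needs no extra package.
    import fitz

    doc = fitz.open(pdf_path)
    try:
        return "".join(page.get_text() for page in doc)
    finally:
        doc.close()
```

A call such as `print(extract_text("input.pdf"))` would then mirror Option 1, using the same `input.pdf` placeholder as the skill.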

Details

Author: shareAI-lab
Repository: shareAI-lab/learn-claude-code
Created: 11 months ago
Last Updated: 6 days ago
Language: Python
License: MIT