extract-from-pdfs

Use this skill to extract structured data from scientific PDFs for systematic reviews, meta-analyses, or database creation, particularly when a collection of research papers must be converted into an analyzable dataset with validation metrics.
aiskillstore/marketplace · ★ 329 · Data & Documents · score 79

Install: claude install-skill aiskillstore/marketplace

# Extract Structured Data from Scientific PDFs

## Purpose

Extract standardized, structured data from scientific PDF literature using Claude's vision capabilities. Transform PDF collections into validated databases ready for statistical analysis in Python, R, or other frameworks.

**Core capabilities:**

- Organize metadata from BibTeX, RIS, directories, or DOI lists
- Filter papers by abstract using Claude (Haiku/Sonnet) or local models (Ollama)
- Extract structured data from PDFs with customizable schemas
- Repair and validate JSON outputs automatically
- Enrich with external databases (GBIF, WFO, GeoNames, PubChem, NCBI)
- Calculate precision/recall metrics for quality assurance
- Export to Python, R, CSV, Excel, or SQLite

## When to Use This Skill

Use when:

- Conducting systematic literature reviews requiring data extraction
- Building databases from scientific publications
- Converting PDF collections to structured datasets
- Validating extraction quality with ground truth metrics
- Comparing extraction approaches (different models, prompts)

Do not use for:

- Single PDF summarization (use basic PDF reading instead)
- Full-text PDF search (use document search tools)
- PDF editing or manipulation

## Getting Started

### 1. Initial Setup

Read the setup guide for installation and configuration:

```bash
cat references/setup_guide.md
```

Key setup steps:

- Install dependencies: `conda env create -f environment.yml`
- Set API keys: `export ANTHROPIC_API_KEY='your-key'`
-