Install: `claude install-skill aiskillstore/marketplace`
# Extract Structured Data from Scientific PDFs
## Purpose
Extract standardized, structured data from scientific PDF literature using Claude's vision capabilities. Transform PDF collections into validated databases ready for statistical analysis in Python, R, or other frameworks.
**Core capabilities:**
- Organize metadata from BibTeX, RIS, directories, or DOI lists
- Filter papers by abstract using Claude (Haiku/Sonnet) or local models (Ollama)
- Extract structured data from PDFs with customizable schemas
- Repair and validate JSON outputs automatically
- Enrich with external databases (GBIF, WFO, GeoNames, PubChem, NCBI)
- Calculate precision/recall metrics for quality assurance
- Export to Python, R, CSV, Excel, or SQLite
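As an illustration of the precision/recall capability listed above, here is a minimal sketch of how such metrics can be computed against a hand-labelled ground truth. The function name `precision_recall` and the `(species, country)` record format are hypothetical and chosen only for this example; the skill's own scripts may structure records and report metrics differently.

```python
from typing import Hashable, Iterable, Tuple


def precision_recall(
    extracted: Iterable[Hashable], ground_truth: Iterable[Hashable]
) -> Tuple[float, float]:
    """Compare extracted records with ground truth using exact matching.

    Hypothetical helper for illustration; not part of the skill's API.
    """
    extracted_set = set(extracted)
    truth_set = set(ground_truth)
    true_positives = len(extracted_set & truth_set)
    # Precision: share of extracted records that are correct.
    precision = true_positives / len(extracted_set) if extracted_set else 0.0
    # Recall: share of ground-truth records that were found.
    recall = true_positives / len(truth_set) if truth_set else 0.0
    return precision, recall


# Made-up example records: (species, country) pairs from one paper.
extracted = {
    ("Quercus robur", "Germany"),
    ("Fagus sylvatica", "France"),
    ("Pinus nigra", "Spain"),
}
truth = {
    ("Quercus robur", "Germany"),
    ("Fagus sylvatica", "France"),
    ("Abies alba", "Italy"),
    ("Picea abies", "Norway"),
}

precision, recall = precision_recall(extracted, truth)
print(f"precision={precision:.2f} recall={recall:.2f}")
```

Exact matching is the strictest choice. In practice, records may need normalization (for example, case folding or taxonomic name resolution) before comparison, otherwise near-matches count as errors.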
## When to Use This Skill
Use when:
- Conducting systematic literature reviews requiring data extraction
- Building databases from scientific publications
- Converting PDF collections to structured datasets
- Validating extraction quality with ground truth metrics
- Comparing extraction approaches (different models, prompts)
Do not use for:
- Single PDF summarization (use basic PDF reading instead)
- Full-text PDF search (use document search tools)
- PDF editing or manipulation
## Getting Started
### 1. Initial Setup
Read the setup guide for installation and configuration:
```bash
cat references/setup_guide.md
```
Key setup steps:
- Install dependencies: `conda env create -f environment.yml`
- Set API keys: `export ANTHROPIC_API_KEY='your-key'`
-