fulltext-retrieval

Batch download open-access PDFs by DOI using legitimate OA APIs (Unpaywall, PMC, OpenAlex, Crossref). Optional PDF→Markdown conversion for token-efficient LLM analysis.
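The fallback chain this description refers to can be sketched as a loop over sources that stops at the first download passing the skill's validity check. Per the README, a valid PDF is at least 10 KB and starts with the `%PDF-` header. This is a minimal sketch under stated assumptions, not the skill's `fetch_oa.py`. The network calls are replaced by hypothetical stub callables, `fetch_first_valid` and `is_valid_pdf` are illustrative names, and reading "10 KB" as 10 × 1024 bytes is an assumption.

```python
from typing import Callable, Iterable, Optional

# Assumption: "10 KB" is read as 10 * 1024 bytes; the real tool may use 10,000.
MIN_PDF_BYTES = 10 * 1024

def is_valid_pdf(data: bytes) -> bool:
    """Accept a download only if it is large enough and starts with the PDF magic bytes."""
    return len(data) >= MIN_PDF_BYTES and data.startswith(b"%PDF-")

# Hypothetical interface: a source takes a DOI and returns candidate bytes, or None.
Source = Callable[[str], Optional[bytes]]

def fetch_first_valid(
    doi: str, sources: Iterable[tuple[str, Source]]
) -> Optional[tuple[str, bytes]]:
    """Try each (name, source) in order; return the first valid PDF with its source name."""
    for name, source in sources:
        try:
            data = source(doi)
        except Exception:
            # Sketch policy (assumed): a failing source is skipped, not fatal.
            continue
        if data is not None and is_valid_pdf(data):
            return name, data
    return None  # every source failed; the caller would log the DOI as not found

# Usage with stub sources standing in for real API clients (hypothetical).
fake_pdf = b"%PDF-1.7\n" + b"0" * MIN_PDF_BYTES
sources = [
    ("unpaywall", lambda doi: None),                 # no OA copy found
    ("pmc", lambda doi: b"<html>challenge</html>"),  # HTML page instead of a PDF
    ("openalex", lambda doi: fake_pdf),
    ("crossref", lambda doi: fake_pdf),
]
print(fetch_first_valid("10.1002/mp.12524", sources)[0])  # openalex
```

Giving every source the same callable signature turns the order of the chain (Unpaywall → PMC → OpenAlex → Crossref) into a plain list that is easy to reorder or extend.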
Aperivue/medsci-skills · ★ 126 · Data & Documents · score 82
Install: claude install-skill Aperivue/medsci-skills
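The skill accepts two input formats: a plain list with one DOI per line, or a TSV whose header contains a `DOI` column and, optionally, a `PMID` column. The following is a minimal sketch that reads either format into (DOI, PMID) pairs. `read_doi_list` is a hypothetical helper. Detecting a TSV by looking for a `DOI` field in a tab-separated first line is an assumption and is not necessarily how `fetch_oa.py` does it.

```python
import csv
import io
from typing import Optional

def read_doi_list(text: str) -> list[tuple[str, Optional[str]]]:
    """Return (doi, pmid) pairs from a plain DOI list or a TSV with a DOI column."""
    lines = [ln for ln in text.splitlines() if ln.strip()]  # skip blank lines
    if not lines:
        return []
    header = lines[0].split("\t")
    # Assumption: a TSV is recognised by a tab-separated header containing "DOI".
    if "DOI" in header:
        reader = csv.DictReader(io.StringIO("\n".join(lines)), delimiter="\t")
        pairs = []
        for row in reader:
            doi = (row.get("DOI") or "").strip()
            # PMID is optional: a missing column or empty cell becomes None.
            pmid = (row.get("PMID") or "").strip() or None
            if doi:
                pairs.append((doi, pmid))
        return pairs
    # Plain text: one DOI per line, no PMID available.
    return [(ln.strip(), None) for ln in lines]

tsv = "ID\tTitle\tDOI\tPMID\tYear\n1\tSome paper\t10.1007/s00330-010-1783-x\t20628747\t2010\n"
print(read_doi_list(tsv))                   # [('10.1007/s00330-010-1783-x', '20628747')]
print(read_doi_list("10.1002/mp.12524\n"))  # [('10.1002/mp.12524', None)]
```

Carrying the PMID alongside each DOI lets a downstream PMC step take the more reliable PMID → PMCID route whenever one is present.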
# Fulltext Retrieval Skill

Batch download open-access full-text PDFs from a DOI list using legitimate OA APIs only.

## Pipeline

```
DOI list → Unpaywall → PMC (Europe PMC / OA FTP / web) → OpenAlex → Crossref → landing page
```

Each DOI goes through these sources in order until a valid PDF (≥10 KB, `%PDF-` header) is found.

## Quick Start

```bash
# Prepare a DOI list (one per line)
cat > dois.txt << 'EOF'
10.1007/s00330-010-1783-x
10.1002/mp.12524
10.1148/radiol.13131265
EOF

# Run
python fetch_oa.py dois.txt --output pdfs/ --email your@email.com

# Verbose mode for debugging
python fetch_oa.py dois.txt -o pdfs/ -e your@email.com --verbose
```

## Input Formats

**Plain text** (one DOI per line):

```
10.1007/s00330-010-1783-x
10.1002/mp.12524
```

**TSV with header** (must contain a `DOI` column; a `PMID` column is optional):

```tsv
ID	Title	DOI	PMID	Year
1	Some paper	10.1007/s00330-010-1783-x	20628747	2010
```

When a PMID is available, the PMC lookup is more reliable (PMID → PMCID conversion).

## PMC Download (JS-Challenge Resistant)

PMC web pages may block automated downloads with JavaScript proof-of-work challenges. This tool uses three fallback methods:

### Method A: Europe PMC REST API (most reliable)

```bash
PMCID="PMC9733600"
curl -sLo output.pdf \
  "https://europepmc.org/backend/ptpmcrender.fcgi?accid=${PMCID}&blobtype=pdf"
```

### Method B: PMC OA FTP Service

```bash
curl -s "https://www.ncbi.nlm.nih.gov/pmc/utils/oa/oa.fcgi?id=${PMCID}" | \
  grep -oE 'hr