doc-harvester

Mirror an external platform's documentation site into the current project as local markdown files, so the coding agent can cross-check the official docs offline while building. Use this whenever a documentation or wiki URL appears together with a request to study, learn, fetch, download, mirror, save, scrape, or "vendor" the docs, including phrasings like "изучи документацию" ("study the documentation"), "скачай документацию по ссылке" ("download the documentation from the link"), "собери вики" ("collect the wiki"), "study these docs", "pull the API docs into the repo", "I'm integrating with X, get its docs". Use it even if the user just pastes a docs link and says "learn this" without naming a tool, and even if they only give a deep link to one page. Prefer this skill over ad-hoc WebFetch whenever the goal is to capture a whole documentation set rather than read a single page.
arturayupov/awesome-claude-skills · ★ 0 · AI & Automation · score 70

Install: claude install-skill arturayupov/awesome-claude-skills

# doc-harvester

Turn any documentation site into a folder of local markdown inside the user's project, so that during development the coding agent can grep and read the official docs instead of guessing or fetching pages one at a time.

A plain single-URL fetch fails on real doc sites for three reasons, and this skill is built to defeat all three: (1) one fetch only sees one page; (2) most modern docs are client-rendered single-page apps, so a raw fetch returns an empty HTML shell with no content; (3) content hidden behind tabs and nested sections is never discovered. The engine works around this with a ladder of strategies, cheapest and most reliable first.

## Quickstart

The engine is the bundled script `scripts/harvest.py` (Python 3, standard library only, so there is nothing to install for the common cases). Run it from the project root:

```bash
python3 <skill_dir>/scripts/harvest.py "<DOC_URL>" --out docs/vendor
```

`<skill_dir>` is the directory containing this SKILL.md. `<DOC_URL>` can be any page of the target docs; the engine figures out the root itself. Output lands in `docs/vendor/<platform>/`.

The engine auto-selects a method and prints a final `RESULT:` line. Read it: it reports which method succeeded, how many pages were saved, and the output directory. Then read `_index.md` and skim `llms-full.md` in the output folder to confirm the content is real documentation and not an error page.

## How the engine chooses a method

The engine tries each step below and stops at