harvest-deep-crawl
Multi-page deep crawling - documentation sites, wikis, knowledge bases
Quality Score: 89/100
Details
- Author: vibeeval
- Repository: vibeeval/vibecosystem
- Created: 2 months ago
- Last Updated: 1 month ago
- Language: C#
- License: MIT
Similar Skills
Semantically similar based on skill content — not just same category
harvest-single
Single page smart extraction - articles, docs, blog posts to clean markdown
crawl
Use when the user wants to crawl an entire website, documentation site, or multiple pages from a domain; index a whole docs section; or follow links deeply across a site. Triggers on "crawl this site", "index the whole docs", "crawl all pages under", "spider this URL", "index the entire", "grab all pages from". Prefer over scrape when breadth matters — multiple pages across a site.
doc-harvester
Mirror an external platform's documentation site into the current project as local markdown files, so the coding agent can cross-check the official docs offline while building. Use this whenever a documentation or wiki URL appears together with a request to study, learn, fetch, download, mirror, save, scrape, or "vendor" the docs — including phrasings like "изучи документацию", "скачай документацию по ссылке", "собери вики", "study these docs", "pull the API docs into the repo", "I'm integrating with X, get its docs". Use it even if the user just pastes a docs link and says "learn this" without naming a tool, and even if they only give a deep link to one page. Prefer this skill over ad-hoc WebFetch whenever the goal is to capture a whole documentation set rather than read a single page.
site-crawlability
When the user wants to improve crawlability, fix orphan pages, or optimize site structure for search engines. Also use when the user mentions "crawlability," "crawl budget," "orphan pages," "internal links," "site structure," "site crawlability," "infinite scroll," "pagination," "masonry SEO," "AI crawler optimization," "GPTBot crawlability," "ClaudeBot crawlability," or "content not indexed." For internal links, use internal-links.
crawl4ai
This skill should be used when users need to scrape websites, extract structured data, handle JavaScript-heavy pages, crawl multiple URLs, or build automated web data pipelines. Includes optimized extraction patterns with schema generation for efficient, LLM-free extraction.