webreaper
Scrape, crawl, or extract structured data from one or more URLs via the `webreaper` CLI. Outputs clean Markdown by default; JSON when a schema is given. Maps a site's URLs in one call. Handles JS-rendered pages and bot-protected sites (Cloudflare, DataDome, PerimeterX) via auto-escalating stealth.

Use this skill whenever the user asks to:
- scrape, crawl, or extract from a URL or site
- get clean Markdown of a webpage (for further processing, not a summary)
- pull specific fields from one or many pages
- enumerate / discover URLs on a site
- read a JS-rendered single-page app
- scrape a site that's blocking direct requests

Trigger phrases include: "scrape <site>", "crawl <site>", "extract <data> from <url>", "what's on <site>", "what pages does <site> have", "give me the markdown of <url>", "convert <url> to markdown", "pull <field> from <url>", "save <article> as markdown", "build a scraper for <site>", "read <url> into context", "this site is blocking me", "Cloudflare-protected site".

Prefer this over the b
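As a rough illustration of what "auto-escalating stealth" could mean in practice, the sketch below tries a list of fetch tiers in order and moves to the next one only when a response looks blocked. This is not WebReaper's actual implementation: `FetchResult`, `looks_blocked`, `fetch_with_escalation`, the status codes, and the challenge markers are all hypothetical names and heuristics chosen for the example, and the fetchers are stand-ins rather than real HTTP or browser clients.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple


@dataclass
class FetchResult:
    status: int
    body: str


# Assumed heuristics for spotting a bot-protection response (hypothetical).
BLOCK_STATUSES = {403, 429, 503}
CHALLENGE_MARKERS = ("just a moment", "captcha", "access denied")

Fetcher = Callable[[str], FetchResult]


def looks_blocked(result: FetchResult) -> bool:
    """Return True when the response resembles a block or challenge page."""
    if result.status in BLOCK_STATUSES:
        return True
    lowered = result.body.lower()
    return any(marker in lowered for marker in CHALLENGE_MARKERS)


def fetch_with_escalation(
    url: str, tiers: Sequence[Tuple[str, Fetcher]]
) -> Tuple[str, FetchResult]:
    """Try each tier in order; return the first result that is not blocked.

    If every tier is blocked, the last tier's result is returned so the
    caller can inspect it.
    """
    if not tiers:
        raise ValueError("at least one fetch tier is required")
    name, result = "", FetchResult(0, "")
    for name, fetch in tiers:
        result = fetch(url)
        if not looks_blocked(result):
            break
    return name, result


# Stand-in fetchers: a plain request that gets blocked, then a stealthier one.
def plain_fetch(url: str) -> FetchResult:
    return FetchResult(403, "Access denied")


def stealth_fetch(url: str) -> FetchResult:
    return FetchResult(200, "<h1>Hello</h1>")


tier_used, page = fetch_with_escalation(
    "https://example.com",
    [("plain", plain_fetch), ("stealth", stealth_fetch)],
)
```

Ordering the tiers from cheapest to most expensive means unprotected pages are served by the lightweight path, and the heavier tier runs only when the first response looks like a block.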
Quality Score: 87/100
Details
- Author: pavlovtech
- Repository: pavlovtech/WebReaper
- Created: 4 years ago
- Last Updated: today
- Language: C#
- License: MIT
Similar Skills
Semantically similar based on skill content — not just same category
- web-scrape: Intelligent web scraper with content extraction, multiple output formats, and error handling
- scrapling: Web scraping with Scrapling - HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python.
- firecrawl: Firecrawl produces cleaner markdown than WebFetch, handles JavaScript-heavy pages, and avoids content truncation. This skill should be used when fetching URLs, scraping web pages, converting URLs to markdown, extracting web content, searching the web, crawling sites, mapping URLs, LLM-powered extraction, autonomous data gathering with the Agent API, interacting with scraped pages (clicking, filling forms, extracting dynamic content via Interact API), or fetching AI-generated documentation for GitHub repos via DeepWiki. Provides complete coverage of Firecrawl v2 API endpoints including parallel agents, spark-1-fast model, sitemap-only crawling, and the Interact API for post-scrape browser interaction.
- enact-firecrawl: Scrape, crawl, search, and extract structured data from websites using Firecrawl API - converts web pages to LLM-ready markdown
- harvest-single: Single page smart extraction - articles, docs, blog posts to clean markdown