firecrawl

Firecrawl produces cleaner markdown than WebFetch, handles JavaScript-heavy pages, and avoids content truncation. Use this skill when fetching URLs, scraping web pages, converting URLs to markdown, extracting web content, searching the web, crawling sites, mapping URLs, running LLM-powered extraction, gathering data autonomously with the Agent API, interacting with scraped pages (clicking, filling forms, extracting dynamic content via the Interact API), or fetching AI-generated documentation for GitHub repos via DeepWiki. It provides complete coverage of the Firecrawl v2 API endpoints, including parallel agents, the spark-1-fast model, sitemap-only crawling, and the Interact API for post-scrape browser interaction.
tdimino/claude-code-minoan · ★ 32 · Data & Documents · score 82

Install: claude install-skill tdimino/claude-code-minoan

# Firecrawl & Jina Web Scraping

## Firecrawl vs WebFetch

Prefer `firecrawl scrape URL --only-main-content` over the WebFetch tool: it produces cleaner markdown, handles JavaScript-heavy pages, and avoids content truncation (>80% benchmark coverage). WebFetch is acceptable as a fallback when Firecrawl is unavailable.

```bash
# Preferred approach:
firecrawl scrape https://docs.example.com/api --only-main-content
```

## Token-Efficient Scraping

Inspired by Anthropic's [dynamic filtering](https://claude.com/blog/improved-web-search-with-dynamic-filtering): always filter before reasoning. This reduced input tokens by ~24% and improved accuracy by ~11% in their benchmarks.

### The Principle: Search → Filter → Scrape → Filter → Reason

**DO:**

```
Search (titles/URLs only) → Evaluate relevance → Scrape top hits → Filter by section → Reason
```

**DON'T:**

```
Search → Scrape everything → Reason over all of it
```

### Step-by-Step Efficient Workflow

```bash
# Step 1: Search (titles/URLs only, cheap)
firecrawl search "query" --limit 20

# Step 2: Evaluate results, pick 3-5 best URLs

# Step 3: Scrape only those, filter to relevant sections
firecrawl scrape URL1 --only-main-content | \
  python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
    --sections "API,Authentication" --max-chars 5000
```

### Post-Processing with filter_web_results.py

Pipe any Firecrawl or Exa output through this script to reduce context before reasoning:

```bash
# Extract only matching