Install: claude install-skill tdimino/claude-code-minoan
# Firecrawl & Jina Web Scraping
## Firecrawl vs WebFetch
Prefer `firecrawl scrape URL --only-main-content` over the WebFetch tool—it produces cleaner markdown, handles JavaScript-heavy pages, and avoids content truncation (>80% benchmark coverage). WebFetch is acceptable as a fallback when Firecrawl is unavailable.
```bash
# Preferred approach:
firecrawl scrape https://docs.example.com/api --only-main-content
```
## Token-Efficient Scraping
Inspired by Anthropic's [dynamic filtering](https://claude.com/blog/improved-web-search-with-dynamic-filtering): always filter before reasoning. In Anthropic's benchmarks, dynamic filtering reduced input tokens by ~24% and improved accuracy by ~11%.
### The Principle: Search → Filter → Scrape → Filter → Reason
**DO:**
```
Search (titles/URLs only) → Evaluate relevance → Scrape top hits → Filter by section → Reason
```
**DON'T:**
```
Search → Scrape everything → Reason over all of it
```
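The first filter in the DO flow (evaluating relevance from titles and URLs before scraping anything) could be sketched roughly as below. This is only an illustration: the `pick_relevant` helper, the keyword-count scoring, and the `title`/`url` result shape are assumptions made for the example, not part of Firecrawl or this skill.

```python
from typing import Dict, List


def pick_relevant(results: List[Dict[str, str]], keywords: List[str], top_k: int = 3) -> List[str]:
    """Rank search hits by keyword matches in title/URL and return the top URLs.

    Hypothetical helper: assumes each hit is a dict with "title" and "url" keys.
    """
    lowered = [k.lower() for k in keywords]
    scored = []
    for position, hit in enumerate(results):
        url = hit.get("url")
        if not url:
            continue  # nothing to scrape without a URL
        text = (hit.get("title", "") + " " + url).lower()
        score = sum(1 for k in lowered if k in text)
        if score > 0:
            # Negative score sorts best-first; position keeps search order on ties.
            scored.append((-score, position, url))
    scored.sort()
    return [url for _, _, url in scored[:top_k]]


# Made-up search results, used only to exercise the helper.
results = [
    {"title": "Pricing", "url": "https://example.com/pricing"},
    {"title": "API Authentication guide", "url": "https://example.com/docs/auth"},
    {"title": "API reference", "url": "https://example.com/docs/api"},
]
urls = pick_relevant(results, ["api", "authentication"], top_k=2)
print(urls)
```

Only the URLs that survive this step would then be passed to `firecrawl scrape`, so pages with no keyword match never enter the context.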
### Step-by-Step Efficient Workflow
```bash
# Step 1: Search — get titles/URLs only (cheap)
firecrawl search "query" --limit 20
# Step 2: Evaluate results, pick 3-5 best URLs
# Step 3: Scrape only those, filter to relevant sections
firecrawl scrape URL1 --only-main-content | \
python3 ~/.claude/skills/firecrawl/scripts/filter_web_results.py \
--sections "API,Authentication" --max-chars 5000
```
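To make the second filter concrete, here is a minimal sketch of what section filtering with a character cap might look like. It is not the actual `filter_web_results.py`; the `filter_sections` function and its matching rule (case-insensitive substring match on heading text) are assumptions chosen to mirror the `--sections` and `--max-chars` options shown above.

```python
import re
from typing import List

HEADING = re.compile(r"^#{1,6}\s+(.*)$")
FENCE = "`" * 3  # markdown code-fence marker


def filter_sections(markdown: str, sections: List[str], max_chars: int = 5000) -> str:
    """Keep only sections whose heading contains one of the wanted names.

    Hypothetical sketch: every heading, at any level, starts a new section,
    so a non-matching subheading ends the section that was being kept.
    """
    wanted = [s.strip().lower() for s in sections]
    kept: List[str] = []
    keep = False
    in_fence = False
    for line in markdown.splitlines():
        if line.lstrip().startswith(FENCE):
            in_fence = not in_fence
        # Lines inside code fences are never treated as headings.
        heading = None if in_fence else HEADING.match(line)
        if heading:
            title = heading.group(1).lower()
            keep = any(w in title for w in wanted)
        if keep:
            kept.append(line)
    # Truncate after filtering so the cap applies to relevant text only.
    return "\n".join(kept)[:max_chars]


doc = "# Intro\nhello\n## Authentication\nUse a token.\n## Pricing\nfree\n"
print(filter_sections(doc, ["API", "Authentication"]))
```

Applying the character cap after the section match means the budget is spent on relevant text rather than on whatever happens to appear first on the page.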
### Post-Processing with filter_web_results.py
Pipe any Firecrawl or Exa output through this script to reduce context before reasoning:
```bash
# Extract only matching