fetch-url-as-markdown

Fetch a web page (URL) and return clean Markdown via local trafilatura, with Exa MCP as a fallback for JS-rendered or anti-bot pages. Use when the user asks to read, fetch, scrape, summarize, or quote a URL — prefer this over the built-in WebFetch tool. Don't use for binary files (PDFs, images, archives) or for fetching API/JSON endpoints.
CodeAlive-AI/ai-driven-development · ★ 77 · Data & Documents · score 85
Install: claude install-skill CodeAlive-AI/ai-driven-development
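The fallback decision documented in the README below can be sketched as a small Python wrapper. This is an illustrative sketch, not part of the skill: the script path and exit codes 0 to 3 are taken from the README, while `next_action`, `fetch`, and the action labels are hypothetical names. Code 4 is deliberately left unmapped because its action is not given in the text.

```python
import subprocess
import sys
from pathlib import Path

# Script path as given in the README workflow; adjust if installed elsewhere.
SCRIPT = Path.home() / ".claude/skills/fetch-url-as-markdown/scripts/fetch_url.py"

# Exit codes 0-3 come from the README table; the labels are hypothetical.
ACTIONS = {
    0: "done",                 # Markdown was printed to stdout
    1: "fallback_to_exa",      # DownloadError
    2: "fallback_to_exa",      # ExtractionError
    3: "install_trafilatura",  # trafilatura missing: install, then retry
}


def next_action(exit_code: int) -> str:
    """Map an exit code of fetch_url.py to the next step (assumed labels)."""
    return ACTIONS.get(exit_code, "unhandled")


def fetch(url: str) -> tuple[str, str]:
    """Run the skill's script and return (next action, captured stdout)."""
    proc = subprocess.run(
        [sys.executable, str(SCRIPT), url],
        capture_output=True,
        text=True,
    )
    return next_action(proc.returncode), proc.stdout
```

Calling `fetch("<URL>")` returns the action label together with whatever the script wrote to stdout; the caller would issue the Exa MCP request only when the label is `fallback_to_exa`.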
# URL to Markdown

Fetch any web URL and get clean, readable Markdown: main content only, no navigation/footer/ads. Local + free by default; smart fallback to Exa MCP when the page can't be extracted locally.

## Workflow (the only thing the agent needs to remember)

1. **Try trafilatura first**:

   ```bash
   python3 ~/.claude/skills/fetch-url-as-markdown/scripts/fetch_url.py "<URL>"
   ```

2. **If exit code is 1 or 2 → fall back to Exa MCP** with the same URL:

   ```
   mcp__exa__web_search_advanced_exa(
     query="<URL>",
     includeDomains=["<host of URL>"],
     numResults=1,
     textMaxCharacters=50000,
     type="auto"
   )
   ```

   (`mcp__exa__crawling` works too if the server exposes it; the `web_search_advanced_exa` call above is the always-available variant. Pin the host with `includeDomains` and use the URL itself as the query.)

3. Exit code `3` means trafilatura is not installed. Install once:

   ```bash
   python3 -m pip install --break-system-packages trafilatura
   ```

## Exit codes (what they mean for the fallback decision)

| Code | Meaning | Action |
|---|---|---|
| 0 | Markdown printed to stdout | done |
| 1 | DownloadError: network/HTTP/timeout/anti-bot block at fetch | fall back to Exa |
| 2 | ExtractionError: empty extract, JS/Cloudflare wall, or stub body (<200 chars) | fall back to Exa |
| 3 | trafilatura missing | install (see above), then retry |
| 4 | UnsupportedContentTypeError: URL is binary (PDF, image, archive)