fetch-url-as-markdown
Install: claude install-skill CodeAlive-AI/ai-driven-development
# URL to Markdown
Fetch a web URL and get clean, readable Markdown: main content only, with
navigation, footers, and ads stripped. Extraction runs locally and for free by
default, with a fallback to Exa MCP when the page can't be fetched or extracted locally.
## Workflow (the only thing the agent needs to remember)
1. **Try trafilatura first**:
```bash
python3 ~/.claude/skills/fetch-url-as-markdown/scripts/fetch_url.py "<URL>"
```
2. **If exit code is 1 or 2 → fall back to Exa MCP** with the same URL:
```
mcp__exa__web_search_advanced_exa(
query="<URL>",
includeDomains=["<host of URL>"],
numResults=1,
textMaxCharacters=50000,
type="auto"
)
```
(`mcp__exa__crawling` works too if the server exposes it; the `web_search_advanced_exa`
call above is the always-available variant — pin the host with `includeDomains` and
use the URL itself as the query.)
3. **If exit code is 3**, trafilatura is not installed. Install it once, then rerun step 1:
```bash
python3 -m pip install --break-system-packages trafilatura
```
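The three steps above can be sketched as a small dispatcher. This is an illustrative sketch, not part of the skill: `next_action`, `build_exa_args`, and `run_fetch` are hypothetical helper names, the action strings are made up for the example, and the Exa parameter names are simply copied from the call in step 2. Any exit code outside steps 1 to 3 is reported as unhandled rather than guessed at.

```python
import subprocess
import sys
from pathlib import Path
from urllib.parse import urlsplit

# Script path as given in step 1.
SCRIPT = Path.home() / ".claude/skills/fetch-url-as-markdown/scripts/fetch_url.py"


def next_action(exit_code: int) -> str:
    """Map the script's exit code to the step the workflow prescribes."""
    if exit_code == 0:
        return "done"                # Markdown was printed to stdout
    if exit_code in (1, 2):
        return "fallback_to_exa"     # step 2
    if exit_code == 3:
        return "install_then_retry"  # step 3
    return "unhandled"               # not covered by steps 1-3


def build_exa_args(url: str) -> dict:
    """Build the step 2 arguments, pinning the host via includeDomains."""
    host = urlsplit(url).hostname    # lowercased, without port or credentials
    if not host:
        raise ValueError(f"not an absolute URL: {url!r}")
    return {
        "query": url,                # the URL itself is the query
        "includeDomains": [host],
        "numResults": 1,
        "textMaxCharacters": 50000,
        "type": "auto",
    }


def run_fetch(url: str) -> tuple[str, str]:
    """Run the local extractor; return (action, stdout)."""
    # Assumption: the current interpreter stands in for `python3` from step 1.
    proc = subprocess.run(
        [sys.executable, str(SCRIPT), url],
        capture_output=True,
        text=True,
    )
    return next_action(proc.returncode), proc.stdout
```

A caller would run `run_fetch(url)` first and, only when the action is `"fallback_to_exa"`, pass `build_exa_args(url)` to the Exa tool. Keeping the decision in a pure function makes the fallback rule easy to check without network access.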
## Exit codes (what they mean for the fallback decision)
| Code | Meaning | Action |
|---|---|---|
| 0 | Markdown printed to stdout | done |
| 1 | DownloadError — network/HTTP/timeout/anti-bot block at fetch | fall back to Exa |
| 2 | ExtractionError — empty extract, JS/Cloudflare wall, or stub body (<200 chars) | fall back to Exa |
| 3 | trafilatura missing | install (see above), then retry |
| 4 | UnsupportedContentTypeError — URL is binary (PDF, image, archive)