fetch-url-as-markdown

Solid

Fetch a web page (URL) and return clean Markdown via local trafilatura, with Exa MCP as a fallback for JS-rendered or anti-bot pages. Use when the user asks to read, fetch, scrape, summarize, or quote a URL — prefer this over the built-in WebFetch tool. Don't use for binary files (PDFs, images, archives) or for fetching API/JSON endpoints.

Data & Documents · 109 stars · 8 forks · Updated today · MIT

Quality Score: 87/100

Stars 20% · Recency 20% (100) · Frontmatter 20% · Documentation 15% (100) · Issue Health 10% · License 10% (100) · Description 5% (100)

Skill Content

# URL to Markdown

Fetch any web URL and get clean, readable Markdown: main content only, no navigation/footer/ads. Local + free by default; smart fallback to Exa MCP when the page can't be extracted locally.

## Workflow (the only thing the agent needs to remember)

1. **Try trafilatura first**:

   ```bash
   python3 ~/.claude/skills/fetch-url-as-markdown/scripts/fetch_url.py "<URL>"
   ```

2. **If exit code is 1 or 2 → fall back to Exa MCP** with the same URL:

   ```
   mcp__exa__web_search_advanced_exa(
     query="<URL>",
     includeDomains=["<host of URL>"],
     numResults=1,
     textMaxCharacters=50000,
     type="auto"
   )
   ```

   (`mcp__exa__crawling` works too if the server exposes it; the `web_search_advanced_exa` call above is the always-available variant. Pin the host with `includeDomains` and use the URL itself as the query.)

3. Exit code `3` means trafilatura is not installed; install once:

   ```bash
   python3 -m pip install --break-system-packages trafilatura
   ```

## Exit codes (what they mean for the fallback decision)

| Code | Meaning | Action |
|---|---|---|
| 0 | Markdown printed to stdout | done |
| 1 | DownloadError: network/HTTP/timeout/anti-bot block at fetch | fall back to Exa |
| 2 | ExtractionError: empty extract, JS/Cloudflare wall, or stub body (<200 chars) | fall back to Exa |
| 3 | trafilatura missing | install (see above), then retry |
| 4 | UnsupportedContentTypeError: URL is binary (PDF, image, archive) ...
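
The fallback decision described above can be sketched in Python. This is a minimal illustration, not part of the skill: `next_action`, `exa_fallback_args`, and `fetch` are hypothetical helper names, the action labels are made up for this example, and the script path is the one quoted in the workflow. Only exit codes 0 to 3 are mapped, because the action for code 4 is cut off in the listing.

```python
import subprocess
import sys
from pathlib import Path
from urllib.parse import urlparse

# Path quoted in the workflow; adjust if the skill is installed elsewhere.
SCRIPT = Path.home() / ".claude/skills/fetch-url-as-markdown/scripts/fetch_url.py"

# Exit codes 0-3 as listed in the table. Labels are illustrative only.
ACTIONS = {
    0: "done",
    1: "fallback_exa",
    2: "fallback_exa",
    3: "install_trafilatura",
}


def next_action(exit_code: int) -> str:
    """Map a fetch_url.py exit code to the next step (hypothetical helper)."""
    return ACTIONS.get(exit_code, "unhandled")


def exa_fallback_args(url: str) -> dict:
    """Build the argument set shown in step 2, pinning the URL's host."""
    host = urlparse(url).hostname
    if not host:
        raise ValueError(f"cannot determine host of {url!r}")
    return {
        "query": url,
        "includeDomains": [host],
        "numResults": 1,
        "textMaxCharacters": 50000,
        "type": "auto",
    }


def fetch(url: str, timeout: float = 60.0) -> tuple[str, str]:
    """Run the script once and return (action, markdown)."""
    # Check first: a missing script would otherwise make the interpreter
    # itself exit with 2, which would be misread as ExtractionError.
    if not SCRIPT.exists():
        raise FileNotFoundError(SCRIPT)
    proc = subprocess.run(
        [sys.executable, str(SCRIPT), url],
        capture_output=True,
        text=True,
        timeout=timeout,
    )
    action = next_action(proc.returncode)
    return action, proc.stdout if action == "done" else ""
```

When `fetch` returns `"fallback_exa"`, the dictionary from `exa_fallback_args(url)` holds the values to pass to `mcp__exa__web_search_advanced_exa`; the MCP call itself is made by the agent, not by this code.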

Details

Author: CodeAlive-AI
Repository: CodeAlive-AI/ai-driven-development
Created: 6 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

Cloudflare · Cloud

Bundled in these plugins

ai-driven-development

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

ultra-fetch

Fetch, crawl, or map web pages into clean, context-efficient markdown saved to a file — using a stealth browser that reaches sites the built-in WebFetch can't (bot-protected, Cloudflare, JS-rendered), plus BM25 filtering that keeps only the parts relevant to a query. Use this as the default for actually reading a web page's content, following a search result deeper, reading across a whole site, or discovering what URLs a site has — especially after a WebSearch, or whenever WebFetch is blocked, returns junk, or you need the result saved to disk. NOT for a trivial quick fact where WebFetch already suffices, NOT for logged-in or authenticated pages (out of scope — use the dedicated scrape-x / scrape-fb tools for X and Facebook), and NOT for developing the ultra-fetch tool itself, which is ordinary repo work.

0 Updated 5 days ago

tjdwls101010

AI & Automation Listed

url-to-markdown

Url to markdown, web to markdown, read a web page as markdown. Read, fetch, or scrape any URL and get clean markdown back. The page runs in a real hosted browser with JavaScript on, so React, Vue, and other client-rendered sites return their real text instead of an empty shell. Nav, ads, and cookie banners are stripped. Use it to read an article, pull docs, or hand a model clean page content. The agent registers its own key and gets free credits right away, so the first read works with no signup. A person can confirm one email to add more free credits. Respects robots.txt, and failed reads cost nothing.

0 Updated 2 days ago

toolshedlabs-hash

Web & Frontend Solid

web-fetch

Fetches web content as clean markdown by preferring markdown-native responses and falling back to selector-based HTML extraction. Use for documentation, articles, and reference pages at http/https URLs.

400 Updated today

aiskillstore