← ClaudeAtlas

crawl4ailisted

Use when scraping JavaScript-heavy pages or SPAs, crawling multiple URLs concurrently, extracting structured data with reusable CSS/JSON schemas, or building automated web data pipelines. Wraps the Crawl4AI library (`crwl` CLI and Python SDK) with schema-generation patterns for LLM-free extraction. Triggers on crawl4ai, crwl, scrape JS-heavy site, scrape SPA, headless browser scrape, schema-based extraction, batch crawl, sitemap crawl, web data pipeline. SKIP when a static HTML page can be read with `defuddle` / `fetch-web` — those are faster cold-start and don't need a browser.
brettdavies/crawl4ai-skill · ★ 32 · AI & Automation · score 75
Install: claude install-skill brettdavies/crawl4ai-skill
# Crawl4AI **Verified against `crawl4ai`** [`VERSION`](VERSION). PEP 723 pins in `scripts/*.py` and `tests/*.py` floor at that version. ## Overview Crawl4AI wraps a headless browser (Playwright) plus a markdown-aware content pipeline. Use it when defuddle/curl can't reach the content — JavaScript-rendered pages, login-gated content, infinite scroll, multi-URL concurrency, repeatable schema-based extraction. This skill exposes both interfaces of the underlying library: - **CLI** (`crwl`) — quick, scriptable commands: [CLI Guide](references/cli-guide.md) - **Python SDK** — full programmatic control: [SDK Guide](references/sdk-guide.md) ## Invoked with a URL argument When the user runs `/crawl4ai <url>` with a single URL and no further qualifier, treat it as the JS-heavy fetch case and default to: ```bash crwl <url> -c "wait_until=networkidle,page_timeout=60000" -o markdown ``` `wait_until=networkidle` waits for the network to be quiet for ~500ms post-load — the right default when the user hasn't named a specific element on a JS-rendered page. (Avoid `wait_for=css:body`: `<body>` exists at t=0 on every HTML response, so it's satisfied before JS renders content.) Then return the markdown to the agent context. Adjust to `wait_for=css:<selector>` if the user named a specific element. Skip the default and route to the relevant section below for any task that names extraction, batch / multi-URL, login / session, screenshot / PDF, or URL discovery — those each have their own