← ClaudeAtlas

seo-crawl-renderlisted

Fetch a page (or local files) and build the shared PageSnapshot every audit module reads — raw HTML, rendered DOM when a render MCP is available, response headers, status/redirect chain, and site artifacts (robots.txt, sitemaps, llms.txt). Decides whether JavaScript rendering is needed and records the data tier.
Hainrixz/claude-seo-ai · ★ 14 · AI & Automation · score 81
Install: claude install-skill Hainrixz/claude-seo-ai
# seo-crawl-render Produces ONE `PageSnapshot` consumed by all other skills. Building it once is what makes the offline (Tier 0) audit possible. ## PageSnapshot shape ``` { target: { kind: "url"|"path", value }, status_chain: [ {url, status, location?} ], // redirects, final status headers: { ... }, // final response headers (incl. X-Robots-Tag, content-type, hreflang Link) raw_html: "...", // pre-JS HTML the crawler/AI sees first rendered_dom: "..."|null, // post-JS DOM (null if no render available) render: { needed: bool, used: "webfetch"|"playwright"|"firecrawl"|"none", confidence: "high"|"reduced" }, artifacts: { robots_txt: "..."|null, sitemaps: [...], llms_txt: "..."|null }, tier: 0|1|2 } ``` ## Acquisition 1. **Local path**: read files directly (`Read`/`Glob`); treat built HTML as `raw_html`. For framework source (Next/Nuxt/etc.), note the framework and that rendered output may differ from source. 2. **URL**: fetch with `WebFetch` (HTTPS-upgrade; if it returns a cross-host redirect, re-fetch the target). Capture status chain and headers. Fetch `robots.txt`, referenced sitemap(s), and `/llms.txt`. ## Render decision - `--render static` → never render. `--render js` → always try to render. - `--render auto` (default): flag CSR when the raw HTML body is near-empty, has hydration markers (`__NEXT_DATA__`, `window.__NUXT__`, `data-reactroot`, a single root `<div id="app">`)