← ClaudeAtlas

scraper-contractlisted

Spec a scraper that fails loudly instead of returning quiet garbage. Defines a closed failure taxonomy (unreachable / blocked / timeout / rate-limited / invalid / empty / unknown), makes empty output a hard failure not a success, sets a per-field stability tier and the minimum identifier a record must have to be accepted, and decides what counts as a broken run. Use this before pointing Claude, Apify, or Playwright at a site. Produces a scrape contract and stops.
robdasi/skills · ★ 0 · AI & Automation · score 75
Install: claude install-skill robdasi/skills
# Scraper Contract A scraper rarely fails by crashing. It fails by returning nothing, or half a record, while the pipeline downstream carries on as if the data were real. A week later you're emailing "Hi {first_name}" because one field came back null and nobody decided that was a failure. This skill writes the contract that stops that. You describe what you're scraping and from where, it produces a contract that classifies every failure, treats empty as broken, and says which fields you're allowed to trust. It's the wrapper I put around every scraper before it goes near a pipeline. Build the contract, then stop. ## Inputs (ask for whatever is missing) - **The target** (required): the site/source and the fields you need off it. - **How you're fetching**: Apify / Playwright / a fetch-and-parse / an API, and whether the source is one you control. - *Optional:* a real example of a good record, and the worst page you've hit (paywalled, JS-rendered, captcha'd). ## The method 1. **Define the failure taxonomy.** A closed enum of failure codes, each mapped from what you can observe. The set that has covered every real case for me: - **unreachable** — DNS/connection failure, 404/502/503/504, ENOTFOUND, ECONNRESET. - **blocked** — 403, captcha, "access denied", a bot wall. - **timeout** — the request or render exceeded its budget. - **rate-limited** — 429, or the source's throttle response. - **invalid-input** — the URL/target was malformed before you even left.