scraper-contract
Install: claude install-skill robdasi/skills
# Scraper Contract
A scraper rarely fails by crashing. It fails by returning nothing, or half a record, while the pipeline downstream carries on as if the data were real. A week later you're emailing "Hi {first_name}" because one field came back null and nobody decided that was a failure.
This skill writes the contract that stops that. You describe what you're scraping and from where; it produces a contract that classifies every failure, treats empty as broken, and says which fields you're allowed to trust. It's the wrapper I put around every scraper before it goes near a pipeline.
Build the contract, then stop.
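Here's a minimal Python sketch of the "empty is broken" rule. It assumes the scraper hands back one plain dict per record. `ScrapeResult`, `check_record`, and the `empty`/`partial` failure labels are hypothetical names chosen for illustration; they are not the skill's actual output. The point is that a missing record and a record with a null required field both come back as failures, not quiet successes.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


# Hypothetical result shape: a scrape either yields a record whose required
# fields are all present and non-empty, or it fails with a labelled reason.
@dataclass
class ScrapeResult:
    ok: bool
    record: Optional[Dict[str, Any]] = None
    failure: Optional[str] = None
    missing: List[str] = field(default_factory=list)


def is_empty(value: Any) -> bool:
    """Treat None, blank strings, and empty containers as missing data."""
    if value is None:
        return True
    if isinstance(value, str):
        return value.strip() == ""
    if isinstance(value, (list, dict, tuple, set)):
        return len(value) == 0
    return False


def check_record(record: Optional[Dict[str, Any]], required: List[str]) -> ScrapeResult:
    """Fail the scrape unless every required field came back with a value."""
    if not record:
        # Nothing came back at all (None or {}): a failure, not an empty success.
        return ScrapeResult(ok=False, failure="empty", missing=list(required))
    missing = [name for name in required if is_empty(record.get(name))]
    if missing:
        # Half a record is still a failure; name exactly which fields were lost.
        return ScrapeResult(ok=False, failure="partial", missing=missing)
    return ScrapeResult(ok=True, record=record)
```

Under this check, a record like `{"email": "a@example.com", "first_name": None}` fails as `partial` with `missing == ["first_name"]` instead of flowing downstream into a "Hi {first_name}" email.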
## Inputs (ask for whatever is missing)
- **The target** (required): the site/source and the fields you need off it.
- **How you're fetching**: Apify / Playwright / a fetch-and-parse / an API, and whether the source is one you control.
- *Optional:* a real example of a good record, and the worst page you've hit (paywalled, JS-rendered, captcha'd).
## The method
1. **Define the failure taxonomy.** A closed enum of failure codes, each mapped from what you can observe. The set that has covered every real case for me:
- **unreachable** — DNS/connection failure, 404/502/503/504, ENOTFOUND, ECONNRESET.
- **blocked** — 403, captcha, "access denied", a bot wall.
- **timeout** — the request or render exceeded its budget.
- **rate-limited** — 429, or the source's throttle response.
- **invalid-input** — the URL/target was malformed before you even left.
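The mapping above can be sketched in Python as a closed enum plus one classifier. This is a sketch under assumptions: the fetch layer is assumed to report an HTTP status, a Node-style error code string (the `ENOTFOUND`/`ECONNRESET` names above), a timeout flag, and the response body. `Failure`, `classify`, and the signal sets are hypothetical names, the block phrases are only the two quoted above, and it covers only the codes shown here.

```python
import enum
from typing import Optional
from urllib.parse import urlparse


class Failure(enum.Enum):
    """Closed set of failure codes; nothing outside this enum is allowed."""
    UNREACHABLE = "unreachable"
    BLOCKED = "blocked"
    TIMEOUT = "timeout"
    RATE_LIMITED = "rate-limited"
    INVALID_INPUT = "invalid-input"


# Observable signals per code (assumed; tune them per source).
UNREACHABLE_STATUS = {404, 502, 503, 504}
UNREACHABLE_ERRORS = {"ENOTFOUND", "ECONNRESET"}
BLOCK_PHRASES = ("captcha", "access denied")


def classify(
    url: str,
    status: Optional[int] = None,
    error_code: Optional[str] = None,
    timed_out: bool = False,
    body: str = "",
) -> Optional[Failure]:
    """Map what was observed about one fetch to a failure code, or None."""
    parsed = urlparse(url)
    # Malformed target: fail before any request is attempted.
    if parsed.scheme not in ("http", "https") or not parsed.netloc:
        return Failure.INVALID_INPUT
    if timed_out:
        return Failure.TIMEOUT
    if error_code in UNREACHABLE_ERRORS:
        return Failure.UNREACHABLE
    if status == 429:
        return Failure.RATE_LIMITED
    if status == 403:
        return Failure.BLOCKED
    # Bot walls are checked before the 404/5xx mapping (see note below).
    lowered = body.lower()
    if any(phrase in lowered for phrase in BLOCK_PHRASES):
        return Failure.BLOCKED
    if status in UNREACHABLE_STATUS:
        return Failure.UNREACHABLE
    # No transport-level failure observed; field checks still apply.
    return None
```

The ordering is a judgment call: the captcha and "access denied" check runs before the 404/5xx mapping so that a challenge page served with a 503 counts as blocked rather than unreachable. A `None` result only means no failure was observed at the fetch level; the record itself still has to pass its field checks.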