scraperapi-crawlerlisted
Install: claude install-skill scraperapi/scraperapi-skills
# ScraperAPI Crawler
The Crawler discovers and scrapes linked pages automatically, starting from a seed URL and
following links that match your regex pattern. Use it when you need data from many pages of a
site but don't have the URLs upfront.
## When NOT to use the Crawler
- **You have a known URL list** → use the [Async API](https://docs.scraperapi.com/making-async-requests) instead; it's cheaper and more predictable.
- **You need a single page** → use the Standard API (`api.scraperapi.com`).
- **The content is behind login** → the Crawler only accesses publicly available pages.
- **Free plan, need depth > 1** → free accounts are capped at `max_depth: 1`.
## Endpoints
| Action | Method | URL |
|--------|--------|-----|
| Start a crawl | POST | `https://crawler.scraperapi.com/job` |
| Check status | GET | `https://crawler.scraperapi.com/job/<jobId>` |
| Cancel a crawl | DELETE | `https://crawler.scraperapi.com/job/<jobId>` |
Auth: include `api_key` in the JSON body (POST) or as a query parameter (GET/DELETE).
## Starting a Crawl
```python
import os, requests
API_KEY = os.environ["SCRAPERAPI_API_KEY"]
job = requests.post(
"https://crawler.scraperapi.com/job",
json={
"api_key": API_KEY,
"start_url": "https://example.com/blog/",
"url_regexp_include": "(?<full_url>https?://example\\.com/blog/.*)",
"url_regexp_exclude": ".*\\.(pdf|png|jpg|zip)",
"max_depth": 3,
"crawl_budget": 500