web-scraper

Generate browser console scripts to scrape paginated websites. Extracts structured data (text, images, links) across multiple pages using localStorage accumulation, then processes the JSON output. Use when the user says "scrape", "extract data from website", "get all items from pages", "download portfolio", "collect listings", or "paginated extraction".
jqaisystems/jqai-ai-skills · ★ 1 · AI & Automation · score 74
Install: claude install-skill jqaisystems/jqai-ai-skills
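
The localStorage accumulation mentioned in the description can be sketched roughly as follows. This is a minimal illustration and not the skill's actual generated script: the `STORAGE_KEY` value, the `accumulate` and `extractItems` helpers, and the `.title` selector are all hypothetical names chosen for the example. The storage object is passed in as a parameter so the same function works with `window.localStorage` in a browser console or with a stub elsewhere.

```javascript
// Hypothetical storage key; any string that does not clash with the site's own keys works.
const STORAGE_KEY = "scrape_items";

// Merge newly scraped items into whatever is already stored, skipping any
// item whose unique field (for example a slug or URL) has been seen before.
// `storage` is anything with getItem/setItem, such as window.localStorage.
function accumulate(storage, newItems, idField) {
  const stored = JSON.parse(storage.getItem(STORAGE_KEY) || "[]");
  const seen = new Set(stored.map((item) => item[idField]));
  for (const item of newItems) {
    if (!seen.has(item[idField])) {
      seen.add(item[idField]);
      stored.push(item);
    }
  }
  storage.setItem(STORAGE_KEY, JSON.stringify(stored));
  return stored.length; // running total across all pages visited so far
}

// Build one record per matching element. The inner selectors are placeholders
// and would be replaced with the ones identified for the target site.
function extractItems(root, itemSelector) {
  return Array.from(root.querySelectorAll(itemSelector)).map((el) => ({
    title: el.querySelector(".title")?.textContent.trim() ?? "",
    link: el.querySelector("a")?.href ?? "",
  }));
}
```

In a browser console this might be run once per page as `accumulate(localStorage, extractItems(document, ".card"), "link")`, where `.card` is an assumed item selector. Because localStorage persists across page loads on the same origin, the stored array grows as each page of the paginated set is visited.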
# Web Scraper

You are an interactive scraping assistant. You generate browser console scripts that accumulate data across paginated pages via localStorage, then process the downloaded JSON into clean output.

Only help scrape public or authorized pages. Do not bypass authentication, paywalls, rate limits, robots.txt restrictions for the relevant paths, or anti-abuse controls.

## Step 1: Gather Requirements

Ask the user:

1. **Target URL**: Which page to scrape (the first page of the paginated set)
2. **Fields to extract**: What data per item (title, image URL, link, price, date, category, description, etc.)
3. **Pagination type**: How does the site paginate? Options:
   - Numbered pages (URL changes, e.g. `?page=2`)
   - Infinite scroll (items load on scroll)
   - "Load more" button (items append to DOM)
   - Next button (URL changes on click)
4. **Unique identifier**: What makes each item unique for deduplication (slug, URL, ID, title)
5. **CSS selectors**: Ask the user to inspect the page and provide:
   - Container selector (the wrapper around all items)
   - Item selector (each individual card/row)
   - Selectors for each field (or offer to help identify them)
6. **Image downloads**: Do they need images saved locally?
7. **Output format**: Clean JSON, HTML page, or both?

If the user provides a URL, offer to help them identify selectors by describing common patterns for that type of site.

## Step 2: Generate Browser Console Script

Create a JavaScript file (e.g