
web-scrape

Intelligent web scraper with content extraction, multiple output formats, and error handling
aiskillstore/marketplace · ★ 329 · Web & Frontend · score 82
Install: claude install-skill aiskillstore/marketplace
# Web Scraping Skill v3.0

## Usage

```
/web-scrape <url> [options]
```

**Options:**
- `--format=markdown|json|text` - Output format (default: markdown)
- `--full` - Include full page content (skip smart extraction)
- `--screenshot` - Also save a screenshot
- `--scroll` - Scroll to load dynamic content (infinite scroll pages)

**Examples:**
```
/web-scrape https://example.com/article
/web-scrape https://news.site.com/story --format=json
/web-scrape https://spa-app.com/page --scroll --screenshot
```

---

## Execution Flow

### Phase 1: Navigate and Load

```
1. mcp__playwright__browser_navigate
   url: "<target URL>"

2. mcp__playwright__browser_wait_for
   time: 2  (allow initial render)
```

**If `--scroll` option:** Execute scroll sequence to trigger lazy loading:
```
3. mcp__playwright__browser_evaluate
   function: "async () => {
     for (let i = 0; i < 3; i++) {
       window.scrollTo(0, document.body.scrollHeight);
       await new Promise(r => setTimeout(r, 1000));
     }
     window.scrollTo(0, 0);
   }"
```

### Phase 2: Capture Content

```
4. mcp__playwright__browser_snapshot
   → Returns full accessibility tree with all text content
```

**If `--screenshot` option:**
```
5. mcp__playwright__browser_take_screenshot
   filename: "scraped_<domain>_<timestamp>.png"
   fullPage: true
```

### Phase 3: Close Browser

```
6. mcp__playwright__browser_close
```

---

## Smart Content Extraction

After getting the snapshot, apply intelligent extraction:

### Step 1: Ide