crawlee-skill

A web scraping and browser automation library for Node.js to build reliable crawlers. In JavaScript and TypeScript. Extract data for AI, LLMs, RAG, or GPTs. Download HTML, PDF, JPG, PNG, and other files from websites. Works with Puppeteer, Playwright, Cheerio, JSDOM, and raw HTTP. Both headful and headless mode. With proxy rotation.
gdm257/cc-plugins · ★ 0 · AI & Automation · score 66
Install: claude install-skill gdm257/cc-plugins
# Crawlee OpenCode Skill

Crawlee is a scalable web crawling and scraping library for Node.js and TypeScript. It helps you build reliable crawlers that appear human-like and fly under the radar of modern bot protections even with default configuration.

## Quick Start

### Prerequisites

- Node.js 16 or higher

### With Crawlee CLI (Recommended)

```bash
npx crawlee create my-crawler
cd my-crawler
npm start
```

### Manual Installation

```bash
npm install crawlee playwright
```

```typescript
import { PlaywrightCrawler, Dataset } from 'crawlee';

const crawler = new PlaywrightCrawler({
    async requestHandler({ request, page, enqueueLinks, log }) {
        const title = await page.title();
        log.info(`Title of ${request.loadedUrl} is '${title}'`);
        await Dataset.pushData({ title, url: request.loadedUrl });
        await enqueueLinks();
    },
});

await crawler.run(['https://crawlee.dev']);
```

## Overview

Crawlee covers your crawling and scraping end-to-end and provides tools to:

- Crawl the web for links
- Scrape data from websites
- Store extracted data to disk or cloud
- Configure behavior to suit your project's needs

### Key Features

- Single interface for HTTP and headless browser crawling
- Persistent queue for URLs to crawl (breadth & depth first)
- Pluggable storage of both tabular data and files
- Automatic scaling with available system resources
- Integrated proxy rotation and session management
- Lifecycles customizable with hooks
- CLI to bootstra