scrapling

Featured

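Web scraping and data extraction with scrapling. Automatically selects a Fetcher and supports Cloudflare/WAF bypass, session login, and HTML parsing. Triggers when the user mentions scrape/crawl/fetch page/extract data, or the Chinese equivalents 爬取 (crawl), 抓取 (scrape), 绕过Cloudflare (bypass Cloudflare), 解析HTML (parse HTML), or 批量采集 (batch collection).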

AI & Automation 5,403 stars 413 forks Updated 2 days ago MIT


Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

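# Scrapling Web Scraping Skill

## Step 0: Check the version

```bash
pip show scrapling
```

- Not installed → run `pip install "scrapling[fetchers]"` + `scrapling install`
- Newer version available → run `pip install --upgrade "scrapling[fetchers]"` → check the changelog and tell the user
- Already up to date → continue

## Step 1: Choose a Fetcher

```
Target site →
│
├─ Already have an HTML string/file and only need parsing?
│   → Selector (parsing only, no network requests)
│   → Template: templates/parse_only.py
│
├─ Static page, no JS rendering, no anti-scraping?
│   → Fetcher (fastest, based on curl_cffi)
│   → Template: templates/basic_fetch.py
│
├─ Login required (HTTP form, not a JS login)?
│   → FetcherSession (keeps session cookies)
│   → Template: templates/session_login.py
│
├─ Protected by Cloudflare / WAF?
│   → StealthyFetcher (Camoufox browser, passes Cloudflare automatically)
│   → Template: templates/stealth_cloudflare.py
│
├─ SPA (React/Vue) that needs JS rendering?
│   → DynamicFetcher (Playwright browser)
│   → Generated on the fly from a template
│
└─ Not sure? → Try Fetcher first; on 403 or empty content → upgrade to StealthyFetcher
```

## Step 2: Run the workflow

```
1. Check the version (Step 0)
2. Consult references/site-patterns.md: if an existing pattern matches, reuse it directly
3. No match → use the decision tree to choose a Fetcher
4. Read the corresponding template → substitute parameters → generate the complete script
5. Run the script → return the results
6. **Record what was learned (required)**:
   - New site → append it to site-patterns.md
   - New cookie / user supplied a cookie → save it to cookie-vault.md
   - **Always check after scraping finishes**: is there a new cookie or site pattern that needs saving?
```

## Cookie format quick reference

| Fetcher type | Cookie format | Example |
|-------------|-------------|------|
| Fetcher / FetcherSession | `dict` | `{'name': 'value', 'token': 'abc'}` |
| StealthyFetcher / DynamicFetcher | `list[dict]` | `[{'name': 'n', 'value': 'v', 'domain': '.site.com', 'path': '/'}]` |

**Required cookie fields for browser fetchers**: `name`, `value`, `domain`, `path`

## Timeout unit quick reference

| Fetcher type | Timeout unit | Example |
|-------------|--...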
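
The Fetcher decision tree and the two cookie formats above can be sketched in plain Python. This is only an illustration under stated assumptions: `Target`, `choose_fetcher`, `should_escalate`, and `to_browser_cookies` are hypothetical helper names, not part of Scrapling or of the skill's templates, and the functions return class names as strings rather than calling the library. The branches are checked in the same top-to-bottom order as the tree.

```python
from dataclasses import dataclass


@dataclass
class Target:
    """Hypothetical description of a scraping target (not a Scrapling type)."""
    has_html_already: bool = False       # HTML string/file is already in hand
    needs_http_login: bool = False       # plain HTTP form login, not a JS login
    has_cloudflare_or_waf: bool = False  # Cloudflare / WAF protection
    needs_js_rendering: bool = False     # SPA (React/Vue) that needs JS rendering


def choose_fetcher(target: Target) -> str:
    """Return the fetcher name the decision tree suggests for this target."""
    if target.has_html_already:
        return "Selector"
    if target.needs_http_login:
        return "FetcherSession"
    if target.has_cloudflare_or_waf:
        return "StealthyFetcher"
    if target.needs_js_rendering:
        return "DynamicFetcher"
    # Static page, or "not sure": start with the fastest option.
    return "Fetcher"


def should_escalate(status: int, body: str) -> bool:
    """The 'not sure' branch: a 403 or empty content means upgrade to StealthyFetcher."""
    return status == 403 or not body.strip()


def to_browser_cookies(cookies: dict, domain: str, path: str = "/") -> list:
    """Convert the dict cookie format into the list-of-dicts format.

    Each entry carries the four fields listed as required for browser
    fetchers: name, value, domain, path.
    """
    return [
        {"name": name, "value": value, "domain": domain, "path": path}
        for name, value in cookies.items()
    ]


if __name__ == "__main__":
    print(choose_fetcher(Target(has_cloudflare_or_waf=True)))
    print(to_browser_cookies({"token": "abc"}, domain=".site.com"))
```

Keeping the selection logic separate from the fetch call makes the "try Fetcher first, then upgrade" path easy to follow: pick a fetcher, run it, and call `should_escalate` on the response before deciding whether to retry with the stealth option.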

Details

Author: fengshao1227
Repository: fengshao1227/ccg-workflow
Created: 4 months ago
Last Updated: 2 days ago
Language: Go
License: MIT

Integrates with

Cloudflare · Cloud Playwright · Testing

Similar Skills

Semantically similar based on skill content — not just same category

Data & Documents Solid

scrapling

13 Updated 5 days ago

wzyxdwll

AI & Automation Solid

scrapling

Web scraping with Scrapling - HTTP fetching, stealth browser automation, Cloudflare bypass, and spider crawling via CLI and Python.

175,435 Updated today

NousResearch

Data & Documents Listed

ez-crawl

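Website crawling tool built on the Cloudflare /crawl API. When the user wants to crawl, extract, or scrape a website's content, use the /crawl REST API of Cloudflare Browser Rendering rather than operating a browser manually. Triggers: the user says "/ez", "ez crawl", "用 Cloudflare 爬" (crawl with Cloudflare), "CF crawl", "用 /crawl API" (use the /crawl API), "幫我爬這個網站" (crawl this website for me), "抓這個站的內容" (grab this site's content), "crawl this site", "把這個網站的內容都抓下來" (grab all of this website's content), "爬完整站" (crawl the whole site), "抓整站 markdown" (grab the whole site as markdown), and similar. Also applies when the user wants to batch-convert a documentation site, blog, or product pages to markdown or JSON, or needs to batch-scrape web content to build a RAG knowledge base or a training dataset. Trigger for any scenario that involves batch-crawling a website through an API, even if the user does not explicitly mention Cloudflare. Not suitable for simple single-page fetches (WebFetch is enough for those) or for browser operations that require logging in and interacting.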

4 Updated 2 months ago

0xedgelessblade

AI & Automation Featured

firecrawl-performance-tuning

Optimize Firecrawl scraping performance with caching, batch scraping, and format selection. Use when experiencing slow scrapes, optimizing credit usage per page, or building high-throughput scraping pipelines. Trigger with phrases like "firecrawl performance", "optimize firecrawl", "firecrawl latency", "firecrawl caching", "firecrawl slow", "firecrawl batch".

2,274 Updated today

jeremylongshore

AI & Automation Listed

scrapling

Use Scrapling for web extraction (HTTP, async, dynamic, stealth fetchers). Prefer Scrapling for scraping pipelines; fallback to `playwright-ext` when blocked.

7 Updated today

codingSamss