harvest-single


Smart single-page extraction: converts articles, docs, and blog posts to clean markdown.

AI & Automation · 494 stars · 41 forks · Updated 1 month ago · MIT

Quality Score: 87/100

Weights: Stars 20% · Recency 20% · Frontmatter 20% · Documentation 15% · Issue Health 10% · License 10% (100) · Description 5% (100)

Skill Content

# Harvest Single Page

Extract and clean content from a single web page. Auto-detects the content type (article, documentation, API reference, blog post) and produces clean, structured markdown.

## Usage

```
/harvest <url>
```

## Examples

```bash
# Extract a blog post
/harvest https://blog.example.com/best-practices-2024

# Extract an API documentation page
/harvest https://docs.stripe.com/api/charges

# Extract a GitHub README
/harvest https://github.com/owner/repo
```

## How It Works

1. Fetch URL content using the fallback chain below (crawl4ai, then WebFetch, then curl)
2. Detect content type (article, docs, API ref, blog, wiki)
3. Extract the main content, stripping navigation, ads, and footers
4. Preserve code blocks, tables, and images
5. Add a metadata header (source, date, word count)
6. Save to `.claude/cache/agents/harvest/`

## Output Format

```markdown
# [Page Title]

> Source: [URL]
> Extracted: [timestamp]
> Type: [article|docs|api|blog|wiki]
> Words: [count]

[Clean extracted content in markdown]

## Links Found

- [Link text](URL)
```

## Fallback Chain

1. crawl4ai Docker (port 11235): preferred
2. WebFetch tool: built-in fallback
3. curl + html2text: last resort

## When to Use

- Quick grab of a single page's content
- Extracting a specific doc page for reference
- Saving an article for later analysis
- Getting clean markdown from messy HTML
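
The fallback chain, content-type detection, and output format described in the skill content can be illustrated with a small Python sketch. This is a hypothetical illustration, not the skill's implementation: `fetch_with_fallback`, `guess_type`, and `render_output` are invented names, the fetchers are plain callables standing in for crawl4ai, WebFetch, and curl + html2text, and `guess_type` is an assumed URL-only heuristic (the skill may inspect page content instead).

```python
# Hypothetical sketch of the harvest flow; not the skill's actual implementation.
from datetime import datetime, timezone
from typing import Callable, List, Optional, Sequence, Tuple
from urllib.parse import urlparse

# A fetcher takes a URL and returns page content (markdown) or raises on failure.
# Real fetchers (crawl4ai on port 11235, WebFetch, curl + html2text) are not modeled.
Fetcher = Callable[[str], str]


def fetch_with_fallback(url: str, fetchers: Sequence[Tuple[str, Fetcher]]) -> Tuple[str, str]:
    """Try each (name, fetcher) in priority order; return the first non-empty result."""
    errors: List[str] = []
    for name, fetch in fetchers:
        try:
            content = fetch(url)
        except Exception as exc:  # any failure falls through to the next fetcher
            errors.append(f"{name}: {exc}")
            continue
        if content and content.strip():
            return name, content
        errors.append(f"{name}: empty result")
    raise RuntimeError("all fetchers failed: " + "; ".join(errors))


def guess_type(url: str) -> str:
    """Assumed URL-only heuristic; real detection would also inspect the page itself."""
    parsed = urlparse(url)
    host, path = parsed.netloc.lower(), parsed.path.lower()
    if host.startswith("api.") or "/api/" in path or path.endswith("/api"):
        return "api"
    if host.startswith("docs.") or "/docs" in path:
        return "docs"
    if "blog" in host or "/blog" in path:
        return "blog"
    if "wiki" in host or "/wiki" in path:
        return "wiki"
    return "article"


def render_output(title: str, url: str, body: str,
                  links: Sequence[Tuple[str, str]],
                  extracted: Optional[datetime] = None) -> str:
    """Assemble the metadata header, body, and link list in the Output Format layout."""
    stamp = (extracted or datetime.now(timezone.utc)).isoformat(timespec="seconds")
    lines = [
        f"# {title}",
        "",
        f"> Source: {url}",
        f"> Extracted: {stamp}",
        f"> Type: {guess_type(url)}",
        f"> Words: {len(body.split())}",  # whitespace-delimited word count
        "",
        body.strip(),
        "",
        "## Links Found",
        "",
    ]
    lines += [f"- [{text}]({href})" for text, href in links]
    return "\n".join(lines) + "\n"


if __name__ == "__main__":
    def crawl4ai_down(url: str) -> str:  # simulates the Docker service being unavailable
        raise ConnectionError("connection refused")

    def webfetch_stub(url: str) -> str:  # stands in for the built-in WebFetch tool
        return "Some clean extracted text."

    page = "https://blog.example.com/best-practices-2024"
    used, body = fetch_with_fallback(page, [("crawl4ai", crawl4ai_down), ("webfetch", webfetch_stub)])
    print(used)  # webfetch
    print(render_output("Best Practices", page, body, [("Home", "https://blog.example.com/")]))
```

Passing the fetchers as an ordered list keeps the priority order in one place; a real crawl4ai fetcher would call the Docker service on port 11235, which this sketch deliberately does not attempt.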

Details

Author: vibeeval
Repository: vibeeval/vibecosystem
Created: 2 months ago
Last Updated: 1 month ago
Language: C#
License: MIT

Integrates with

Anthropic (AI), Stripe (Payments)

Related Skills

AI & Automation Featured

videodb

See, understand, and act on video and audio. See: ingest from local files, URLs, RTSP/live feeds, or live desktop recordings; return real-time context and playable stream links. Understand: extract frames, build visual, semantic, and temporal indexes, and search moments with timestamps and auto-clips. Act: transcode and normalize (codec, fps, resolution, aspect ratio), perform timeline edits (subtitles, text/image overlays, branding, audio overlays, dubbing, translation), generate media assets (image, audio, video), and create real-time alerts for events from live streams or desktop capture.

196,640 Updated 2 days ago

affaan-m

AI & Automation Featured

ck

Persistent per-project memory for Claude Code. Auto-loads project context on session start, tracks sessions with git activity, and writes to native memory. Commands run deterministic Node.js scripts, so behavior is consistent across model versions.

196,640 Updated 2 days ago

affaan-m

AI & Automation Featured

browser

Web browser automation with AI-optimized snapshots for claude-flow agents

55,973 Updated today

ruvnet