data-scraper-agent

Solid

Build a fully automated AI-powered data collection agent for any public source — job boards, prices, news, GitHub, sports, anything. Scrapes on a schedule, enriches data with a free LLM (Gemini Flash), stores results in Notion/Sheets/Supabase, and learns from user feedback. Runs 100% free on GitHub Actions. Use when the user wants to monitor, collect, or track any public data automatically.

AI & Automation · 196,640 stars · 30,253 forks · Updated 2 days ago · MIT

Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Data Scraper Agent

Build a production-ready, AI-powered data collection agent for any public data source. Runs on a schedule, enriches results with a free LLM, stores to a database, and improves over time.

**Stack: Python · Gemini Flash (free) · GitHub Actions (free) · Notion / Sheets / Supabase**

## When to Activate

- User wants to scrape or monitor any public website or API
- User says "build a bot that checks...", "monitor X for me", "collect data from..."
- User wants to track jobs, prices, news, repos, sports scores, events, listings
- User asks how to automate data collection without paying for hosting
- User wants an agent that gets smarter over time based on their decisions

## Core Concepts

### The Three Layers

Every data scraper agent has three layers:

```
COLLECT  →  ENRICH        →  STORE
   │           │               │
Scraper     AI (LLM)        Database
runs on     scores/         Notion /
schedule    summarises      Sheets /
            & classifies    Supabase
```

### Free Stack

| Layer | Tool | Why |
|---|---|---|
| **Scraping** | `requests` + `BeautifulSoup` | No cost, covers 80% of public sites |
| **JS-rendered sites** | `playwright` (free) | When HTML scraping fails |
| **AI enrichment** | Gemini Flash via REST API | 500 req/day, 1M tokens/day — free |
| **Storage** | Notion API | Free tier, great UI for review |
| **Schedule** | GitHub Actions cron | Free for public repos |
| **Learning** | JSON feedback file in repo | Zero infra, persists in git |

### AI Model Fallback Chain

Buil...
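
As a rough illustration of the COLLECT → ENRICH → STORE layers described under Core Concepts, the sketch below wires three small functions into one pipeline. It is not the skill's own code. To stay dependency-free and runnable offline it makes several substitutions, all of which are assumptions: the standard-library `html.parser` stands in for `requests` + `BeautifulSoup`, a keyword scorer stands in for the Gemini Flash call, and a local JSON file stands in for Notion / Sheets / Supabase. Every name here (`Item`, `collect`, `enrich`, `store`, `run_pipeline`, `keyword_score`) is hypothetical.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass
from html.parser import HTMLParser
from typing import Callable, Optional


@dataclass
class Item:
    title: str
    url: str
    score: float = 0.0


class LinkCollector(HTMLParser):
    """COLLECT helper: gathers the text and href of every <a> tag."""

    def __init__(self) -> None:
        super().__init__()
        self.items: list[Item] = []
        self._href: Optional[str] = None
        self._text: list[str] = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self._href = dict(attrs).get("href")
            self._text = []

    def handle_data(self, data):
        if self._href is not None:
            self._text.append(data)

    def handle_endtag(self, tag):
        if tag == "a" and self._href is not None:
            self.items.append(Item("".join(self._text).strip(), self._href))
            self._href = None


def collect(html: str) -> list[Item]:
    """COLLECT: parse already-fetched HTML into items (no network here)."""
    parser = LinkCollector()
    parser.feed(html)
    return parser.items


def keyword_score(item: Item) -> float:
    """ENRICH stand-in (assumption): a real agent would query an LLM instead."""
    keywords = ("python", "remote")
    hits = sum(k in item.title.lower() for k in keywords)
    return hits / len(keywords)


def enrich(items: list[Item], scorer: Callable[[Item], float]) -> list[Item]:
    """ENRICH: attach a relevance score to each item."""
    return [Item(i.title, i.url, scorer(i)) for i in items]


def store(items: list[Item], path: str) -> None:
    """STORE stand-in (assumption): write JSON instead of calling a database API."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump([asdict(i) for i in items], fh, indent=2)


def run_pipeline(html: str, path: str,
                 scorer: Callable[[Item], float] = keyword_score) -> list[Item]:
    """Run the three layers in order and return the enriched items."""
    items = enrich(collect(html), scorer)
    store(items, path)
    return items
```

Because each layer is a plain function, swapping the scorer for an LLM call or the JSON writer for a database client should not require touching the other two layers.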

Details

Author: affaan-m
Repository: affaan-m/everything-claude-code
Created: 4 months ago
Last Updated: 2 days ago
Language: JavaScript
License: MIT
