Pranjay-kumar

universal-data-acquisition-pipeline

Trigger when the user wants to collect, structure, evaluate, crawl, extract, or refresh data, or to build reusable data acquisition pipelines, from websites, APIs, portals, files, or rendered apps. Use for dataset design, source classification, feasibility, endpoint discovery, authorized/owned-session scraping plans, Patchright warm-session cookie generation, Playwright fallback, source probing, pagination analysis, scraper/pipeline architecture, sample validation, refresh design, and output contracts. Do not trigger for ordinary browsing, exploitative access, credential theft, CAPTCHA solving, auth bypass, rate-limit bypass, or non-data tasks.

Web & Frontend Listed

data-acquisition-browser

Use for Patchright/Playwright-based public or authorized browser probing: warm-session cookie/storage generation, browser network capture, JSON/API route discovery from page loads, rendered DOM fallback, screenshots, tiny DOM samples, and user-owned storage-state workflows. Do not use for CAPTCHA solving, credential extraction, auth bypass, or rate-limit bypass.

data-acquisition-feasibility

Use when the user wants to know whether a dataset/source is worth pursuing, compare routes, score feasibility, identify trapdoors, classify Green/Yellow/Red, or decide whether to stop, sample, narrow, license, use owned-session access, or build a pipeline.
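
As a rough illustration of the Green/Yellow/Red classification mentioned above, here is a minimal sketch. The listing does not define the scorecard, so the criteria (`access`, `stability`, `coverage`), the 0-5 scale, the `trapdoor` flag, and both thresholds are assumptions chosen only for the example.

```python
from dataclasses import dataclass


@dataclass
class RouteScore:
    """Hypothetical scorecard for one candidate route to a dataset."""
    name: str
    access: int       # 0-5: how openly the source can be reached (assumed scale)
    stability: int    # 0-5: how stable the endpoint or markup looks (assumed scale)
    coverage: int     # 0-5: how much of the target dataset it exposes (assumed scale)
    trapdoor: bool = False  # True if the route would need something out of bounds


def classify(route: RouteScore, green_at: int = 12, yellow_at: int = 7) -> str:
    """Map a scorecard to Green/Yellow/Red using assumed thresholds."""
    if route.trapdoor:
        # A disqualifying condition overrides any score.
        return "Red"
    total = route.access + route.stability + route.coverage
    if total >= green_at:
        return "Green"
    if total >= yellow_at:
        return "Yellow"
    return "Red"


routes = [
    RouteScore("official_api", access=5, stability=4, coverage=4),
    RouteScore("rendered_dom", access=3, stability=2, coverage=3),
    RouteScore("login_wall", access=5, stability=5, coverage=5, trapdoor=True),
]
labels = {r.name: classify(r) for r in routes}
```

Comparing routes then reduces to ranking them by label and total; a real scorecard would likely weight criteria differently per project.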

data-acquisition-core

Shared core for the data acquisition skill tree. Use when a data acquisition task needs source access classification, output contracts, compliance boundaries, feasibility scorecards, probing standards, pipeline quality standards, or shared references used by sibling data-acquisition skills. Do not use alone for ordinary browsing or non-data tasks.

data-acquisition-discovery

Use for discovering and reverse-engineering data sources: official APIs, XHR/fetch, GraphQL, persisted queries, Algolia, Shopify, Salesforce Commerce Cloud, sitemaps, feeds, embedded JSON, hydration state, page-data routes, pagination limits, headers, params, and endpoint templates.

data-acquisition-pipeline

Use when the user wants a production-grade scraping/API/browser pipeline design or implementation plan: pipeline.yaml, schemas, raw/staged/normalized outputs, dedupe, incremental refresh, checkpoints, retries, rate-limit strategy, quality gates, observability, run reports, and recovery.
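
The dedupe, checkpoint, and incremental-refresh items above can be sketched in a few lines. This is an illustrative sketch, not the skill's implementation: the field names `id` and `updated_at`, and the use of sortable ISO date strings as the checkpoint value, are assumptions.

```python
from typing import Optional


def dedupe(records: list, key: str = "id", version: str = "updated_at") -> list:
    """Keep only the newest record per key (assumes `version` values are comparable)."""
    latest = {}
    for rec in records:
        k = rec[key]
        if k not in latest or rec[version] > latest[k][version]:
            latest[k] = rec
    return list(latest.values())


def incremental(records: list, checkpoint: Optional[str], version: str = "updated_at"):
    """Return records newer than the checkpoint, deduped, plus the advanced checkpoint."""
    fresh = [r for r in records if checkpoint is None or r[version] > checkpoint]
    # If nothing is new, the checkpoint stays where it was.
    new_checkpoint = max((r[version] for r in fresh), default=checkpoint)
    return dedupe(fresh, version=version), new_checkpoint


rows = [
    {"id": 1, "updated_at": "2024-01-01"},
    {"id": 1, "updated_at": "2024-01-03"},
    {"id": 2, "updated_at": "2024-01-02"},
]
first_run, cp = incremental(rows, None)   # full load on the first run
second_run, cp2 = incremental(rows, cp)   # nothing newer than the checkpoint
```

Persisting `cp` between runs (for example in a small JSON file) is what lets a later run resume after a failure instead of starting over.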

data-acquisition-publish

Use when packaging real data acquisition results for publication: probe-backed case studies, README summaries, evidence tables, sample rows, feasibility reports, and publishability checks. Do not publish hypothetical case studies, owned-session outputs, cookies, credentials, private data, or non-public authorized results.

Web & Frontend Listed

data-acquisition-design

Use when the user needs to decide what data to collect before scraping or API work: DatasetNeed, DatasetSpec, entity grain, required vs nice-to-have fields, freshness, history, coverage targets, join keys, exclusions, and uselessness criteria. Use for vague business goals, "all data" requests, and scope control before source discovery.
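
To make the DatasetSpec idea concrete, here is one possible shape for such a spec. The listing names the concept but not its structure, so every attribute name below, and the `missing_required` helper, is hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class DatasetSpec:
    """Hypothetical spec capturing scope decisions made before any scraping."""
    entity_grain: str                       # e.g. "one row per product"
    required_fields: List[str]              # rows missing these are unusable
    nice_to_have_fields: List[str] = field(default_factory=list)
    join_keys: List[str] = field(default_factory=list)
    max_age_days: Optional[int] = None      # freshness target, if any
    exclusions: List[str] = field(default_factory=list)


def missing_required(row: dict, spec: DatasetSpec) -> List[str]:
    """List required fields that are absent or empty in a sample row."""
    return [f for f in spec.required_fields if row.get(f) in (None, "")]


spec = DatasetSpec(
    entity_grain="one row per product",
    required_fields=["sku", "price"],
    nice_to_have_fields=["rating"],
    join_keys=["sku"],
    max_age_days=7,
)
gaps = missing_required({"sku": "A1", "price": None}, spec)
```

Writing the required fields down first gives sample validation something to check against, and makes it easier to turn a vague request into a bounded one.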