Pranjay-kumar
User
Codex/Claude skill for designing robust scraping, API, Playwright, and authorized data acquisition pipelines
Indexed Skills (8)
universal-data-acquisition-pipeline
Trigger when the user wants to collect, structure, evaluate, crawl, extract, refresh, or build reusable data acquisition pipelines from websites, APIs, portals, files, or rendered apps. Use for dataset design, source classification, feasibility, endpoint discovery, authorized/owned-session scraping plans, Patchright warm-session cookie generation, Playwright fallback, source probing, pagination analysis, scraper/pipeline architecture, sample validation, refresh design, and output contracts. Do not trigger for ordinary browsing, exploitative access, credential theft, CAPTCHA solving, auth bypass, rate-limit bypass, or non-data tasks.
data-acquisition-browser
Use for Patchright/Playwright-based public or authorized browser probing: warm-session cookie/storage generation, browser network capture, JSON/API route discovery from page loads, rendered DOM fallback, screenshots, tiny DOM samples, and user-owned storage-state workflows. Do not use for CAPTCHA solving, credential extraction, auth bypass, or rate-limit bypass.
data-acquisition-feasibility
Use when the user wants to know whether a dataset/source is worth pursuing, compare routes, score feasibility, identify trapdoors, classify Green/Yellow/Red, or decide whether to stop, sample, narrow, license, use owned-session access, or build a pipeline.
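As a rough illustration of the Green/Yellow/Red classification mentioned above, the sketch below maps a small scorecard to a color. The `FeasibilityScore` fields, the 0-5 scale, and every threshold are assumptions made for this example; the skill's actual scorecard is not shown in the listing.

```python
from dataclasses import dataclass


@dataclass
class FeasibilityScore:
    # Hypothetical 0-5 ratings; higher is better.
    access: int      # how openly the source can be reached
    stability: int   # how stable the endpoints or markup look
    coverage: int    # how much of the needed data is present
    compliance: int  # how clearly the access is permitted


def classify(score: FeasibilityScore) -> str:
    """Map a scorecard to Green/Yellow/Red using illustrative thresholds."""
    values = [score.access, score.stability, score.coverage, score.compliance]
    # Assumption: a compliance failure is a hard stop regardless of the rest.
    if score.compliance <= 1:
        return "Red"
    if min(values) >= 3 and sum(values) >= 14:
        return "Green"
    if min(values) >= 2:
        return "Yellow"
    return "Red"
```

A call such as `classify(FeasibilityScore(access=3, stability=2, coverage=3, compliance=3))` would land in the Yellow band under these assumed thresholds, which might correspond to a "sample or narrow first" decision.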
data-acquisition-core
Shared core for the data acquisition skill tree. Use when a data acquisition task needs source access classification, output contracts, compliance boundaries, feasibility scorecards, probing standards, pipeline quality standards, or shared references used by sibling data-acquisition skills. Do not use alone for ordinary browsing or non-data tasks.
data-acquisition-discovery
Use for discovering and reverse-engineering data sources: official APIs, XHR/fetch, GraphQL, persisted queries, Algolia, Shopify, Salesforce Commerce Cloud, sitemaps, feeds, embedded JSON, hydration state, page-data routes, pagination limits, headers, params, and endpoint templates.
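One of the discovery routes listed above, embedded JSON and hydration state, can be sketched with the standard library alone. The class and function names below are hypothetical, and the sketch assumes the payload sits in a `<script>` tag typed `application/json` or `application/ld+json`; real pages may embed state in other ways.

```python
import json
from html.parser import HTMLParser


class EmbeddedJsonParser(HTMLParser):
    """Collect JSON payloads from JSON-typed <script> tags."""

    JSON_TYPES = {"application/json", "application/ld+json"}

    def __init__(self):
        super().__init__()
        self._in_json_script = False
        self._buffer = []
        self.payloads = []

    def handle_starttag(self, tag, attrs):
        if tag == "script" and dict(attrs).get("type") in self.JSON_TYPES:
            self._in_json_script = True
            self._buffer = []

    def handle_data(self, data):
        if self._in_json_script:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag == "script" and self._in_json_script:
            self._in_json_script = False
            try:
                self.payloads.append(json.loads("".join(self._buffer)))
            except json.JSONDecodeError:
                pass  # skip malformed blobs rather than failing the probe


def extract_embedded_json(html: str) -> list:
    """Return every parseable JSON payload found in JSON-typed script tags."""
    parser = EmbeddedJsonParser()
    parser.feed(html)
    parser.close()
    return parser.payloads
```

Scripts without a JSON `type` attribute are ignored, so ordinary inline JavaScript does not pollute the result.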
data-acquisition-pipeline
Use when the user wants a production-grade scraping/API/browser pipeline design or implementation plan: pipeline.yaml, schemas, raw/staged/normalized outputs, dedupe, incremental refresh, checkpoints, retries, rate-limit strategy, quality gates, observability, run reports, and recovery.
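A minimal sketch of how dedupe, checkpoints, retries, and incremental refresh might fit together is given below. The function names, the checkpoint file layout, and the `fetch(page)` callable are all assumptions for illustration, not the skill's actual design.

```python
import json
import time
from pathlib import Path


def fetch_with_retry(fetch, page, attempts=3, base_delay=0.5, sleep=time.sleep):
    """Call fetch(page), retrying with exponential backoff on any exception."""
    for attempt in range(attempts):
        try:
            return fetch(page)
        except Exception:
            if attempt == attempts - 1:
                raise
            sleep(base_delay * (2 ** attempt))


def run_incremental(fetch, checkpoint_path, max_pages, key="id", sleep=time.sleep):
    """Resume from a checkpoint, dedupe rows by key, and save progress per page."""
    path = Path(checkpoint_path)
    if path.exists():
        state = json.loads(path.read_text())
    else:
        state = {"next_page": 0, "seen": []}
    seen = set(state["seen"])
    new_rows = []
    for page in range(state["next_page"], max_pages):
        for row in fetch_with_retry(fetch, page, sleep=sleep):
            if row[key] not in seen:
                seen.add(row[key])
                new_rows.append(row)
        # Persist the cursor and seen keys so a rerun skips completed pages.
        path.write_text(json.dumps({"next_page": page + 1, "seen": sorted(seen)}))
    return new_rows
```

In this sketch only the cursor and the seen keys are persisted; a fuller pipeline would presumably also write each page's rows to raw storage before advancing the checkpoint.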
data-acquisition-publish
Use when packaging real data acquisition results for publication: probe-backed case studies, README summaries, evidence tables, sample rows, feasibility reports, and publishability checks. Do not publish hypothetical case studies, owned-session outputs, cookies, credentials, private data, or non-public authorized results.
data-acquisition-design
Use when the user needs to decide what data to collect before scraping or API work: DatasetNeed, DatasetSpec, entity grain, required vs nice-to-have fields, freshness, history, coverage targets, join keys, exclusions, and uselessness criteria. Use for vague business goals, "all data" requests, and scope control before source discovery.
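To make the DatasetSpec idea more concrete, here is one possible shape for such a spec together with a simple uselessness check. Only the name `DatasetSpec` comes from the description; the fields, defaults, and the coverage rule are assumptions for this sketch.

```python
from dataclasses import dataclass, field


@dataclass
class DatasetSpec:
    # Hypothetical fields loosely mirroring the items named in the description.
    entity_grain: str
    required_fields: list
    nice_to_have_fields: list = field(default_factory=list)
    join_keys: list = field(default_factory=list)
    freshness_days: int = 30
    min_coverage: float = 0.9


def missing_required(spec: DatasetSpec, row: dict) -> list:
    """List required fields that are absent or empty in a row."""
    return [f for f in spec.required_fields if row.get(f) in (None, "")]


def is_useless(spec: DatasetSpec, rows: list) -> bool:
    """Assumed rule: a sample is useless if too few rows are complete."""
    if not rows:
        return True
    complete = sum(1 for r in rows if not missing_required(spec, r))
    return complete / len(rows) < spec.min_coverage
```

Writing the spec down before source discovery gives later sample validation something explicit to check against.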
Bio shown is the top-scored skill's repo description as a fallback — real GitHub bios land in a future update.