python-pipeline
Python data processing pipelines with modular architecture. Use when building content processing workflows, implementing dispatcher patterns, integrating Google Sheets/Drive APIs, or creating batch processing systems. Covers patterns from rosen-scraper, image-analyzer, and social-scraper projects.
Quality Score: 89/100
Details
- Author: jamditis
- Repository: jamditis/claude-skills-journalism
- Created: 5 months ago
- Last Updated: today
- Language: HTML
- License: MIT
Similar Skills
Semantically similar based on skill content — not just same category
python-data-patterns
Pandas, Polars, and PySpark idioms for production data engineering — chunked reads, memory-safe transforms, vectorized operations, type optimization, and performance patterns. Use this skill whenever the user is writing a Python data transformation script and running into memory issues, slow performance, or correctness bugs with large datasets. Also trigger when the user asks how to handle large CSV/Parquet files, process data in batches, use Polars instead of Pandas, optimize a PySpark job, or reduce DataFrame memory usage. If you see someone iterating row-by-row over a DataFrame, this skill should trigger immediately.
bigdata-processing
Core big data processing toolkit for data teams. Includes Polars, Dask, Vaex for large-scale data processing, ETL pipelines, and distributed computing. Use when working with datasets larger than memory, building data pipelines, or optimizing data processing performance.
pipeline-architect
Designs and implements data pipelines: ETL/ELT, streaming, batch processing, schema migrations, and data warehouse architecture. Covers Kafka, Airflow, dbt, Spark, ClickHouse, BigQuery, Snowflake, Redis Streams, and more. Use this skill when the user asks about data pipelines, ETL jobs, data transformation, streaming setup, data warehouse design, CDC, schema migrations, data quality checks, or anything involving moving data from source to target. Also triggers on "build a pipeline," "migrate data from X to Y," "set up streaming," "design my data warehouse," or "data quality is bad, help me fix it."
transforming-data
Transform raw data into analytical assets using ETL/ELT patterns, SQL (dbt), Python (pandas/polars/PySpark), and orchestration (Airflow). Use when building data pipelines, implementing incremental models, migrating from pandas to polars, or orchestrating multi-step transformations with testing and quality checks.
pipeline-design
Design ETL/ELT pipelines end-to-end — source connectors, extraction strategies, transform logic, load patterns, idempotency, scheduling, and error handling. Use this skill whenever the user is starting a new ingestion job, planning how data moves from a source (REST API, database, file, webhook, message queue) into a data warehouse or data lake. Also trigger when the user asks about pipeline architecture, incremental vs. full loads, backfill strategies, CDC, retry logic, or orchestration choices (Airflow, Prefect, dbt). This skill should feel like pairing with a senior data engineer on day one of a new pipeline project.