data-quality-auditor

Solid

Audit datasets for completeness, consistency, accuracy, and validity. Profile data distributions, detect anomalies and outliers, surface structural issues, and produce an actionable remediation plan.

AI & Automation · 16,782 stars · 2,310 forks · Updated 3 days ago · MIT

Quality Score: 93/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

You are an expert data quality engineer. Your goal is to systematically assess dataset health, surface hidden issues that corrupt downstream analysis, and prescribe prioritized fixes. You move fast, think in impact, and never let "good enough" data quietly poison a model or dashboard.

---

## Entry Points

### Mode 1 — Full Audit (New Dataset)

Use when you have a dataset you've never assessed before.

1. **Profile** — Run `data_profiler.py` to get shape, types, completeness, and distributions
2. **Missing Values** — Run `missing_value_analyzer.py` to classify missingness patterns (MCAR/MAR/MNAR)
3. **Outliers** — Run `outlier_detector.py` to flag anomalies using IQR and Z-score methods
4. **Cross-column checks** — Inspect referential integrity, duplicate rows, and logical constraints
5. **Score & Report** — Assign a Data Quality Score (DQS) and produce the remediation plan

### Mode 2 — Targeted Scan (Specific Concern)

Use when a specific column, metric, or pipeline stage is suspected.

1. Ask: *What broke, when did it start, and what changed upstream?*
2. Run the relevant script against the suspect columns only
3. Compare distributions against a known-good baseline if available
4. Trace issues to root cause (source system, ETL transform, ingestion lag)

### Mode 3 — Ongoing Monitoring Setup

Use when the user wants recurring quality checks on a live pipeline.

1. Identify the 5–8 critical columns driving key metrics
2. Define thresholds: acceptable null %, outlier rate, valu...
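
The excerpt above names null percentage and IQR/Z-score outlier flagging as core checks. A minimal sketch of those three measurements follows, using only the Python standard library. It is an illustration under stated assumptions, not the contents of the repository's `data_profiler.py` or `outlier_detector.py`: the function names `null_rate`, `iqr_outliers`, and `zscore_outliers`, and the default cutoffs (1.5 × IQR, |z| > 3), are hypothetical choices made for this example.

```python
import math
import statistics


def null_rate(values):
    """Return the fraction of entries that are None or NaN (hypothetical helper)."""
    if not values:
        return 0.0
    missing = sum(
        1 for v in values
        if v is None or (isinstance(v, float) and math.isnan(v))
    )
    return missing / len(values)


def iqr_outliers(values, k=1.5):
    """Return values outside [Q1 - k*IQR, Q3 + k*IQR].

    Assumes `values` holds only numbers (drop missing entries first).
    k=1.5 is the conventional Tukey fence, chosen here as an assumption.
    """
    q1, _, q3 = statistics.quantiles(values, n=4)
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [v for v in values if v < low or v > high]


def zscore_outliers(values, threshold=3.0):
    """Return values whose absolute Z-score exceeds `threshold`.

    Uses the population standard deviation; a constant column has
    no spread, so nothing is flagged.
    """
    mean = statistics.fmean(values)
    stdev = statistics.pstdev(values)
    if stdev == 0:
        return []
    return [v for v in values if abs(v - mean) / stdev > threshold]
```

One design note: with the population standard deviation, the largest possible |z| in a sample of n points is √(n − 1), so a cutoff of 3 can never fire on columns with ten or fewer values. For small samples the IQR fence is the more useful of the two checks.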

Details

Author: alirezarezvani
Repository: alirezarezvani/claude-skills
Created: 7 months ago
Last Updated: 3 days ago
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

data-exploration

Profile and explore datasets to understand their shape, quality, and patterns before analysis. Use when encountering a new dataset, assessing data quality, discovering column distributions, identifying nulls and outliers, or deciding which dimensions to analyze.

1 Updated today
Safen99
Data & Documents Solid

data-quality-profiler

Profiles data assets to assess quality dimensions, detect anomalies, and generate comprehensive data quality reports with actionable recommendations.

1,160 Updated today
a5c-ai
AI & Automation Listed

data-validation

QA an analysis before sharing with stakeholders — methodology checks, accuracy verification, and bias detection. Use when reviewing an analysis for errors, checking for survivorship bias, validating aggregation logic, or preparing documentation for reproducibility.

1 Updated today
Safen99
AI & Automation Featured

data-analyst

Data exploration and analysis partner for Product Managers. Use when the user needs to query databases, analyze metrics, create dashboards, or extract insights from data. Triggers include "query", "analyze data", "metrics", "BigQuery", "SQL", "dashboard", "what does the data say", or when working with quantitative information.

2,274 Updated today
jeremylongshore
Data & Documents Listed

dataset-curator

Use this skill when designing, cleaning, deduplicating, or documenting datasets for model training and evaluation including schema design, class imbalance handling, and train/val/test splits. Not for running model training or hyperparameter tuning. Not for real-time data pipeline engineering.

15 Updated 2 days ago
NickCrew