
data-exploration · listed

Profile and explore datasets to understand their shape, quality, and patterns before analysis. Use when encountering a new dataset, assessing data quality, discovering column distributions, identifying nulls and outliers, or deciding which dimensions to analyze.
Safen99/opencode-cowork-plugins · ★ 1 · AI & Automation · score 64
Install: claude install-skill Safen99/opencode-cowork-plugins
# Data Exploration Skill

Systematic methodology for profiling datasets, assessing data quality, discovering patterns, and understanding schemas.

## Data Profiling Methodology

### Phase 1: Structural Understanding

Before analyzing any data, understand its structure:

**Table-level questions:**
- How many rows and columns?
- What is the grain (one row per what)?
- What is the primary key? Is it unique?
- When was the data last updated?
- How far back does the data go?

**Column classification:**

Categorize each column as one of:
- **Identifier**: Unique keys, foreign keys, entity IDs
- **Dimension**: Categorical attributes for grouping/filtering (status, type, region, category)
- **Metric**: Quantitative values for measurement (revenue, count, duration, score)
- **Temporal**: Dates and timestamps (created_at, updated_at, event_date)
- **Text**: Free-form text fields (description, notes, name)
- **Boolean**: True/false flags
- **Structural**: JSON, arrays, nested structures

### Phase 2: Column-Level Profiling

For each column, compute:

**All columns:**
- Null count and null rate
- Distinct count and cardinality ratio (distinct / total)
- Most common values (top 5-10 with frequencies)
- Least common values (bottom 5 to spot anomalies)

**Numeric columns (metrics):**
```
min, max, mean, median (p50)
standard deviation
percentiles: p1, p5, p25, p75, p95, p99
zero count
negative count (if unexpected)
```

**String columns (dimensions, text):**
```
min length, max length, avg l