clean-data-xls
Install: `claude install-skill Borjani1577/claude-office-skills`
# Clean Data
Clean messy data in the active sheet or a specified range.
## Preflight: Dependency Check
Before starting, verify that `openpyxl` is installed, and install it if it is missing.
```bash
python3 -c "import openpyxl" 2>/dev/null || python3 -m pip install openpyxl
```
**Important**: Do not skip this step. The standalone `.xlsx` workflow below fails without `openpyxl`.
## Environment
- **If running inside Excel (Office Add-in / Office JS):** Use Office JS directly. Read via `range.values`, write helper-column formulas via `range.formulas = [["=TRIM(A2)"]]`. The in-place vs helper-column decision still applies.
- **If operating on a standalone `.xlsx` file:** Use Python and `openpyxl`.
## Workflow
### Step 1: Scope
- If a range is given, such as `A1:F200`, use it.
- Otherwise use the full used range of the active sheet.
- Profile each column: detect its dominant type (text, number, or date) and identify outliers.
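
The scoping and profiling steps above can be sketched in Python for the standalone `.xlsx` case. This is a minimal illustration, not the skill's actual implementation. The helper names `load_rows`, `cell_type`, and `profile_column` are hypothetical. The source does not say how outliers are defined, so two assumptions are made here: cells whose type differs from the column's dominant type are flagged, and numeric outliers are flagged with the common 1.5 × IQR rule.

```python
from collections import Counter
from datetime import date
from statistics import quantiles


def load_rows(path, cell_range=None):
    """Return cell values from the active sheet as a list of row tuples."""
    # Imported lazily so the profiling helpers below work without openpyxl.
    from openpyxl import load_workbook

    ws = load_workbook(path).active
    if cell_range:  # assumes a multi-cell range such as "A1:F200"
        return [tuple(cell.value for cell in row) for row in ws[cell_range]]
    return list(ws.iter_rows(values_only=True))  # full used range


def cell_type(value):
    """Classify a cell value as 'blank', 'number', 'date', 'text', or 'other'."""
    if value is None or (isinstance(value, str) and not value.strip()):
        return "blank"
    if isinstance(value, bool):  # bool is a subclass of int, so check it first
        return "other"
    if isinstance(value, (int, float)):
        return "number"
    if isinstance(value, date):  # also matches datetime
        return "date"
    if isinstance(value, str):
        return "text"
    return "other"


def profile_column(values):
    """Summarise one column: dominant type, off-type cells, numeric outliers."""
    kinds = [cell_type(v) for v in values]
    counts = Counter(k for k in kinds if k != "blank")
    dominant = counts.most_common(1)[0][0] if counts else "blank"
    # Non-blank cells whose type differs from the dominant type.
    off_type = [i for i, k in enumerate(kinds) if k not in (dominant, "blank")]

    outliers = []
    numbers = [(i, v) for i, (v, k) in enumerate(zip(values, kinds)) if k == "number"]
    # Assumption: 1.5 * IQR fences, applied only when there is enough data.
    if dominant == "number" and len(numbers) >= 4:
        q1, _, q3 = quantiles([v for _, v in numbers], n=4)
        spread = 1.5 * (q3 - q1)
        outliers = [i for i, v in numbers if v < q1 - spread or v > q3 + spread]
    return {"dominant": dominant, "off_type": off_type, "outliers": outliers}
```

If the first row holds headers, one way to profile the whole scope is `[profile_column(col) for col in zip(*rows[1:])]`, where `rows` comes from `load_rows`. The indices in `off_type` and `outliers` are positions within the list passed in, so they need to be offset back to sheet row numbers before reporting.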
### Step 2: Detect issues
| Issue | What to look for |
|---|---|
| Whitespace | Leading/trailing spaces, double spaces |
| Casing | Inconsistent casing of values in categorical columns, e.g. `usa`, `USA`, `Usa` |
| Number-as-text | Numeric values stored as text; stray `$`, `,`, `%` in number cells |
| Dates | Mixed formats in the same column, e.g. `3/8/26`, `2026-03-08`, `March 8 2026` |
| Duplicates | Exact-duplicate rows and near-duplicates caused by case or whitespace differences |
| Blanks | Empty cells in otherwise-populated columns |
| Mixed types | A column