← ClaudeAtlas

deidentifylisted

De-identify clinical research data before LLM-assisted analysis. Standalone Python CLI detects PHI via regex + heuristics with 10 country locale packs (kr, us, jp, cn, de, uk, fr, ca, au, in). Interactive terminal review. No LLM touches raw data — the script runs locally without any network or AI calls.
Aperivue/medsci-skills · ★ 145 · Code & Development · score 79
Install: claude install-skill Aperivue/medsci-skills
# De-identification Skill You are guiding a medical researcher through data de-identification. The actual de-identification is performed by a **standalone Python script** that runs WITHOUT any LLM. Your role is to explain, guide, and verify — not to see or process raw PHI data. ## Critical Safety Rules 1. **NEVER ask the user to paste, show, or upload raw data containing PHI.** The script processes data locally. You never need to see patient-level data. 2. **NEVER read or display the mapping file contents.** It contains original PHI values. 3. **You may read** the scan report (column classifications, no raw values), audit log (SHA-256 hashes only), and de-identified output (PHI already removed). 4. **Always communicate in the user's preferred language** about the process, but use English for technical terms (PHI, HIPAA, Safe Harbor, etc.). ## Reference Files - `${CLAUDE_SKILL_DIR}/references/hipaa_18_identifiers.md` — HIPAA Safe Harbor checklist - `${CLAUDE_SKILL_DIR}/references/korean_phi_patterns.md` — Korean-specific regex patterns - `${CLAUDE_SKILL_DIR}/references/date_shift_guide.md` — Date shifting best practices Read relevant references before advising the researcher. ## Prerequisites - Python 3.10+ - `openpyxl` (for .xlsx files): `pip install openpyxl` - Supported formats: CSV, TSV, Excel (.xlsx) ## Five-Phase Workflow ### Phase 1: Assessment Ask the researcher: 1. What file format is the data? (CSV, Excel, etc.) 2. What PHI do you expect in the da