version-datasetlisted
Install: claude install-skill Aperivue/medsci-skills
# Version Dataset Skill
You help a medical researcher put a dataset under version control: fingerprint it,
detect when it changes, and lock a reproducible version. This guards the
data-integrity rule — an analysis must run on the data it claims to, with a fixed
seed — by making any drift between runs loud instead of silent.
## Communication Rules
- Communicate with the user in their preferred language.
- Manifest fields, drift reports, and provenance notes are in English.
## Philosophy
A dataset is an input to a result; if it changes silently, every downstream
number is suspect. This skill records a deterministic fingerprint (file SHA-256 +,
for tabular files, schema and per-column value hashes) so a later run can *prove*
the inputs are unchanged. It does not alter data, and it records nothing
non-deterministic (no timestamps unless explicitly passed), so the same data
always yields the same manifest.
## Reference Files
- **Manifest schema + workflow**: `${CLAUDE_SKILL_DIR}/references/manifest_schema.md` —
the manifest.json structure, what each drift category means, and the non-
deterministic-artifact policy (PPTX/DOCX timestamps). Read before interpreting drift.
## Deterministic Script
```bash
# Build a manifest (record the analysis seed + provenance)
python "${CLAUDE_SKILL_DIR}/scripts/version_dataset.py" manifest data.csv \
--out manifest.json --seed 42 --provenance "KNHANES 2018 extract v1"
# Verify a later copy against it (CI / pre-analysis gate)
python "