infer-vault-structure

A four-stage pipeline that takes a corpus of seed content (your existing notes), clusters it by semantic similarity, proposes a vault taxonomy via an LLM, renders a user-reviewable import plan, and then runs a 3-step gate where you apply, edit, skip, or abort before any vault writes happen.
peter-claude-vault/claude-stem · ★ 0 · AI & Automation · score 63

Install: claude install-skill peter-claude-vault/claude-stem
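
The review gate in the description (apply / edit / skip / abort before any vault writes) can be pictured with a small stdlib-only sketch. This is not the code in `review-gate.sh`. The per-entry decision model, the `source`/`target` keys, and the names `Decision`, `GateAborted`, and `review_gate` are all assumptions made for illustration.

```python
from enum import Enum


class Decision(Enum):
    """The four reviewer outcomes named in the skill description."""
    APPLY = "apply"
    EDIT = "edit"
    SKIP = "skip"
    ABORT = "abort"


class GateAborted(Exception):
    """Raised when the reviewer aborts; the caller must not write anything."""


def review_gate(plan, decide, edit=None):
    """Return only the plan entries the reviewer approved.

    plan   -- list of dicts, each one proposed vault write (hypothetical shape)
    decide -- callable(entry) -> Decision
    edit   -- callable(entry) -> entry, used when the decision is EDIT
    """
    approved = []
    for entry in plan:
        decision = decide(entry)
        if decision is Decision.ABORT:
            # Nothing is returned, so earlier approvals are discarded too.
            raise GateAborted(entry["target"])
        if decision is Decision.SKIP:
            continue
        if decision is Decision.EDIT:
            if edit is None:
                raise ValueError("EDIT chosen but no edit callback supplied")
            entry = edit(entry)
        approved.append(entry)
    return approved
```

The point of the shape is that the gate only computes the approved list. Writing to the vault would be a separate step that runs after `review_gate` returns, so an abort part-way through leaves the vault untouched.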

# infer-vault-structure

A pipeline that turns "a directory of existing notes" into "a vault taxonomy you've reviewed and approved." Four stages (`cluster.sh`, `propose-taxonomy.sh`, `import-plan.sh`, `review-gate.sh`) share stdlib-Python helpers. There is no dependency on `numpy`, `requests`, `scikit-learn`, `pydantic`, or `pyyaml`.

Stage 1 (the IR builder under `onboarding/seed-content/`) is upstream of this skill; it produces JSONL records, one row per source file, each carrying the file's content plus metadata. The four stages here turn that IR into the artifacts Stage 3 (`seed-projects`) consumes.

## Personalization tier

This is a **Universal capability**: the skill body is identical for every adopter. Personalization comes from the contents of your IR (your seeded files), not from per-user code. Output artifacts (`state/cluster-output.json`, `import-plan.md`, the PRD/Context/Updates triads downstream) carry provenance frontmatter via `lib/provenance-frontmatter.sh`. See [`docs/personalization-model.md`](../../docs/personalization-model.md) for the universal/combined/personal classification.

## When this skill runs

Invoked by:

- `/onboard --seed-content <path>`: the greenfield personalization path. Section F dispatches the orchestrator after the seven auto-author surfaces complete.
- `/adopt --retrofit-existing`: walks an existing populated vault as IR source.
- The orchestrator: `skills/infer-vault-structure/orchestrate.sh` chains all four stages with per-stage idempotenc