evallisted
Install: claude install-skill rillmd/rill
# /eval — Exploration Benchmark (monthly. Verify that note structure actually works in practice)
**Conduct ALL conversation with the user in the language defined by `.claude/rules/personal-language.md`** (or the user's input language if absent). The English instructions below are for skill clarity, not for output style. Exceptions (only): tokens inside backticks or code blocks, proper nouns, ASCII acronyms.
> **Tool references in this skill** (`sub-agent`, `shell`, `Read`, `Edit`, `Glob`, `Grep`) describe **intent**, not Claude-specific tool calls. Each harness should map them to its native equivalent — Claude Code uses its built-in Agent / Bash / Read / Edit tools as named; Codex CLI uses CSV fan-out / shell / `apply_patch` etc. as appropriate.
> Workflow: `/inspect` -> `/repair` -> `/eval` (inspect -> repair -> verify)
Quantitatively evaluate the search quality and efficiency of the knowledge base through the AI agent's natural exploration (ADR-050). See `eval/concept.md` for metric definitions.
## Arguments
$ARGUMENTS — Optional. Can be combined.
- `--reuse-deep YYYY-MM-DD` — Reuse Deep Path results from the specified date (`eval/results/YYYY-MM-DD-deep.yaml`)
- `--refresh-deep` — Force regeneration of Deep Path for all queries (high cost)
- `--baseline` — Also copy results to `eval/baseline.yaml`
- `--compare YYYY-MM-DD` — Compare with results from the specified date and report differences
- No arguments — Auto-detect the latest Deep Path. Incrementally update if n