Install: claude install-skill gramaton-ai/gramaton
# benchmark-extract
Drives the production session-extraction code path against a benchmark
dataset through Claude Code Agent sub-agents, one sub-agent per unique
haystack session. All writes go to the `gramaton-bench` MCP toolset
(never `gramaton`). This isolation is deliberate; see `docs/benchmarks.md`
for the rationale.
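The "one sub-agent per unique haystack session" rule implies a deduplication pass, since the same session can appear in the haystack of several questions. A minimal sketch of that pass follows. It assumes each dataset record carries parallel `haystack_session_ids` and `haystack_sessions` lists; those field names and the helper `unique_sessions` are illustrative assumptions, not part of this skill.

```python
def unique_sessions(records):
    """Map each haystack session id to its turns, keeping the first occurrence.

    Assumption: every record has parallel lists `haystack_session_ids`
    and `haystack_sessions` (hypothetical field names for this sketch).
    """
    sessions = {}
    for record in records:
        ids = record["haystack_session_ids"]
        turns = record["haystack_sessions"]
        for session_id, session_turns in zip(ids, turns):
            # setdefault keeps the first copy and ignores later duplicates
            sessions.setdefault(session_id, session_turns)
    return sessions


# Two questions that share session "b": three unique sessions in total.
records = [
    {"haystack_session_ids": ["a", "b"], "haystack_sessions": [["t1"], ["t2"]]},
    {"haystack_session_ids": ["b", "c"], "haystack_sessions": [["t2"], ["t3"]]},
]
to_extract = unique_sessions(records)
```

Each key of `to_extract` would then correspond to one sub-agent dispatch.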
## When to run
- User explicitly requests ingestion of a benchmark dataset (LongMemEval-S,
LongMemEval-M, MuSiQue, etc.).
- Every run is preceded by a design alignment with the user on subset size (pilot vs. full).
Do NOT run autonomously. Extraction spends significant subscription quota
and wall-time; the user drives the cadence.
## Preconditions
1. **Dataset file** exists at the path the user specifies (for
LongMemEval-S: `~/workspaces/gramaton-benchmarks/longmemeval/raw/longmemeval_s_cleaned.json`).
2. **Benchmark store running** on port 7338 with `gramaton-bench` MCP
tools available in this Claude Code session. Verify with a
`mcp__gramaton-bench__gramaton_stats` call; if it fails, stop and ask
the user to start the server per `docs/benchmarks.md`.
3. **Personal store is NOT the target.** Any `mcp__gramaton__*` call in
this skill is a bug.
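Preconditions 1 and 3 can be checked mechanically before any extraction starts. The sketch below is one way to do so, under stated assumptions: the helper names `dataset_exists` and `check_tool_name` are hypothetical, and the only facts taken from this document are the two tool-name prefixes. Precondition 2 (the `mcp__gramaton-bench__gramaton_stats` call) still has to be made through the MCP tool itself.

```python
from pathlib import Path

# Prefixes taken from the tool names used in this document.
BENCH_PREFIX = "mcp__gramaton-bench__"
PERSONAL_PREFIX = "mcp__gramaton__"


def dataset_exists(path: str) -> bool:
    """Return True if the dataset file exists (expands a leading ~)."""
    return Path(path).expanduser().is_file()


def check_tool_name(tool_name: str) -> str:
    """Return the tool name unchanged if it targets the bench store.

    Hypothetical guard: raises ValueError for personal-store tools and
    for anything outside the bench toolset.
    """
    if tool_name.startswith(PERSONAL_PREFIX):
        raise ValueError(f"personal store tool used in benchmark skill: {tool_name}")
    if not tool_name.startswith(BENCH_PREFIX):
        raise ValueError(f"not a gramaton-bench tool: {tool_name}")
    return tool_name
```

Because `-` and `_` differ right after `gramaton`, the two prefixes never match the same tool name, so the order of the checks only affects which error message is raised.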
## Session id convention
Upstream ids (e.g. `sharegpt_yywfIrx_0`, `85a1be56_1`, `answer_280352e9`)
are used verbatim behind a dataset prefix: `lme-s-<haystack_session_id>`.
The prefix makes the origin unambiguous in the bench store, and keeping the
upstream id intact preserves traceability back to the dataset.
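As an illustration of the convention, the mapping in both directions can be sketched as follows. The function names are hypothetical; only the `lme-s-` prefix and the example ids come from this document.

```python
LME_S_PREFIX = "lme-s"


def bench_session_id(upstream_id: str, prefix: str = LME_S_PREFIX) -> str:
    """Build the bench-store id: the upstream id verbatim behind the prefix."""
    return f"{prefix}-{upstream_id}"


def upstream_session_id(bench_id: str, prefix: str = LME_S_PREFIX) -> str:
    """Recover the upstream id from a bench-store id."""
    marker = f"{prefix}-"
    if not bench_id.startswith(marker):
        raise ValueError(f"id does not carry the {marker!r} prefix: {bench_id}")
    return bench_id[len(marker):]


for upstream in ("sharegpt_yywfIrx_0", "85a1be56_1", "answer_280352e9"):
    # Round trip: prefixing and stripping returns the original id.
    assert upstream_session_id(bench_session_id(upstream)) == upstream
```

Since the upstream id is never rewritten, any bench-store record can be traced back to its dataset entry by stripping the prefix.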
## Flow
### 1. Load and parse
Read the