Install: `claude install-skill NoesisVision/nasde-toolkit`
# NASDE Benchmark Calibration
Close the loop between the LLM-as-a-Judge and a human reviewer. The judge scores trials against
`assessment_criteria.md` (per task) and `assessment_dimensions.json` (benchmark-wide), but an LLM
judge is an imperfect grader: its reading of the rubric may diverge from how a human would grade the same code.
This skill publishes trial diffs and scores as Pull/Merge Requests for human review, pulls the review
comments back, and proposes concrete rubric edits.
This is the third skill in the benchmark lifecycle: `nasde-benchmark-creator` writes the rubric,
`nasde-benchmark-runner` runs trials and scores them, and **this skill calibrates the rubric** against
human judgment before the benchmark is frozen.
## Prerequisites
- A sink repository, meaning the repo that receives the review Pull/Merge Requests. It must already exist; creating it is out of scope. Configure it in `nasde.toml`:
```toml
[calibration]
repo = "https://github.com/Org/nasde-calibration" # full URL or owner/repo slug
# platform = "gitlab" # only needed for a bare slug or a self-hosted host
# base_branch = "main"
# throttle_sec = 2.0
```
- The platform CLI for that repo's host: `gh` (GitHub) or `glab` (GitLab), installed and logged in
(`gh auth login` / `glab auth login`). The platform is auto-detected from the repo URL host. nasde
never handles tokens — the CLI's keyring does. See ADR-010.
- `git` on PATH.
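
The prerequisites above can be checked before starting. The following is a minimal sketch, not nasde's actual implementation: the function names `detect_platform` and `missing_tools` are hypothetical, and it assumes that only `github.com` and `gitlab.com` are recognised automatically, and that a bare slug or self-hosted host without an explicit `platform` is left undetermined.

```python
from __future__ import annotations

import shutil
from urllib.parse import urlparse

# Assumption: only the two public hosts are recognised without an explicit `platform`.
KNOWN_HOSTS = {"github.com": "github", "gitlab.com": "gitlab"}
# CLI names come from the prerequisites list: gh for GitHub, glab for GitLab.
PLATFORM_CLI = {"github": "gh", "gitlab": "glab"}


def detect_platform(repo: str, platform: str | None = None) -> str | None:
    """Return "github" or "gitlab", or None when the host gives no answer."""
    if platform:
        # An explicit `platform` key in [calibration] always wins.
        return platform.lower()
    # A bare "owner/repo" slug has no URL scheme, so hostname is None.
    host = urlparse(repo).hostname
    if host is None:
        return None
    return KNOWN_HOSTS.get(host)


def missing_tools(platform: str) -> list[str]:
    """List the required executables that are not on PATH."""
    needed = ["git", PLATFORM_CLI[platform]]
    return [tool for tool in needed if shutil.which(tool) is None]


if __name__ == "__main__":
    detected = detect_platform("https://github.com/Org/nasde-calibration")
    print("platform:", detected)
    if detected is not None:
        print("missing tools:", missing_tools(detected))
```

This only checks that the executables exist; it does not verify that `gh auth login` or `glab auth login` has been completed, since credentials stay with the platform CLI.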
## Workflow
### 1. Decide which trials to calibrate
Calibration is about the **criteria**, not individual runs. Discuss with the use