calibratelisted
Install: claude install-skill Borda/AI-Rig
# Calibrate
Run a linear calibration loop for codex workflow integrity and behavioral scoring.
## Input Schema
```json
{
"scope": "skills|agents|routing|all",
"pace": "fast|full",
"mode": "ab-test|apply",
"skip_gate": false,
"done_when": "recall and bias scores emitted; proposals written if mode=apply; gate skipped if skip_gate=true"
}
```
## Workflow
01. Load calibration task set from `.codex/calibration/tasks.json`.
02. Load behavioral cases from `.codex/calibration/behavioral-cases.json`.
03. Load behavioral observations from `.codex/calibration/behavioral-observations.jsonl`.
- Require `source`, `run_id`, and `observed_at` on each observation where available.
04. Run `.codex/calibration/run.sh`.
05. Inspect `checks_failed`, `leaks_found`, and `behavioral`.
06. Review behavioral metrics:
- `recall`: expected finding IDs recovered from known cases.
- `precision`: reported finding IDs that match expected finding IDs.
- `confidence_accuracy`: `1 - mean(abs(confidence - per-case F1))`.
- `mean_overconfidence`: average positive confidence bias over per-case F1.
- `gate_metrics_raw`: unrounded overall values used for pass/fail thresholds.
- `by_source`: recall, precision, and confidence calibration grouped by observation source.
- `observation_freshness`: latest `observed_at`, missing timestamp count, and live-vs-fixture observation counts.
07. Classify gaps as blocking or non-blocking.
08. Emit measured recommendations for what sho