mkbenchmark

Quality Score: 86/100

Stars (20%): 40
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# mk:benchmark - Experimental Harness Canary Suite

Measures harness performance against a small set of ground-truth tasks. Provides the empirical signal that the dead-weight audit (per `.claude/rules/dead-weight-audit-rules.md`) consumes to make load-bearing decisions about each harness component.

## When to Use

Activate when:

- User runs `/mk:benchmark run` (default: quick tier, 5 tasks, ≤$5)
- User runs `/mk:benchmark run --full` (quick tier + 1 heavy task, ≤$30)
- User runs `/mk:benchmark compare <run-id-a> <run-id-b>` (delta table)
- Before applying a harness change (to record a baseline)
- After applying a harness change (to verify the delta)
- During the dead-weight audit playbook (component enable/disable cycles)

Skip when:

- The harness has been run end-to-end manually within the last hour (use that data instead)
- The budget cap is hit before the suite finishes (record the partial result and alert)

## Hard Constraints

1. **Quick tier ≤$5 total cost.** Hard block if the projected cost exceeds the cap.
2. **Full tier ≤$30 total cost.** Hard block if the projected cost exceeds the cap.
3. **`--full` is opt-in.** The heavy task (`06-small-app-build`) requires the explicit `--full` flag because it triggers `mk:autobuild`, which can run for hours. The task refuses to run without the flag.
4. **NOT a replacement for unit tests.** This is harness-level measurement only.
5. **Results recorded in trace-log.jsonl** as `event=benchmark_result` records, tagged with `benchmark_version` + `harness_version` + `model_version`.

## Subcommands

...
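As an illustration of the hard constraints listed earlier (the tier cost caps, the opt-in `--full` flag, and the `benchmark_result` record), here is a minimal TypeScript sketch. It is not the skill's actual implementation. The function names (`selectTier`, `assertWithinBudget`, `toJsonlLine`) and the record fields `task_id`, `cost_usd`, and `passed` are hypothetical. Only the caps, the flag, the event name, and the three version tags come from the skill text.

```typescript
// Hypothetical sketch: helper names and the task_id / cost_usd / passed fields
// are assumptions for illustration, not taken from the MeowKit source.

type Tier = "quick" | "full";

// Cost caps in USD, as stated in the hard constraints.
const COST_CAP_USD: Record<Tier, number> = { quick: 5, full: 30 };

interface BenchmarkResultRecord {
  event: "benchmark_result";
  benchmark_version: string;
  harness_version: string;
  model_version: string;
  task_id: string;   // assumed field
  cost_usd: number;  // assumed field
  passed: boolean;   // assumed field
}

// The full tier is opt-in: it is selected only when `--full` is passed explicitly.
function selectTier(args: string[]): Tier {
  return args.includes("--full") ? "full" : "quick";
}

// Hard block: throw before running if the projected cost exceeds the tier cap.
function assertWithinBudget(tier: Tier, projectedCostUsd: number): void {
  const cap = COST_CAP_USD[tier];
  if (projectedCostUsd > cap) {
    throw new Error(
      `Projected cost $${projectedCostUsd.toFixed(2)} exceeds the ${tier} tier cap of $${cap}`
    );
  }
}

// Serialize one result as a single JSON Lines entry (one object per line).
function toJsonlLine(record: BenchmarkResultRecord): string {
  return JSON.stringify(record);
}
```

A caller would typically run `assertWithinBudget(selectTier(args), projected)` before starting any task, then append each `toJsonlLine(...)` result followed by a newline to the trace log.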

Details

Author: ngocsangyem
Repository: ngocsangyem/MeowKit
Created: 4 months ago
Last Updated: today
Language: TypeScript
License: MIT

Similar Skills

dotnet-benchmark

time-benchmark

infrastructure-benchmark