ln-840-benchmark-compare

Solid

Runs a canonical built-in vs hex-line benchmark with scenario manifests, activation checks, and diff-based correctness. Use when measuring hex-line MCP performance against Claude built-in tools or when preparing the canonical suite for later external baselines.

AI & Automation · 488 stars · 70 forks · Updated yesterday · MIT


Quality Score: 94/100

Score weights: Stars 20% · Recency 20% · Frontmatter 20% · Documentation 15% · Issue Health 10% · License 10% · Description 5%

Skill Content

> **Paths:** File paths (`shared/`, `references/`) are relative to skills repo root. Locate this SKILL.md directory and go up one level for repo root.

# Benchmark Compare

**Type:** L3 Worker
**Category:** 8XX Optimization -> 840 Benchmark

Run a clean A/B benchmark in Claude Code: one session with built-in tools only, one with `hex-line`.

The benchmark is scenario-based, diff-validated, manifest-driven, and runtime-backed. It measures activation, correctness, time, cost, and tokens.

The current runner is intentionally scoped to this internal A/B. It does not, by itself, prove best-in-class against external alternatives.

---

## Input / Output

| Direction | Content |
|-----------|----------|
| **Input** | Repo checkout containing `mcp/hex-line-mcp/`, optional `references/goals.md`, optional `references/expectations.json` |
| **Output** | Comparison report in `skills-catalog/ln-840-benchmark-compare/results/{date}-comparison.md` plus machine-readable benchmark summary artifact |

---

## Prerequisites

- `claude --version` succeeds
- `git` succeeds
- `mcp/hex-line-mcp/server.mjs` exists
- `mcp/hex-line-mcp/hook.mjs` exists
- `skills-catalog/ln-840-benchmark-compare/references/goals.md` exists
- `skills-catalog/ln-840-benchmark-compare/references/expectations.json` exists
- `skills-catalog/ln-840-benchmark-compare/references/mcp-bench.json` exists

---

## Quick Run

```bash
bash skills-catalog/ln-840-benchmark-compare/scripts/run-benchmark.sh \
  [skills-catalog/ln-840-benc...
```
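The prerequisites listed in the skill content can be checked before a run with a small pre-flight script. The sketch below is illustrative only and is not part of the skill's own scripts: the function names `required_files`, `check_required_files`, and `check_required_commands` are hypothetical, and it assumes `git --version` is an acceptable way to confirm that `git` succeeds.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight helpers; not shipped with the skill.

# Emit the files named in the Prerequisites list, relative to the repo root.
required_files() {
  cat <<'EOF'
mcp/hex-line-mcp/server.mjs
mcp/hex-line-mcp/hook.mjs
skills-catalog/ln-840-benchmark-compare/references/goals.md
skills-catalog/ln-840-benchmark-compare/references/expectations.json
skills-catalog/ln-840-benchmark-compare/references/mcp-bench.json
EOF
}

# Print each missing file; return non-zero if any is absent.
# $1 is the repo root (defaults to the current directory).
check_required_files() {
  root="${1:-.}"
  missing=0
  for f in $(required_files); do
    if [ ! -f "$root/$f" ]; then
      echo "missing: $f"
      missing=1
    fi
  done
  return "$missing"
}

# Confirm the two CLI prerequisites run successfully.
# Assumption: `git --version` stands in for "git succeeds".
check_required_commands() {
  failed=0
  claude --version >/dev/null 2>&1 || { echo "failed: claude --version"; failed=1; }
  git --version >/dev/null 2>&1 || { echo "failed: git --version"; failed=1; }
  return "$failed"
}
```

From the repo root, `check_required_commands && check_required_files . && echo "prerequisites OK"` would report every unmet prerequisite before the benchmark is started.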

Details

Author: levnikolaevich
Repository: levnikolaevich/claude-code-skills
Created: 7 months ago
Last Updated: yesterday
Language: JavaScript
License: MIT

Integrates with

Anthropic · AI Model Context Protocol · AI

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

benchmark

Compare Claude Code output with full config vs minimal config using standardized tasks per stack.

7 Updated 1 week ago

luiseiman

AI & Automation Listed

skill-benchmarking

Run skill benchmarks with discriminating-only assertions against evals.json for any model and any AI agent. Use when benchmarking a skill against a model not yet tested, running with_skill/without_skill eval pairs, producing benchmark-<model>.json, re-grading an existing run, adding Phase 2 model comparison results, reviewing results in the eval viewer, updating README benchmark tables, or cleaning non-discriminating assertions from evals.json. Enforces strict grader isolation (the context that generates responses never grades them) and evidence-only passing (assertions pass only on explicit content, never on implication or charity). Works with Claude Code, Gemini CLI, GitHub Copilot, Cursor, and any other AI coding assistant.

1 Updated today

christim427-rgb

AI & Automation Listed

mkbenchmark

Use when measuring harness changes against ground truth — runs a small canary suite (5 quick tasks, 6 with --full) and records scores in trace-log.jsonl. Backs the dead-weight audit with measured deltas. Triggers on /mk:benchmark, "run benchmark", "measure harness", or before/after a harness change.

15 Updated yesterday

ngocsangyem