contribute-eval

Turn a skill failure into a regression test by contributing a new eval case (prompt fixture + golden rubric) to the ai-kit eval suite. Use when the user says "this skill failed", "ai-kit got this wrong", "add a test for this", "contribute eval", "/ai:contribute-eval", or whenever an ai-kit skill's output is materially worse than the user expected and the user wants the fix to stick.
yusufkaracaburun/ai-kit · ★ 0 · AI & Automation · score 56
Install: claude install-skill yusufkaracaburun/ai-kit
# Contribute eval

Capture one ai-kit skill failure as a structured eval case (a prompt fixture plus a golden rubric) and open a PR against `yusufkaracaburun/ai-kit` adding both files. Every contributed case becomes a regression test the next release must pass. This is how ai-kit quality compounds at < 50 users: failures surfaced once stay fixed.

## Not this skill

- **Bug in ai-kit's installer / scripts**: file a `bug` issue via `gh issue create --template bug.yml`. Eval cases test skill *behaviour*, not script defects.
- **General feedback ("this felt clunky")**: use `/ai:feedback` instead. Eval cases are for concrete failures with a reproducible prompt and an articulable "what should have happened".
- **A skill the user has never invoked**: eval cases require a real prompt + actual output. If both are hypothetical, file as `skill-suggestion` so the design discussion happens first.

If the user's input matches one of the above, route them and stop.

## Process

### 1. Pick the target skill + scenario name

Ask the user (or infer from context):

- **Skill name**: exact dir name under `workflow/skills/`. Validate it exists: `gh api repos/yusufkaracaburun/ai-kit/contents/workflow/skills/<name>` returns 200.
- **Scenario name**: short kebab-case slug describing the case (e.g. `missing-package-json`, `concurrent-edit-conflict`). Reject duplicates: check `gh api repos/yusufkaracaburun/ai-kit/contents/tests/eval/prompts/<skill>/<scenario>.md` returns 404 before continuing.
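
The two checks in step 1 could be scripted roughly as follows. This is a minimal sketch, not part of ai-kit: the function names (`is_kebab_case`, `gh_exists`, `check_target`) are hypothetical, the kebab-case pattern is one reasonable reading of "short kebab-case slug", and existence is inferred from the exit status of `gh api` rather than from the HTTP status code itself.

```python
import re
import subprocess

REPO = "yusufkaracaburun/ai-kit"

# Assumption: lowercase alphanumeric words joined by single hyphens.
KEBAB = re.compile(r"[a-z0-9]+(?:-[a-z0-9]+)*")


def is_kebab_case(slug):
    """Return True if the slug looks like `missing-package-json`."""
    return KEBAB.fullmatch(slug) is not None


def skill_path(skill):
    """API path of the skill directory named in step 1."""
    return f"repos/{REPO}/contents/workflow/skills/{skill}"


def prompt_path(skill, scenario):
    """API path of the prompt fixture that must not exist yet."""
    return f"repos/{REPO}/contents/tests/eval/prompts/{skill}/{scenario}.md"


def gh_exists(api_path, run=subprocess.run):
    """Return True when `gh api <path>` exits with status 0.

    Caveat: a non-zero exit also covers auth and network failures,
    so a False here is not proof of a 404.
    """
    result = run(["gh", "api", "--silent", api_path], capture_output=True)
    return result.returncode == 0


def check_target(skill, scenario, exists=gh_exists):
    """Return a list of problems; an empty list means step 1 passes."""
    problems = []
    if not is_kebab_case(scenario):
        problems.append(f"scenario '{scenario}' is not a kebab-case slug")
    if not exists(skill_path(skill)):
        problems.append(f"skill '{skill}' not found under workflow/skills/")
    elif exists(prompt_path(skill, scenario)):
        problems.append(f"eval case '{skill}/{scenario}' already exists")
    return problems
```

Passing `exists` as a parameter keeps the slug and duplicate logic testable without network access. In real use, an empty list from `check_target("<name>", "<scenario>")` would mean it is safe to continue.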