
agentic-eval

Evaluate and improve AI-generated output with explicit rubrics, reflection loops, and stop conditions. Use when building self-critique workflows, evaluator-optimizer pipelines, or acceptance gates for code, docs, analysis, or plans.
bg-szy/TOP-SKILLS · ★ 1 · AI & Automation · score 70
Install: claude install-skill bg-szy/TOP-SKILLS
# Agentic Eval

Use structured evaluation loops to improve important outputs before you call them done.

- Leverage native parallel subagent dispatch and 200k+ context windows where available.

## When to Use

Use symptom -> action triggers: when one matches, apply this skill and verify with the protocol below.

- A task is quality-critical and a single pass is too risky.
- You need repeatable acceptance criteria for code, docs, analysis, or plans.
- You want a reviewer or judge step that is separate from generation.
- You need to compare multiple candidate outputs against the same rubric.

## Core Loop

1. Define the artifact being judged.
2. Define a rubric with weighted dimensions.
3. Generate or collect the candidate output.
4. Evaluate it against the rubric.
5. Convert the feedback into concrete changes.
6. Re-run until the score crosses the threshold or the iteration budget is exhausted.

## Evaluation Patterns

### 1. Self-Reflection

Use the same agent to critique its own work when the task is moderate risk and the rubric is precise.

Best for:

- formatting checks
- completeness checks
- first-pass code or doc refinement

### 2. Evaluator-Optimizer Split

Separate generation from evaluation when you want clearer responsibilities.

Best for:

- high-value outputs
- rubric-based acceptance checks
- comparing multiple candidates fairly

### 3. Evidence-Based Evaluation

Back the score with tests, logs, benchmarks, or direct verification.

Best for:

- code generation
-