Install: claude install-skill bg-szy/TOP-SKILLS
# Agentic Eval
Use structured evaluation loops to improve important outputs before you call them done.
- Leverage native parallel subagent dispatch and 200k+ context windows where available.
## When to Use
Treat each item as a symptom -> action trigger: when one matches, apply this skill and verify the result with the protocol below.
- A task is quality-critical and a single pass is too risky.
- You need repeatable acceptance criteria for code, docs, analysis, or plans.
- You want a reviewer or judge step that is separate from generation.
- You need to compare multiple candidate outputs against the same rubric.
## Core Loop
1. Define the artifact being judged.
2. Define a rubric with weighted dimensions.
3. Generate or collect the candidate output.
4. Evaluate it against the rubric.
5. Convert the feedback into concrete changes.
6. Re-run until the score crosses the threshold or the iteration budget is exhausted.
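The loop above can be sketched in a few lines of Python. This is a minimal illustration, not a prescribed implementation: `Dimension`, `weighted_score`, `run_eval_loop`, the `revise` callback, and the default `threshold` and `max_iterations` values are all hypothetical names and numbers chosen for the example. In practice the scorers and the revise step would likely call a model or a test suite rather than check for substrings.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Dimension:
    """One rubric dimension (hypothetical structure for this sketch)."""
    name: str
    weight: float
    scorer: Callable[[str], float]  # returns a score between 0.0 and 1.0


def weighted_score(artifact: str, rubric: List[Dimension]) -> Tuple[float, Dict[str, float]]:
    """Step 4: evaluate the artifact against every rubric dimension."""
    per_dim = {d.name: d.scorer(artifact) for d in rubric}
    total_weight = sum(d.weight for d in rubric)
    score = sum(d.weight * per_dim[d.name] for d in rubric) / total_weight
    return score, per_dim


def run_eval_loop(
    artifact: str,
    rubric: List[Dimension],
    revise: Callable[[str, str], str],
    threshold: float = 0.8,      # assumed acceptance threshold
    max_iterations: int = 3,     # assumed iteration budget
) -> Tuple[str, List[float]]:
    """Steps 4-6: score, revise the weakest dimension, and re-run."""
    score, per_dim = weighted_score(artifact, rubric)
    history = [score]
    iterations = 0
    while score < threshold and iterations < max_iterations:
        # Step 5: turn feedback into a concrete change by targeting
        # the lowest-scoring dimension first.
        weakest = min(per_dim, key=per_dim.get)
        artifact = revise(artifact, weakest)
        score, per_dim = weighted_score(artifact, rubric)
        history.append(score)
        iterations += 1
    return artifact, history


# Steps 1-2: the artifact is a short document; the rubric has two weighted checks.
rubric = [
    Dimension("has_summary", 2.0, lambda text: 1.0 if "Summary:" in text else 0.0),
    Dimension("has_example", 1.0, lambda text: 1.0 if "Example:" in text else 0.0),
]


def add_missing_section(text: str, weakest: str) -> str:
    """Toy revise step: append a placeholder for the missing section."""
    heading = {"has_summary": "Summary:", "has_example": "Example:"}[weakest]
    return f"{text}\n{heading} (to be written)"


final_text, history = run_eval_loop("Body text.", rubric, add_missing_section)
print(history)
```

Returning the score history alongside the artifact makes it easy to see whether revisions are converging or whether the budget ran out before the threshold was reached.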
## Evaluation Patterns
### 1. Self-Reflection
Have the same agent critique its own work when the task carries moderate risk and the rubric is precise.
Best for:
- formatting checks
- completeness checks
- first-pass code or doc refinement
### 2. Evaluator-Optimizer Split
Separate generation from evaluation when you want clearer responsibilities.
Best for:
- high-value outputs
- rubric-based acceptance checks
- comparing multiple candidates fairly
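One way to picture the split is an evaluator function that knows nothing about how a candidate was produced and scores every candidate against the same rubric. The sketch below is illustrative only: `evaluate`, `rank_candidates`, the rubric weights, and the substring-based checks are hypothetical stand-ins for what would usually be a separate reviewer agent or judge prompt.

```python
from typing import Callable, Dict, List, Tuple

Rubric = Dict[str, float]                 # dimension name -> weight
Checks = Dict[str, Callable[[str], bool]]  # dimension name -> pass/fail check


def evaluate(candidate: str, rubric: Rubric, checks: Checks) -> float:
    """Evaluator: sees only the candidate text, never its origin."""
    total = sum(rubric.values())
    earned = sum(weight for name, weight in rubric.items() if checks[name](candidate))
    return earned / total


def rank_candidates(candidates: List[str], rubric: Rubric, checks: Checks) -> List[Tuple[float, str]]:
    """Score every candidate with the same rubric and sort best-first."""
    scored = [(evaluate(c, rubric, checks), c) for c in candidates]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)


# Hypothetical rubric for a deployment plan.
plan_rubric: Rubric = {"mentions_rollback": 3.0, "under_200_chars": 1.0}
plan_checks: Checks = {
    "mentions_rollback": lambda text: "rollback" in text.lower(),
    "under_200_chars": lambda text: len(text) < 200,
}

# Candidates could come from different generators, prompts, or agents.
plans = [
    "Deploy on Friday.",
    "Deploy on Tuesday; rollback by reverting the release tag.",
]

ranked = rank_candidates(plans, plan_rubric, plan_checks)
for score, plan in ranked:
    print(f"{score:.2f}  {plan}")
```

Keeping the rubric and checks as plain data passed into the evaluator means the generator side can change freely without affecting how candidates are judged.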
### 3. Evidence-Based Evaluation
Back the score with tests, logs, benchmarks, or direct verification.
Best for:
- code generation
-