agentic-eval

Solid

Patterns for agent self-improvement through iterative evaluation and refinement -- generate, evaluate, critique, refine loops that move beyond single-shot generation.

AI & Automation 3 stars 1 fork Updated yesterday MIT

Quality Score: 82/100

Stars 20%
Recency 20%: 100
Frontmatter 20%
Documentation 15%: 100
Issue Health 10%
License 10%: 100
Description 5%: 100

Skill Content

# Agentic Evaluation Patterns

Patterns for self-improvement through iterative evaluation and refinement.

## Overview

Evaluation patterns enable agents to assess and improve their own outputs, moving beyond single-shot generation to iterative refinement loops.

```
Generate → Evaluate → Critique → Refine → Output
    ↑                              │
    └──────────────────────────────┘
```

## When to Use

- **Quality-critical generation**: Code, reports, analysis requiring high accuracy
- **Tasks with clear evaluation criteria**: Defined success metrics exist
- **Content requiring specific standards**: Style guides, compliance, formatting

---

## Pattern 1: Basic Reflection

Agent evaluates and improves its own output through self-critique.

```python
def reflect_and_refine(task: str, criteria: list[str], max_iterations: int = 3) -> str:
    """Generate with reflection loop."""
    output = llm(f"Complete this task:\n{task}")

    for i in range(max_iterations):
        # Self-critique
        critique = llm(f"""
        Evaluate this output against criteria: {criteria}
        Output: {output}
        Rate each: PASS/FAIL with feedback as JSON.
        """)

        critique_data = json.loads(critique)
        all_pass = all(c["status"] == "PASS" for c in critique_data.values())
        if all_pass:
            return output

        # Refine based on critique
        failed = {k: v["feedback"] for k, v in critique_data.items() if v["status"] == "FAIL...
```
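The preview above cuts off before the refine step, so the following is only a minimal, self-contained sketch of the same generate, evaluate, refine loop, not the skill author's implementation. It assumes the model is passed in as a hypothetical `llm` callable (prompt string in, response string out), that the critique comes back as a JSON object keyed by criterion, and that an unparseable critique simply ends the loop. The refine prompt wording and the `make_stub_llm` helper are illustrative inventions used to make the example runnable without a real model.

```python
import json
from typing import Callable


def reflect_and_refine(
    task: str,
    criteria: list[str],
    llm: Callable[[str], str],  # assumption: any prompt -> text callable
    max_iterations: int = 3,
) -> str:
    """Generate an output, then critique and refine it until all criteria pass."""
    output = llm(f"Complete this task:\n{task}")

    for _ in range(max_iterations):
        critique = llm(
            "Evaluate this output against the criteria below.\n"
            f"Criteria: {json.dumps(criteria)}\n"
            f"Output: {output}\n"
            'Reply with JSON only: {"<criterion>": {"status": "PASS" or "FAIL", "feedback": "..."}}'
        )
        try:
            critique_data = json.loads(critique)
        except json.JSONDecodeError:
            # Assumption: give up on refinement if the critique is not valid JSON.
            break
        if not isinstance(critique_data, dict):
            break

        # Collect feedback for every criterion that did not clearly pass.
        failed = {}
        for name, result in critique_data.items():
            if isinstance(result, dict) and result.get("status") == "PASS":
                continue
            failed[name] = result.get("feedback", "") if isinstance(result, dict) else str(result)

        if not failed:
            return output

        # Hypothetical refine prompt: feed the failed criteria back to the model.
        output = llm(
            "Revise the output to address the failed criteria.\n"
            f"Task: {task}\n"
            f"Failed criteria: {json.dumps(failed)}\n"
            f"Previous output: {output}"
        )

    # Iteration budget exhausted (or critique unusable): return the latest draft.
    return output


def make_stub_llm() -> Callable[[str], str]:
    """Hypothetical stand-in for a model: fails the first critique, passes the second."""
    critiques = iter([
        json.dumps({"concise": {"status": "FAIL", "feedback": "Too long."}}),
        json.dumps({"concise": {"status": "PASS", "feedback": ""}}),
    ])

    def stub(prompt: str) -> str:
        if prompt.startswith("Evaluate"):
            return next(critiques)
        if prompt.startswith("Revise"):
            return "short draft"
        return "a very long first draft"

    return stub


result = reflect_and_refine("Summarize the report", ["concise"], make_stub_llm())
print(result)  # short draft
```

Passing `llm` as a parameter keeps the loop testable with a stub like the one above. Returning the latest draft once `max_iterations` is reached is one possible policy; raising an error or returning the critique alongside the output would be equally reasonable choices.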

Details

Author: fabioc-aloha
Repository: fabioc-aloha/Alex_Skill_Mall
Created: 3 months ago
Last Updated: yesterday
Language: Python
License: MIT

Integrates with

Azure · Cloud

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

agentic-eval

Patterns and techniques for evaluating and improving AI agent outputs. Use this skill when:
- Implementing self-critique and reflection loops
- Building evaluator-optimizer pipelines for quality-critical generation
- Creating test-driven code refinement workflows
- Designing rubric-based or LLM-as-judge evaluation systems
- Adding iterative improvement to agent outputs (code, reports, analysis)
- Measuring and improving agent response quality

1 Updated yesterday

eric-sabe

AI & Automation Listed

agentic-eval

Use when designing and implementing evaluation loops for AI agents, including reflection, evaluator-optimiser patterns, rubric scoring, LLM-as-judge review, test-driven refinement, convergence checks, and iteration logging.

2 Updated 1 week ago

MarieLynneBlock

AI & Automation Listed

graph-engineering

Design resilient multi-step LLM/agent workflows using the loops-and-graphs pattern. Covers all 5 layers: reflection, tool use, planning, multi-agent coordination, and critique loops — with anti-patterns and cost guidance.

0 Updated today

BhaveshKhaple