eval-harness


A portable two-layer LLM evaluation pipeline for AI agent outputs: a heuristic guard followed by an LLM-as-Judge stage with a Self-Refine retry loop.
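As a rough illustration of how a two-layer pipeline of this kind could be wired together, here is a minimal Python sketch. It is not taken from the plugin's source. Every name in it (Verdict, heuristic_guard, evaluate, the stub generator and judge, the 2000-character limit) is a hypothetical assumption chosen for the example, and the judge is a stand-in function rather than a real model call.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    passed: bool
    feedback: str  # explanation handed back to the generator on failure


def heuristic_guard(output: str) -> Verdict:
    """Layer 1 (assumed rules): cheap deterministic checks, no model call."""
    if not output.strip():
        return Verdict(False, "output is empty")
    if len(output) > 2000:
        return Verdict(False, "output exceeds 2000 characters")
    return Verdict(True, "")


def evaluate(
    task: str,
    generate: Callable[[str, Optional[str]], str],
    judge: Callable[[str, str], Verdict],
    max_retries: int = 2,
) -> Tuple[str, int, bool]:
    """Run guard, then judge; on failure, retry with the feedback attached.

    Returns (last output, attempts used, whether it passed).
    """
    feedback: Optional[str] = None
    output = ""
    for attempt in range(max_retries + 1):
        output = generate(task, feedback)
        verdict = heuristic_guard(output)
        if verdict.passed:
            # Layer 2 runs only when the cheap checks pass.
            verdict = judge(task, output)
        if verdict.passed:
            return output, attempt + 1, True
        # Self-Refine step: the next attempt sees why this one failed.
        feedback = verdict.feedback
    return output, max_retries + 1, False


# Hypothetical stand-ins for an agent and an LLM judge.
def stub_generate(task: str, feedback: Optional[str]) -> str:
    if feedback is None:
        return ""
    if feedback == "output is empty":
        return "draft"
    return "final answer"


def stub_judge(task: str, output: str) -> Verdict:
    if "final" in output:
        return Verdict(True, "")
    return Verdict(False, "answer is not final")


result = evaluate("demo task", stub_generate, stub_judge)
print(result)
```

In this sketch the guard rejects the first (empty) attempt without calling the judge, the judge rejects the second, and the third passes. Running the cheap checks first keeps judge calls for outputs that are at least well-formed.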

Plugin, 0 stars, 0 forks, updated 2 months ago

Install


This plugin isn't listed in a marketplace we've indexed. Install it directly from its GitHub repository; the README has the setup steps.

Everything this plugin ships (the skills, agents, commands, hooks, and MCP servers it bundles):

judge-agent.md

eval-configure.md, eval-report.md, eval-run.md

hooks.json, scripts

Stars 20%, Recency 20%, Manifest 20% (100), Documentation 15%, Issue Health 10%, License 10%, Description 5% (100)