eval-harness

A portable two-layer LLM evaluation pipeline for AI agent outputs: a heuristic guard plus an LLM-as-Judge stage with a Self-Refine retry loop.
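As a rough illustration of the two-layer idea described above, the sketch below runs a cheap heuristic guard first, then a judge, and retries with the judge's feedback when the score falls short. It is not taken from the plugin's code: `heuristic_guard`, `evaluate`, `Verdict`, the stub judge and refine functions, the 0.7 threshold, and the retry limit are all hypothetical names and values chosen for the example, and the real plugin may structure its loop differently.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Tuple


@dataclass
class Verdict:
    passed: bool
    score: float
    feedback: str
    attempts: int


def heuristic_guard(output: str, min_chars: int = 20) -> Optional[str]:
    """Layer 1 (hypothetical): cheap deterministic checks.

    Returns a failure reason, or None when the output may go on to the judge.
    """
    text = output.strip()
    if not text:
        return "empty output"
    if len(text) < min_chars:
        return f"output shorter than {min_chars} characters"
    return None


def evaluate(
    output: str,
    judge: Callable[[str], Tuple[float, str]],
    refine: Callable[[str, str], str],
    threshold: float = 0.7,   # assumed pass mark, not from the plugin
    max_retries: int = 2,     # assumed retry budget, not from the plugin
) -> Verdict:
    """Guard, then judge, then refine-and-retry until pass or budget exhausted."""
    attempts = 0
    while True:
        attempts += 1
        reason = heuristic_guard(output)
        if reason is not None:
            # Guard failure: skip the (expensive) judge call entirely.
            score, feedback = 0.0, reason
        else:
            # Layer 2: the judge returns a score in [0, 1] plus feedback.
            score, feedback = judge(output)
        if score >= threshold:
            return Verdict(True, score, feedback, attempts)
        if attempts > max_retries:
            return Verdict(False, score, feedback, attempts)
        # Self-Refine style step: revise the output using the feedback.
        output = refine(output, feedback)


def stub_judge(output: str) -> Tuple[float, str]:
    # Stand-in for an LLM call: rewards outputs that cite a source.
    if "source:" in output.lower():
        return 0.9, "cites a source"
    return 0.4, "add a source citation"


def stub_refine(output: str, feedback: str) -> str:
    # Stand-in for asking the agent to revise its answer.
    return output + " Source: internal docs."


result = evaluate(
    "The deploy failed because the token expired.", stub_judge, stub_refine
)
print(result)
```

Passing the judge and refine steps in as callables keeps the loop testable without a model API; in practice both would presumably wrap LLM calls.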

Plugin, 0 stars, 0 forks, updated 2 months ago

Install

Plugins normally install through a marketplace in two steps. This plugin isn't listed in any marketplace we've indexed, so install it directly from its GitHub repository; the README has the setup steps.

Bundles

Everything this plugin ships — skills, agents, commands, hooks, and MCP servers it bundles.

Agents (1)

judge-agent.md

Commands (3)

eval-configure.md eval-report.md eval-run.md

Hooks (2)

hooks.json scripts

Quality Score: 56/100

Stars (weight 20%): 0
Recency (weight 20%): 75
Manifest (weight 20%): 100
Documentation (weight 15%): 0
Issue Health (weight 10%): 80
License (weight 10%): 0
Description (weight 5%): 100

Details

Author: jamjahal
Repository: jamjahal/verdict
Created: 3 months ago
Last Updated: 2 months ago
Language: Python
License: None