agent-benchmark
SolidFramework for measuring and tracking agent response quality over time. Detects regressions before they reach production. Use when evaluating agent changes, auditing quality, or establishing performance baselines.
Install
Quality Score: 86/100
Skill Content
Details
- Author
- vibeeval
- Repository
- vibeeval/vibecosystem
- Created
- 2 months ago
- Last Updated
- 1 months ago
- Language
- C#
- License
- MIT
Integrates with
Similar Skills
Semantically similar based on skill content — not just same category
agent-benchmark-suite
Agent skill for benchmark-suite - invoke with $agent-benchmark-suite
agent-performance-benchmarker
Agent skill for performance-benchmarker - invoke with $agent-performance-benchmarker
agent-evaluation
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks
agent-evaluation
Testing and benchmarking LLM agents including behavioral testing, capability assessment, reliability metrics, and production monitoring—where even top agents achieve less than 50% on real-world benchmarks
benchmark-agents
Advanced AI agent benchmark scenarios that push Vercel's cutting-edge platform features — Workflow DevKit, AI Gateway, MCP, Chat SDK, Queues, Flags, Sandbox, and multi-agent orchestration. Designed to stress-test skill injection for complex, multi-system builds.