agent-arena

Solid

Use when complex AI agent work needs heterogeneous multi-agent debate, red teaming, evidence checking, judging, or synthesis across Codex, Claude Code, Hermes, OpenClaw, and other coding agents.

AI & Automation 21 stars 5 forks Updated today MIT

Quality Score: 87/100

Stars (20% weight): 45
Recency (20% weight): 100
Frontmatter (20% weight): 70
Documentation (15% weight): 100
Issue Health (10% weight): 80
License (10% weight): 100
Description (5% weight): 100

Skill Content

# Agent Arena

## Overview

Agent Arena is a reusable protocol skill for AI coding agents and LLM agent harnesses. Use it when one agent is likely to be overconfident, trapped in a single framing, or missing evidence.

The core idea: **independent heterogeneous agents first, debate later, evidence before consensus, dissent preserved.**

Agent Arena is designed for Claude Code, OpenAI Codex, Hermes Agent, OpenClaw, OpenCode, Copilot CLI, and other autonomous coding agents or agentic workflows that support custom skills, custom instructions, or tool-driven delegation.

**Capability boundary:** this skill is not an executable orchestrator. It does not install, authenticate, or automatically call external agents. Cross-agent execution requires a host agent or human operator with the relevant CLI/tool access, credentials, permissions, and network availability.

## When to Use

Use this skill when the task involves:

- Multi-agent debate or panel review
- Codex vs Claude Code comparison
- Architecture decisions or implementation plan reviews
- Complex bug root-cause analysis
- PR/code review with high consequence
- Research synthesis that needs source checking
- LLM-as-a-judge, agent judge, agent game theory, or debate workflows
- Red teaming a design, prompt, implementation, benchmark, or experiment plan
- Avoiding single-model-family blind spots

Do **not** use full Agent Arena for:

- Simple factual lookups
- Translation, formatting, or summarization
- One obvious local tool cal...
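The ordering in the core idea (independent answers first, debate later, evidence before consensus, dissent preserved) can be illustrated with a small sketch. This is not part of the skill, which states that it is not an executable orchestrator. Every name here (`Position`, `run_arena`, `synthesize`, `stub_agent`) is hypothetical, agents are modeled as plain Python callables rather than real CLI calls, and the majority vote is an assumption standing in for whatever judging step a host agent would actually apply.

```python
from __future__ import annotations

from collections import Counter
from dataclasses import dataclass, field


@dataclass
class Position:
    agent: str                                          # hypothetical agent label
    answer: str                                         # the agent's conclusion
    evidence: list[str] = field(default_factory=list)   # sources backing the answer


def synthesize(positions: list[Position]) -> dict:
    """Apply 'evidence before consensus, dissent preserved' to a set of positions."""
    # Positions with no evidence are set aside before any vote is taken.
    backed = [p for p in positions if p.evidence]
    unsupported = [p for p in positions if not p.evidence]
    if not backed:
        return {"consensus": None, "dissent": [], "unsupported": unsupported}

    # Assumption: a simple majority vote stands in for the judging step.
    top_answer, _ = Counter(p.answer for p in backed).most_common(1)[0]
    # Evidence-backed minority views are kept in the result, not discarded.
    dissent = [p for p in backed if p.answer != top_answer]
    return {"consensus": top_answer, "dissent": dissent, "unsupported": unsupported}


def run_arena(task: str, agents: list) -> dict:
    """Run one independent pass, one debate pass, then synthesize."""
    # Phase 1: each agent answers alone and sees no peer output.
    first_pass = [agent(task, []) for agent in agents]
    # Phase 2: each agent sees the other agents' first-pass positions and may revise.
    revised = []
    for i, agent in enumerate(agents):
        peers = first_pass[:i] + first_pass[i + 1:]
        revised.append(agent(task, peers))
    return synthesize(revised)


def stub_agent(name: str, answer: str, evidence: list[str]):
    """Hypothetical stand-in for a real agent; it ignores peers and never revises."""
    def agent(task: str, peers: list[Position]) -> Position:
        return Position(name, answer, list(evidence))
    return agent
```

In a real setup each callable would wrap a host-provided tool or CLI invocation, and that wiring is exactly what the capability boundary above leaves to the host agent or human operator.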

Details

Author: zhjai
Repository: zhjai/agent-arena
Created: 1 week ago
Last Updated: today
Language: N/A
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

agents-consilium

Query external AI agents (Codex, Gemini, OpenCode, Claude Code headless) in parallel for independent second opinions, code review, bug investigation, and consensus on high-stakes decisions. Agents and models are configurable in config.json. Use for architecture choices, security review, or ambiguous problems where independent perspectives matter. Not for simple questions answerable from docs or the codebase — use web search or repo exploration instead.

76 Updated 5 days ago
CodeAlive-AI
AI & Automation Listed

debate-agents

Runs a problem through multiple expert perspectives via debate (agents argue in rounds and converge) or poll (agents analyze independently, then aggregate by consensus). Use to pressure-test a decision or trade-off with no clear winner. Standalone, or invoked by another skill as a sub-routine. Not for implementation (use architect-system) or code verification (use review-work).

10 Updated 2 days ago
hungv47
AI & Automation Solid

ai-agent-design

Use this skill when designing AI agent architectures, implementing tool use, building multi-agent systems, or creating agent memory. Triggers on AI agents, tool calling, agent loops, ReAct pattern, multi-agent orchestration, agent memory, planning strategies, agent evaluation, and any task requiring autonomous AI agent design.

164 Updated today
AbsolutelySkilled
AI & Automation Listed

ai-agent-design

Use this skill when designing AI agent architectures, implementing tool use, building multi-agent systems, or creating agent memory. Triggers on AI agents, tool calling, agent loops, ReAct pattern, multi-agent orchestration, agent memory, planning strategies, agent evaluation, and any task requiring autonomous AI agent design.

3 Updated today
Samuelca6399
AI & Automation Solid

deliberative-analysis

Use when design analysis, experiment planning, architecture choices, research synthesis, or strategy decisions risk overconfidence, tunnel vision, path dependence, premature convergence, or shallow A/B framing.

21 Updated today
zhjai