harness-design

Solid

Design and build multi-agent harness architectures for long-running AI application development. GAN-inspired Generator-Evaluator pattern, Sprint Contract negotiation, context management, quality criteria calibration. Based on Anthropic Engineering patterns. Use when: "build a harness", "multi-agent architecture", "agent orchestration", "generator-evaluator", "long-running app", "harness design", "agent pipeline", "quality evaluation loop", "sprint contract", "build app with agents", "Claude Agent SDK architecture", or when building complex full-stack apps that need planning → generation → evaluation cycles. Also use when discussing context degradation, self-evaluation bias, or assumption testing in AI workflows. Do NOT use to stress-test or critique an already-written plan document; use plan-swarm-review for that (this skill designs the harness, it does not review plans).

AI & Automation 138 stars 21 forks Updated yesterday MIT

Quality Score: 90/100

Score weights: Stars 20%, Recency 20%, Frontmatter 20%, Documentation 15%, Issue Health 10%, License 10%, Description 5%

Skill Content

# Multi-Agent Harness Design

Sources:
- Anthropic Engineering: "Harness design for long-running apps"
- OpenClaw-RL paper (arxiv 2603.10165): personal agent verification
- DenisSergeevitch/repo-task-proof-loop: execution protocol with durable proof

See also: `references/proof-loop-research.md` for details on the paper and the repo mapping.

## When you need a harness and when a solo agent is enough

| Signal | Solo agent | Harness |
|--------|-----------|---------|
| Scope | Single feature, bug fix, refactor | Full-stack app, multi-feature product |
| Duration | < 30 min | 1-6+ hours |
| Quality | Baseline is enough | Needs polish, originality, craft |
| Cost | ~$5-15 | ~$100-200+ |
| Verification | Manual review | Automated evaluation + Playwright |

**Rule:** An Evaluator is justified when the task is **beyond reliable solo performance**. This is not a fixed yes/no decision; it depends on the complexity tier.

---

## Architecture: Three-Agent System

### 1. Planner
- Expands the user's 1-4 sentences into a **detailed specification**
- Sets an ambitious scope and looks for opportunities to add AI features
- Does **NOT** over-specify the implementation: only the what, not the how
- Weaves AI features into the product organically

### 2. Generator
- Implements features iteratively
- Runs a **self-evaluation** before handoff (but it is unreliable; see below)
- Works within the Sprint Contract

### 3. Evaluator
- **Independent** of the generator: separate context, separate prompt
- Validates through Playwright MCP: screenshots, nav...
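
The three roles described above can be sketched as a plan, generate, evaluate loop. This is a minimal illustration and not the skill's actual implementation: `run_harness`, `Verdict`, `HarnessResult`, and the three stub functions are hypothetical names, and plain Python callables stand in for real agents (no Claude Agent SDK or Playwright calls are made). Sprint Contract negotiation is not modeled.

```python
from dataclasses import dataclass, field
from typing import Callable, List


@dataclass
class Verdict:
    """Evaluator output: pass/fail plus feedback for the next round."""
    passed: bool
    feedback: str = ""


@dataclass
class HarnessResult:
    spec: str
    artifact: str
    rounds: int
    passed: bool
    history: List[str] = field(default_factory=list)


def run_harness(
    request: str,
    planner: Callable[[str], str],
    generator: Callable[[str, str], str],
    evaluator: Callable[[str, str], Verdict],
    max_rounds: int = 3,
) -> HarnessResult:
    """Plan once, then alternate generation and independent evaluation."""
    spec = planner(request)  # the planner describes what to build, not how
    feedback = ""
    artifact = ""
    history: List[str] = []
    for round_no in range(1, max_rounds + 1):
        artifact = generator(spec, feedback)
        # The evaluator receives only the spec and the artifact,
        # never the generator's own context or self-assessment.
        verdict = evaluator(spec, artifact)
        history.append(verdict.feedback)
        if verdict.passed:
            return HarnessResult(spec, artifact, round_no, True, history)
        feedback = verdict.feedback
    return HarnessResult(spec, artifact, max_rounds, False, history)


# Hypothetical stubs standing in for real agents.
def stub_planner(request: str) -> str:
    return f"SPEC: {request}"


def stub_generator(spec: str, feedback: str) -> str:
    # Applies a "fix" only after the evaluator has complained once.
    return spec + (" +fix" if feedback else "")


def stub_evaluator(spec: str, artifact: str) -> Verdict:
    if artifact.endswith("+fix"):
        return Verdict(True, "ok")
    return Verdict(False, "missing fix")


result = run_harness("todo app", stub_planner, stub_generator, stub_evaluator)
print(result.rounds, result.passed)
```

Passing the evaluator only the spec and the artifact is one way to express the "separate context, separate prompt" requirement: the evaluator cannot inherit the generator's own view of its work. The `max_rounds` cap is an assumption added here to keep the loop bounded.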

Details

Author: AnastasiyaW
Repository: AnastasiyaW/claude-code-config
Created: 4 months ago
Last Updated: yesterday
Language: Python
License: MIT

Integrates with

Anthropic (AI) · Playwright (Testing)

Bundled in these plugins

claude-code-config

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

neo-agent-harness

Use this skill when the user asks to improve AI-assisted development reliability, AGENTS.md, skills, tests, CI, hooks, review loops, agent workflow governance, loop engineering, agent automations, worktree isolation, maker/checker separation, or external state design for long-running agents. It designs feedforward guides, feedback sensors, verification gates, human decision points, and loop architectures from repository evidence.

7 Updated yesterday

Benknightdark

AI & Automation Listed

harness-eng

Use when designing, evaluating, or simplifying an agent project harness: AGENTS.md/CLAUDE.md rules, startup scripts, progress logs, feature trackers, handoffs, evaluator rubrics, quality documents, repo-local knowledge maps, and mechanical guardrails for coding agents. Especially useful when converting raw agent-workflow notes into a concise, verifiable project control layer.

3 Updated yesterday

MasihMoafi

AI & Automation Listed

deepagents-harness

Use when designing long-horizon agents that need subagents, isolated context windows, filesystem-backed work, persistence, human approval, MCP tools, or production tracing.

8 Updated today

mouadja02