bare-eval

Solid

Run isolated eval and grading calls using CC 2.1.81 --bare mode. Constructs claude -p --bare invocations for skill evaluation, trigger testing, and LLM grading without plugin/hook interference. Use when running eval pipelines, grading skill outputs, benchmarking prompt quality, or testing trigger accuracy in isolation.

AI & Automation 208 stars 20 forks Updated today MIT


Quality Score: 88/100

Stars 20%

Recency 20%

100

Frontmatter 20%

Documentation 15%

100

Issue Health 10%

License 10%

100

Description 5%

100

Skill Content

# Bare Eval — Isolated Evaluation Calls

Run `claude -p --bare` for fast, clean eval/grading without plugin overhead.

**CC 2.1.81 required.** The `--bare` flag skips hooks, LSP, plugin sync, and skill directory walks.

## When to Use

- Grading skill outputs against assertions
- Trigger classification (which skill matches a prompt)
- Description optimization iterations
- Any scripted `-p` call that doesn't need plugins

## When NOT to Use

- Testing skill routing (needs `--plugin-dir`)
- Testing agent orchestration (needs full plugin context)
- Interactive sessions

## Prerequisites

```bash
# --bare requires ANTHROPIC_API_KEY (OAuth/keychain disabled)
export ANTHROPIC_API_KEY="sk-ant-..."

# Verify CC version
claude --version  # Must be >= 2.1.81
```

## Quick Reference

| Call Type | Command Pattern |
|-----------|----------------|
| Grading | `claude -p "$prompt" --bare --max-turns 1 --output-format text` |
| Trigger | `claude -p "$prompt" --bare --json-schema "$schema" --output-format json` |
| Streaming grade | `claude -p "$prompt" --bare --max-turns 1 --output-format stream-json` |
| Optimize | `echo "$prompt" \| claude -p --bare --max-turns 1 --output-format text` |
| Force-skill | `claude -p "$prompt" --bare --print --append-system-prompt "$content"` |
| @-file in prompt | `claude -p "grade @fixtures/case-1.md against rubric" --bare` (CC 2.1.113 Remote Control autocomplete) |

> **Long harness runs (CC 2.1.199+):** set `CLAUDE_CODE_RETRY_WATCHDOG=1` for unattended ev...
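The prerequisites and the "Grading" pattern from the Quick Reference can be wrapped in a few shell helpers so that an eval script fails fast when the environment is wrong. What follows is a minimal sketch that assumes bash and a `sort` that supports `-V`. The helper names (`version_ge`, `require_api_key`, `bare_check_version`, `bare_grade`) and the `BARE_EVAL_DRY_RUN` switch are hypothetical and not part of the skill. The sketch also assumes that `claude --version` prints an `x.y.z` version string somewhere in its output.

```bash
#!/usr/bin/env bash
# Hypothetical helpers (not part of the bare-eval skill); source this file.

# Return 0 if version $1 >= version $2, using version-aware `sort -V`.
version_ge() {
  [ "$(printf '%s\n%s\n' "$1" "$2" | sort -V | head -n1)" = "$2" ]
}

# --bare disables OAuth/keychain auth, so an API key must be exported.
require_api_key() {
  if [ -z "${ANTHROPIC_API_KEY:-}" ]; then
    echo "bare-eval: ANTHROPIC_API_KEY is not set" >&2
    return 1
  fi
}

# Assumption: `claude --version` output contains an x.y.z version string.
bare_check_version() {
  local v
  v="$(claude --version 2>/dev/null | grep -Eo '[0-9]+\.[0-9]+\.[0-9]+' | head -n1)"
  if [ -z "$v" ] || ! version_ge "$v" "2.1.81"; then
    echo "bare-eval: need Claude Code >= 2.1.81 (found: ${v:-unknown})" >&2
    return 1
  fi
}

# Build the Quick Reference "Grading" call as an argv array so a prompt
# with spaces or quotes stays a single argument. With BARE_EVAL_DRY_RUN=1
# (hypothetical switch), print one argument per line instead of running it.
bare_grade() {
  local prompt="$1"
  local cmd=(claude -p "$prompt" --bare --max-turns 1 --output-format text)
  if [ "${BARE_EVAL_DRY_RUN:-0}" = 1 ]; then
    printf '%s\n' "${cmd[@]}"
  else
    "${cmd[@]}"
  fi
}
```

In an eval script, source the file and chain the checks: `require_api_key && bare_check_version && bare_grade "$prompt"`. When `BARE_EVAL_DRY_RUN=1` is set, `bare_grade` prints the constructed arguments instead of calling `claude`. This makes the command easy to inspect or unit-test without spending API calls.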

Details

Author: yonatangross
Repository: yonatangross/orchestkit
Created: 6 months ago
Last Updated: today
Language: TypeScript
License: MIT

Integrates with

Anthropic (AI) · LangChain (AI) · FastAPI (Backend)

Similar Skills

Semantically similar based on skill content — not just same category

Web & Frontend Listed

eval-harness

Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD). Use this skill whenever defining success criteria for a feature before building it, writing capability / regression / quality evals, measuring pass@k reliability, or setting up eval-driven checks to catch regressions across changes.

0 Updated yesterday

sardonyx0827

AI & Automation Solid

eval-harness

Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles

5 Updated today

immacualate

AI & Automation Solid

eval-harness

Formal evaluation framework for Claude Code sessions implementing eval-driven development (EDD) principles

159 Updated 1 week ago

arabicapp