eval-cost

Use when the user asks to make an AI app cheaper or faster, reduce tokens, latency, model/tool/retrieval/rerank/self-check/retry/evaluator cost, or compare cost before/after.
Galileo-Agent-Labs/eval-engineer · ★ 29 · AI & Automation · score 81
Install: claude install-skill Galileo-Agent-Labs/eval-engineer
# Eval Cost

Use this skill for tokenomics RCA. Cost changes are accepted only when Galileo quality metrics do not regress.

## Conditional References

- Load `skills/eval-engineer/references/tokenomics-rca.md` when choosing the tokenomics workflow or diagnosing why cost moved.
- Run `skills/eval-engineer/scripts/compare_tokenomics_packets.py` when both baseline and verification packets exist.
- Load `skills/eval-engineer/assets/cost-diagnosis-template.md`, `skills/eval-engineer/assets/tokenomics-fix-plan-template.md`, and `skills/eval-engineer/assets/quality-preserving-verification-template.md` only when writing those artifacts.

## Do

- Compare cost, latency, tokens, retrieved context, tool calls, retries, rerank/self-check spans, model spans, and evaluator cost.
- Run `compare_tokenomics_packets.py` without explicit quality metrics first when packets use custom quality names, then inspect the inferred `Quality metrics compared` list before accepting the decision.
- Treat behavior counters such as handoff count, tool count, step count, retry count, and self-check count as efficiency or workflow evidence, not quality gates by default. Promote one to quality only when the metric profile states the desired direction for that route or segment.
- Protect named quality metrics and segment gates.
- Reject cheaper candidates when aggregate quality holds but any required segment gate regresses.
- Treat lower traffic volume as inconclusive unless per-trace ef