cost-quality-frontier

Adds cost (input + output tokens × model price) and latency (p50, p95) to eval results, plots model options on a Pareto frontier, and produces a quality-per-dollar composite score so production model selection is grounded in trade-offs, not just quality. Use when: model comparison, cost-aware evals, latency budget, quality-per-dollar, Pareto frontier, model selection, eval economics, picking a model for production.
varunk130/AI-Eval-Skills · ★ 1 · AI & Automation · score 74
Install: claude install-skill varunk130/AI-Eval-Skills
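
The description above can be sketched in a few lines of Python. This is a minimal illustration, not the skill's actual implementation: the `EvalRow` type, the `PRICES` table (made-up USD prices per 1M tokens), the nearest-rank percentile, and the sample rows are all assumptions. Only the field names `model`, `input_tokens`, `output_tokens`, `latency_ms`, and `cost_usd` and the ratio `quality_score / cost_per_1k_runs` come from the skill's own text. The bootstrap confidence interval is omitted for brevity.

```python
import math
from dataclasses import dataclass
from statistics import mean

# Hypothetical prices in USD per 1M tokens as (input, output); not real vendor pricing.
PRICES = {"small": (0.25, 1.25), "medium": (3.00, 15.00), "large": (15.00, 75.00)}


@dataclass
class EvalRow:
    model: str
    quality: float        # eval score in [0, 1]
    input_tokens: int
    output_tokens: int
    latency_ms: float


def cost_usd(row: EvalRow) -> float:
    """Cost of one eval run: tokens times the per-token price, input and output priced separately."""
    price_in, price_out = PRICES[row.model]
    return (row.input_tokens * price_in + row.output_tokens * price_out) / 1_000_000


def percentile(values, pct: float) -> float:
    """Nearest-rank percentile (one of several common definitions)."""
    ordered = sorted(values)
    rank = max(1, math.ceil(pct / 100 * len(ordered)))
    return ordered[rank - 1]


def aggregate(rows):
    """Group rows by model and compute quality, latency, and cost aggregates."""
    by_model = {}
    for row in rows:
        by_model.setdefault(row.model, []).append(row)
    aggs = {}
    for model, rs in by_model.items():
        cost_per_1k = 1000 * mean(cost_usd(r) for r in rs)
        quality = mean(r.quality for r in rs)
        aggs[model] = {
            "quality": quality,
            "p50_ms": percentile([r.latency_ms for r in rs], 50),
            "p95_ms": percentile([r.latency_ms for r in rs], 95),
            "cost_per_1k_runs": cost_per_1k,
            "quality_per_dollar": quality / cost_per_1k,
        }
    return aggs


def pareto_frontier(aggs, axis: str):
    """Models not dominated on (quality, axis): higher quality is better, lower axis value is better."""
    frontier = []
    for name, a in aggs.items():
        dominated = any(
            b["quality"] >= a["quality"] and b[axis] <= a[axis]
            and (b["quality"] > a["quality"] or b[axis] < a[axis])
            for other, b in aggs.items() if other != name
        )
        if not dominated:
            frontier.append(name)
    return sorted(frontier)


# Made-up eval rows, two per model, purely for illustration.
ROWS = [
    EvalRow("small", 0.80, 1000, 200, 300.0), EvalRow("small", 0.82, 1000, 200, 400.0),
    EvalRow("medium", 0.85, 1000, 200, 600.0), EvalRow("medium", 0.87, 1000, 200, 900.0),
    EvalRow("large", 0.84, 1000, 200, 1200.0), EvalRow("large", 0.86, 1000, 200, 1500.0),
]
aggs = aggregate(ROWS)
print("frontier (quality, cost):   ", pareto_frontier(aggs, "cost_per_1k_runs"))
print("frontier (quality, latency):", pareto_frontier(aggs, "p95_ms"))
```

With these made-up numbers, `large` drops off both frontiers because `medium` scores higher while being cheaper and faster; `small` and `medium` remain as the real trade-off.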
# Cost-Quality Frontier

Most eval suites today answer the question *"which model is most accurate?"* The production question is harder: *"which model gives me the best quality I can afford, at the latency budget my product allows?"*

This skill takes existing eval runs and augments each result with cost and latency, then plots the candidates on a Pareto frontier so the trade-off is visible at a glance.

## Core Principle

**Quality, cost, and latency are a single decision, not three.** A model that is 2 points more accurate but costs 6× and adds 800ms of p95 latency is rarely the right pick. A model that is 1 point worse but cheaper *and* faster usually is - but only if you can see the frontier. This skill makes the frontier explicit.

---

## What You'll Get

| Artifact | Description |
|----------|-------------|
| **Augmented Eval Schema** | Every result row gains `cost_usd`, `input_tokens`, `output_tokens`, `latency_ms`, `model`, `pricing_source` columns |
| **Per-Model Aggregates** | Quality (mean + 95% CI), p50/p95 latency, cost-per-eval-run, cost-per-1k-runs, total tokens consumed |
| **Pareto Frontier Plot** | ASCII / Markdown table of which models are non-dominated on (quality, cost) and (quality, latency) |
| **Quality-per-Dollar Score** | Single composite: `quality_score / cost_per_1k_runs`, with a 95% CI from bootstrap |
| **Decision Matrix** | "If your latency budget is X and your cost ceiling is Y, the best model is Z" - for several common (X, Y) regimes |
| **Sen