cost-aware-pipeline

Cost-aware LLM pipeline patterns for model routing, narrow retry strategies, and prompt caching. Reduces API costs by 40-70% through model selection matched to task complexity, targeted retries, and cache-friendly prompt structures. Use when: (1) building multi-model pipelines, (2) optimizing API costs, (3) designing retry strategies for LLM calls, (4) implementing prompt caching, (5) choosing between Haiku, Sonnet, and Opus for sub-tasks.
stevengonsalvez/agents-in-a-box · ★ 10 · AI & Automation · score 70

Install: claude install-skill stevengonsalvez/agents-in-a-box

# Cost-Aware LLM Pipeline

## Model Routing Strategy

Route tasks to the cheapest model that can handle them reliably.

### Pricing Reference (per 1M tokens, USD)

| Model | Input | Output | Relative Cost |
|-------|-------|--------|---------------|
| Haiku | $0.80 | $4.00 | 1x (baseline) |
| Sonnet | $3.00 | $15.00 | ~4x |
| Opus | $15.00 | $75.00 | ~19x |

### Routing Rules

| Task Complexity | Route To | Examples |
|-----------------|----------|----------|
| **Simple** (< 100 lines, clear pattern) | Haiku | File renaming, simple search, format conversion, status checks |
| **Moderate** (100-500 lines, some judgment) | Sonnet | Code review, test writing, refactoring, documentation |
| **Complex** (500+ lines, deep reasoning) | Opus | Architecture design, debugging subtle issues, multi-file refactoring |
| **Creative** (open-ended, high quality bar) | Opus | System design, novel algorithms, critical security review |

### Implementation with Claude Code Agent Tool

```python
# In agent orchestration, specify model based on task complexity
def route_to_model(task_description: str, estimated_complexity: str) -> str:
    """Return model parameter for Agent tool."""
    routing = {
        "simple": "haiku",
        "moderate": "sonnet",
        "complex": "opus",
        "creative": "opus",
    }
    return routing.get(estimated_complexity, "sonnet")
```

When spawning sub-agents:

- Use `model: "haiku"` for Explore agents doing simple file searches
- Use `model: "sonnet"` (defa