cost-aware-llm-pipeline

Cost optimization patterns for LLM API usage — model routing by task complexity, budget tracking, retry logic, and prompt caching.
SilantevBitcoin/Base-system-Claude · ★ 1 · AI & Automation · score 74

Install: claude install-skill SilantevBitcoin/Base-system-Claude

# Cost-Aware LLM Pipeline

Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.

## When to Activate

- Building applications that call LLM APIs (Claude, GPT, etc.)
- Processing batches of items with varying complexity
- Need to stay within a budget for API spend
- Optimizing cost without sacrificing quality on complex tasks

## Core Concepts

### 1. Model Routing by Task Complexity

Automatically select cheaper models for simple tasks, reserving expensive models for complex ones.

```python
MODEL_SONNET = "claude-sonnet-4-6"
MODEL_HAIKU = "claude-haiku-4-5-20251001"

_SONNET_TEXT_THRESHOLD = 10_000  # chars
_SONNET_ITEM_THRESHOLD = 30  # items


def select_model(
    text_length: int,
    item_count: int,
    force_model: str | None = None,
) -> str:
    """Select model based on task complexity."""
    if force_model is not None:
        return force_model
    if text_length >= _SONNET_TEXT_THRESHOLD or item_count >= _SONNET_ITEM_THRESHOLD:
        return MODEL_SONNET  # Complex task
    return MODEL_HAIKU  # Simple task (3-4x cheaper)
```

### 2. Immutable Cost Tracking

Track cumulative spend with frozen dataclasses. Each API call returns a new tracker; state is never mutated.

```python
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class CostRecord:
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


@datac