cost-aware-llm-pipeline

Solid

LLM API cost-optimization patterns: model routing by task complexity, budget tracking, retry logic, and prompt caching.

AI & Automation 196,640 stars 30253 forks Updated 2 days ago MIT


Quality Score: 95/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# Cost-Aware LLM Pipeline

Patterns for controlling LLM API costs while maintaining quality. Combines model routing, budget tracking, retry logic, and prompt caching into a composable pipeline.

## When to Activate

* Building applications that call LLM APIs (Claude, GPT, etc.)
* Processing batches of items with varying complexity
* Keeping API spend within a budget
* Optimizing cost on complex tasks without sacrificing quality

## Core Concepts

### 1. Model Routing by Task Complexity

Automatically select a cheaper model for simple tasks and reserve the expensive model for complex ones.

```python
MODEL_SONNET = "claude-sonnet-4-6"
MODEL_HAIKU = "claude-haiku-4-5-20251001"

_SONNET_TEXT_THRESHOLD = 10_000  # chars
_SONNET_ITEM_THRESHOLD = 30  # items


def select_model(
    text_length: int,
    item_count: int,
    force_model: str | None = None,
) -> str:
    """Select model based on task complexity."""
    # An explicit override always wins over the heuristics below.
    if force_model is not None:
        return force_model
    if text_length >= _SONNET_TEXT_THRESHOLD or item_count >= _SONNET_ITEM_THRESHOLD:
        return MODEL_SONNET  # Complex task
    return MODEL_HAIKU  # Simple task (3-4x cheaper)
```

### 2. Immutable Cost Tracking

Track cumulative spend with frozen dataclasses. Each API call returns a new tracker; existing state is never mutated.

```python
from dataclasses import dataclass


@dataclass(frozen=True, slots=True)
class CostRecord:
    model: str
    input_tokens: int
    output_tokens: int
    cost_usd: float


@dataclass(frozen=True, slots=True)
class CostTracker:
    budget_limit: float = 1.00
    records: tuple[CostRecord, ...] = ()

    def add(self, record: CostRecord) -> "CostTracker":
        """Return new tracker with added record (never mutates self)."""
        return CostTracker(
            budget_limit=self.budget_limit,
            records=(*self.records, record),
        )

    @property
    def total_cost(self) -> float:
        ...
```

Details

Author: affaan-m
Repository: affaan-m/everything-claude-code
Created: 4 months ago
Last Updated: 2 days ago
Language: JavaScript
License: MIT

Integrates with

Anthropic · AI
