langchain-cost-tuning

Featured

Optimize LangChain API costs with token tracking, model tiering, caching, prompt compression, and budget enforcement. Trigger: "langchain cost", "langchain tokens", "reduce langchain cost", "langchain billing", "langchain budget", "token optimization".

AI & Automation · 2,266 stars · 315 forks · Updated today · MIT

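Quality Score: 99/100

| Criterion | Weight | Score |
|-----------|--------|-------|
| Stars | 20% | 100 |
| Recency | 20% | 100 |
| Frontmatter | 20% | not shown |
| Documentation | 15% | 100 |
| Issue Health | 10% | not shown |
| License | 10% | 100 |
| Description | 5% | 100 |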

Skill Content

# LangChain Cost Tuning

## Overview

Reduce LLM API costs while maintaining quality: token tracking callbacks, model tiering (route simple tasks to cheap models), caching for duplicate queries, prompt compression, and budget enforcement.

## Current Pricing Reference (2026)

| Provider | Model | Input ($/1M tokens) | Output ($/1M tokens) |
|----------|-------|---------------------|----------------------|
| OpenAI | gpt-4o | $2.50 | $10.00 |
| OpenAI | gpt-4o-mini | $0.15 | $0.60 |
| Anthropic | claude-sonnet | $3.00 | $15.00 |
| Anthropic | claude-haiku | $0.25 | $1.25 |
| OpenAI | text-embedding-3-small | $0.02 | n/a |

## Strategy 1: Token Usage Tracking

```typescript
import { BaseCallbackHandler } from "@langchain/core/callbacks/base";

// Prices in USD per 1M tokens
const MODEL_PRICING: Record<string, { input: number; output: number }> = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
};

class CostTracker extends BaseCallbackHandler {
  name = "CostTracker";
  totalCost = 0;
  totalTokens = 0;
  calls = 0;

  handleLLMEnd(output: any) {
    this.calls++;
    const usage = output.llmOutput?.tokenUsage;
    if (!usage) return;
    // Placeholder: the model is hardcoded, so every call is priced at gpt-4o-mini rates.
    // Extract the actual model name from the output metadata instead.
    const model = "gpt-4o-mini";
    const pricing = MODEL_PRICING[model] ?? MODEL_PRICING["gpt-4o-mini"];
    const inputCost = (usage.promptTokens / 1_000_000) * pricing.input;
    const outputCost = (usage.completionTokens / 1_000_000) * pricing.output;
    this.totalTokens += usage.totalTokens;
    this.totalCost += inputCost + outputCos...
```
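The pricing table translates directly into a per-call cost formula: divide each token count by 1,000,000 and multiply by the matching rate. Below is a minimal, dependency-free sketch of that arithmetic. `estimateCallCost` and `PRICING_PER_MILLION` are hypothetical names, and the `claude-sonnet`/`claude-haiku` keys reuse the table's shorthand rather than exact API model IDs. Unlike the `MODEL_PRICING["gpt-4o-mini"]` fallback in `CostTracker`, it throws on an unknown model instead of silently pricing it at the cheapest rate.

```typescript
// USD per 1M tokens, taken from the pricing reference table.
// Keys are the table's shorthand names, not necessarily exact API model IDs.
type Pricing = { input: number; output: number };

const PRICING_PER_MILLION: Record<string, Pricing> = {
  "gpt-4o": { input: 2.5, output: 10.0 },
  "gpt-4o-mini": { input: 0.15, output: 0.6 },
  "claude-sonnet": { input: 3.0, output: 15.0 },
  "claude-haiku": { input: 0.25, output: 1.25 },
};

// Hypothetical helper: cost = (tokens / 1,000,000) * rate, summed over input and output.
function estimateCallCost(model: string, promptTokens: number, completionTokens: number): number {
  const pricing = PRICING_PER_MILLION[model];
  if (pricing === undefined) {
    // Fail loudly rather than silently pricing an unknown model at a default rate.
    throw new Error(`No pricing entry for model "${model}"`);
  }
  const inputCost = (promptTokens / 1_000_000) * pricing.input;
  const outputCost = (completionTokens / 1_000_000) * pricing.output;
  return inputCost + outputCost;
}

// Same request (2,000 prompt tokens, 800 completion tokens) on two tiers.
const miniCost = estimateCallCost("gpt-4o-mini", 2_000, 800);
const fullCost = estimateCallCost("gpt-4o", 2_000, 800);
console.log(`gpt-4o-mini: $${miniCost.toFixed(5)}, gpt-4o: $${fullCost.toFixed(5)}`);
```

For the same 2,000-token prompt and 800-token answer, this gives about $0.00078 on gpt-4o-mini versus about $0.013 on gpt-4o, roughly a 17x difference, which is the gap that model tiering exploits.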
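Model tiering routes simple requests to a cheap model and reserves the expensive one for harder work. The sketch below assumes a crude keyword-and-length heuristic; `pickModel`, `COMPLEX_HINTS`, `TIER_MODELS`, and the 2,000-character threshold are all hypothetical. In practice the routing signal might come from a small classifier or from the calling code, which usually knows the task type.

```typescript
// Hypothetical two-tier setup; substitute the models you actually deploy.
type Tier = "cheap" | "premium";

const TIER_MODELS: Record<Tier, string> = {
  cheap: "gpt-4o-mini",
  premium: "gpt-4o",
};

// Assumed heuristic: keywords that suggest multi-step reasoning.
const COMPLEX_HINTS = ["analyze", "step by step", "prove", "refactor", "compare"];

// Route long or reasoning-heavy prompts to the premium tier, everything else to the cheap one.
function pickModel(prompt: string, maxCheapChars = 2_000): string {
  const text = prompt.toLowerCase();
  const looksComplex =
    prompt.length > maxCheapChars || COMPLEX_HINTS.some((hint) => text.includes(hint));
  const tier: Tier = looksComplex ? "premium" : "cheap";
  return TIER_MODELS[tier];
}

console.log(pickModel("Translate 'good morning' into French"));
console.log(pickModel("Analyze these two contracts and compare their termination clauses"));
```

The returned name can then be passed to the chat model constructor (for example `new ChatOpenAI({ model })` in recent `@langchain/openai` versions). Keeping the routing decision in one pure function makes it easy to log which tier each request used and to tune the heuristic against observed quality.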
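Caching for duplicate queries means a repeated prompt is answered from storage instead of triggering a second paid call. Here is a minimal in-memory, exact-match sketch, assuming a generic `generate(model, prompt)` function stands in for the real model call. `ResponseCache` and `Generate` are hypothetical names, not LangChain's built-in cache classes. It only catches identical prompts (after whitespace normalization); matching paraphrased questions would need embeddings and a similarity threshold.

```typescript
// Stand-in for a real model call (e.g. a LangChain chat model's invoke); hypothetical signature.
type Generate = (model: string, prompt: string) => Promise<string>;

// Minimal in-memory exact-match cache; not LangChain's built-in caching.
class ResponseCache {
  private readonly store = new Map<string, string>();
  private readonly generate: Generate;
  hits = 0;
  misses = 0;

  constructor(generate: Generate) {
    this.generate = generate;
  }

  // Normalize whitespace so trivially different copies of a prompt share one entry.
  private key(model: string, prompt: string): string {
    return `${model}::${prompt.trim().replace(/\s+/g, " ")}`;
  }

  async call(model: string, prompt: string): Promise<string> {
    const k = this.key(model, prompt);
    const cached = this.store.get(k);
    if (cached !== undefined) {
      this.hits++;
      return cached; // cache hit: no API call, no cost
    }
    this.misses++;
    const result = await this.generate(model, prompt);
    this.store.set(k, result);
    return result;
  }
}

// Usage with a fake generator that counts how often the "API" is actually called.
let apiCalls = 0;
const fakeGenerate: Generate = async (model, prompt) => {
  apiCalls++;
  return `[${model}] answer to: ${prompt.trim()}`;
};
const responseCache = new ResponseCache(fakeGenerate);
```

Keying on the model name as well as the prompt keeps answers from different tiers separate, so a cheap-model answer is never served for a request routed to the premium model. The map grows without bound; a production version would add eviction or a TTL.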
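One simple form of prompt compression is trimming chat history: keep the system message, then keep only the newest turns that fit a token budget. The sketch assumes the common rule of thumb of roughly 4 characters per English token instead of a real tokenizer, so its counts are approximate. `trimHistory`, `approxTokens`, and `ChatTurn` are hypothetical names.

```typescript
interface ChatTurn {
  role: "system" | "user" | "assistant";
  content: string;
}

// Rough estimate: ~4 characters per token for English text. Approximate, not a tokenizer.
function approxTokens(text: string): number {
  return Math.ceil(text.length / 4);
}

// System turns are placed first and always kept; then turns are added from newest
// to oldest until the budget runs out. Older turns beyond the budget are dropped.
function trimHistory(turns: ChatTurn[], maxTokens: number): ChatTurn[] {
  const system = turns.filter((t) => t.role === "system");
  const dialogue = turns.filter((t) => t.role !== "system");
  let remaining = maxTokens - system.reduce((sum, t) => sum + approxTokens(t.content), 0);
  const kept: ChatTurn[] = [];
  for (const turn of [...dialogue].reverse()) {
    const cost = approxTokens(turn.content);
    if (cost > remaining) break; // stop at the first turn that does not fit, keeping history contiguous
    kept.unshift(turn);
    remaining -= cost;
  }
  return [...system, ...kept];
}
```

Swapping `approxTokens` for an exact tokenizer makes the budget precise; summarizing the dropped turns with a cheap model, rather than discarding them, is a common refinement that trades one small call for a much shorter prompt on every later call.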
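Budget enforcement stops spending before it exceeds a cap, rather than only reporting it afterwards. A minimal sketch, assuming the caller estimates a request's cost up front (for instance from the per-token rates in the pricing table) and records the actual cost once usage is known, such as from a token-tracking callback like `CostTracker`. `BudgetGuard` and its methods are hypothetical.

```typescript
// Hypothetical in-memory spending cap for LLM calls (USD).
class BudgetGuard {
  private spentUsd = 0;
  private readonly limitUsd: number;

  constructor(limitUsd: number) {
    this.limitUsd = limitUsd;
  }

  get spent(): number {
    return this.spentUsd;
  }

  get remaining(): number {
    return Math.max(0, this.limitUsd - this.spentUsd);
  }

  // Call before a request with a conservative cost estimate; throws if it would exceed the cap.
  assertCanSpend(estimatedUsd: number): void {
    if (this.spentUsd + estimatedUsd > this.limitUsd) {
      throw new Error(
        `Budget exceeded: $${this.spentUsd.toFixed(4)} spent of $${this.limitUsd.toFixed(2)}, ` +
          `next call estimated at $${estimatedUsd.toFixed(4)}`,
      );
    }
  }

  // Call after a request with its actual cost, e.g. from a token-tracking callback.
  record(actualUsd: number): void {
    this.spentUsd += actualUsd;
  }
}
```

Because the guard lives in memory, it resets when the process restarts; a per-day or per-user cap shared across instances would need the running total stored externally.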

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

OpenAI (AI) · Anthropic (AI) · LangChain (AI)

Similar Skills

Semantically similar based on skill content, not just the same category

AI & Automation Featured

langfuse-cost-tuning

Monitor and optimize LLM costs using Langfuse analytics and dashboards. Use when tracking LLM spending, identifying cost anomalies, or implementing cost controls for AI applications. Trigger with phrases like "langfuse costs", "LLM spending", "track AI costs", "langfuse token usage", "optimize LLM budget".

2,266 stars · Updated today

jeremylongshore

AI & Automation Featured

clade-cost-tuning

Optimize Anthropic API costs: model selection, prompt caching, batches, token reduction, and usage monitoring. Use when working with cost-tuning patterns. Trigger with "anthropic pricing", "claude cost", "reduce anthropic spend", "anthropic billing", "claude cheaper".

2,266 stars · Updated today

jeremylongshore

AI & Automation Listed

llm-cost-optimizer

Analyze and reduce LLM API costs through model routing, caching, and prompt optimization. TRIGGER when: user asks about LLM costs, API spend reduction, token optimization, model routing, or prompt caching. DO NOT TRIGGER when: user asks about model quality comparison, fine-tuning, or general prompt engineering.

1 star · Updated 1 week ago

DROOdotFOO

AI & Automation Featured

anth-cost-tuning

Optimize Anthropic Claude API costs with model routing, prompt caching, batching, and spend monitoring. Use when analyzing Claude API billing, reducing costs, or implementing cost controls and budget alerts. Trigger with phrases like "anthropic cost", "claude billing", "reduce claude spend", "anthropic budget", "claude pricing optimize".

2,266 stars · Updated today

jeremylongshore

AI & Automation Solid

llm-cost-optimizer

Use when you need to reduce LLM API spend, control token usage, route between models by cost/quality, implement prompt caching, or build cost observability for AI features. Triggers: 'my AI costs are too high', 'optimize token usage', 'which model should I use', 'LLM spend is out of control', 'implement prompt caching'. NOT for RAG pipeline design (use rag-architect). NOT for prompt writing quality (use senior-prompt-engineer).

16,642 stars · Updated yesterday

alirezarezvani