llm-cost-optimizer

Solid

Use when you need to reduce LLM API spend, control token usage, route between models by cost/quality, implement prompt caching, or build cost observability for AI features. Triggers: 'my AI costs are too high', 'optimize token usage', 'which model should I use', 'LLM spend is out of control', 'implement prompt caching'. NOT for RAG pipeline design (use rag-architect). NOT for prompt writing quality (use senior-prompt-engineer).

AI & Automation · 16,642 stars · 2,295 forks · Updated yesterday · MIT

Quality Score: 93/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# LLM Cost Optimizer

> Originally contributed by [chad848](https://github.com/chad848), enhanced and integrated by the claude-skills team.

You are an expert in LLM cost engineering with deep experience reducing AI API spend at scale. Your goal is to cut LLM costs by 40-80% without degrading user-facing quality -- using model routing, caching, prompt compression, and observability to make every token count.

AI API costs are engineering costs. Treat them like database query costs: measure first, optimize second, monitor always.

## Before Starting

**Check for context first:** If project-context.md exists, read it before asking questions. Pull the tech stack, architecture, and AI feature details already there.

Gather this context (ask in one shot):

### 1. Current State

- Which LLM providers and models are you using today?
- What is your monthly spend? Which features/endpoints drive it?
- Do you have token usage logging? Cost-per-request visibility?

### 2. Goals

- Target cost reduction? (e.g., "cut spend by 50%", "stay under $X/month")
- Latency constraints? (caching and routing tradeoffs)
- Quality floor? (what degradation is acceptable?)

### 3. Workload Profile

- Request volume and distribution (p50, p95, p99 token counts)?
- Repeated/similar prompts? (caching potential)
- Mix of task types? (classification vs. generation vs. reasoning)

## How This Skill Works

### Mode 1: Cost Audit

You have spend but no clear picture of where it goes. Instrument, measure, and identi...
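As an illustration of the "measure first" step, the sketch below shows one way a cost audit might turn logged token counts into cost per request, spend per feature, and p50/p95/p99 token counts. It is not taken from the skill itself: the `Request` record, the function names, the model names, and the prices in `PRICES_PER_MTOK` are all hypothetical placeholders, and real per-token rates should come from your provider's current price list.

```python
from collections import defaultdict
from dataclasses import dataclass
from statistics import quantiles

# Hypothetical prices in USD per 1M tokens. These are placeholders,
# not real provider rates; substitute your provider's published pricing.
PRICES_PER_MTOK = {
    "small-model": {"input": 0.50, "output": 1.50},
    "large-model": {"input": 5.00, "output": 15.00},
}


@dataclass
class Request:
    """One logged LLM call (hypothetical log schema)."""
    feature: str        # product feature or endpoint that made the call
    model: str          # key into PRICES_PER_MTOK
    input_tokens: int
    output_tokens: int


def request_cost(req: Request) -> float:
    """Return the USD cost of a single request."""
    price = PRICES_PER_MTOK[req.model]
    total = req.input_tokens * price["input"] + req.output_tokens * price["output"]
    return total / 1_000_000


def spend_by_feature(requests: list[Request]) -> dict[str, float]:
    """Sum cost per feature, most expensive feature first."""
    totals: dict[str, float] = defaultdict(float)
    for req in requests:
        totals[req.feature] += request_cost(req)
    return dict(sorted(totals.items(), key=lambda kv: kv[1], reverse=True))


def token_percentiles(requests: list[Request]) -> dict[str, float]:
    """Return p50/p95/p99 of total tokens per request (needs >= 2 requests)."""
    totals = [r.input_tokens + r.output_tokens for r in requests]
    cuts = quantiles(totals, n=100, method="inclusive")  # 99 cut points
    return {"p50": cuts[49], "p95": cuts[94], "p99": cuts[98]}


if __name__ == "__main__":
    log = [
        Request("chat", "large-model", 1000, 500),
        Request("classify", "small-model", 2000, 10),
        Request("chat", "large-model", 4000, 800),
    ]
    print(spend_by_feature(log))
    print(token_percentiles(log))
```

Sorting features by spend puts the most expensive endpoints first, which are usually the first candidates for routing or caching. The percentile view helps show whether a few very long requests dominate the bill.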

Details

Author: alirezarezvani
Repository: alirezarezvani/claude-skills
Created: 7 months ago
Last Updated: yesterday
Language: Python
License: MIT

Integrates with

OpenAI (AI) · Anthropic (AI)

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

llm-cost-optimizer

Analyze and reduce LLM API costs through model routing, caching, and prompt optimization. TRIGGER when: user asks about LLM costs, API spend reduction, token optimization, model routing, or prompt caching. DO NOT TRIGGER when: user asks about model quality comparison, fine-tuning, or general prompt engineering.

1 Updated 1 week ago

DROOdotFOO

AI & Automation Solid

optimizing-prompts

This skill optimizes prompts for Large Language Models (LLMs) to reduce token usage, lower costs, and improve performance. It analyzes the prompt, identifies areas for simplification and redundancy removal, and rewrites the prompt to be more concise and effective. It is used when the user wants to reduce LLM costs, improve response speed, or enhance the quality of LLM outputs by optimizing the prompt. Trigger terms include "optimize prompt", "reduce LLM cost", "improve prompt performance", "rewrite prompt", "prompt optimization".

2,266 Updated today

jeremylongshore

AI & Automation Featured

langchain-cost-tuning

Optimize LangChain API costs with token tracking, model tiering, caching, prompt compression, and budget enforcement. Trigger: "langchain cost", "langchain tokens", "reduce langchain cost", "langchain billing", "langchain budget", "token optimization".

2,266 Updated today

jeremylongshore

AI & Automation Featured

langfuse-cost-tuning

Monitor and optimize LLM costs using Langfuse analytics and dashboards. Use when tracking LLM spending, identifying cost anomalies, or implementing cost controls for AI applications. Trigger with phrases like "langfuse costs", "LLM spending", "track AI costs", "langfuse token usage", "optimize LLM budget".

2,266 Updated today

jeremylongshore

AI & Automation Listed

cost-aware-pipeline

Cost-aware LLM pipeline patterns for optimal model routing, narrow retry strategies, and prompt caching. Reduces API costs 40-70% through intelligent model selection, targeted retries, and cache-friendly prompt structures. Use when: (1) Building multi-model pipelines, (2) Optimizing API costs, (3) Designing retry strategies for LLM calls, (4) Implementing prompt caching, (5) Choosing between haiku/sonnet/opus for sub-tasks.

10 Updated today

stevengonsalvez