groq-cost-tuning


Optimize Groq costs through model routing, token management, and usage monitoring. Use when analyzing Groq billing, reducing API costs, or implementing usage monitoring and budget alerts. Trigger with phrases like "groq cost", "groq billing", "reduce groq costs", "groq pricing", "groq expensive", "groq budget".

AI & Automation 2,266 stars 315 forks Updated today MIT



Skill Content

# Groq Cost Tuning

## Overview

Optimize Groq inference costs through smart model routing, token minimization, and caching. Groq pricing is already extremely competitive, but at high volume routing matters: sending classification requests to the 8B model instead of the 70B model cuts per-token cost by roughly 10x to 12x (about 12x on input and about 10x on output, using the prices below).

## Groq Pricing (per million tokens)

| Model | Input | Output |
|-------|-------|--------|
| `llama-3.1-8b-instant` | ~$0.05 | ~$0.08 |
| `llama-3.3-70b-versatile` | ~$0.59 | ~$0.79 |
| `llama-3.3-70b-specdec` | ~$0.59 | ~$0.99 |
| `meta-llama/llama-4-scout-17b-16e-instruct` | ~$0.11 | ~$0.34 |
| `whisper-large-v3-turbo` | ~$0.04 per audio hour | — |

Check current pricing at [groq.com/pricing](https://groq.com/pricing).

## Instructions

### Step 1: Smart Model Routing

```typescript
import Groq from "groq-sdk";

const groq = new Groq();

// Route to cheapest model that meets quality requirements
interface ModelConfig {
  model: string;
  inputCostPer1M: number;  // USD per 1M input tokens
  outputCostPer1M: number; // USD per 1M output tokens
}

const ROUTING: Record<string, ModelConfig> = {
  // Simple, high-volume tasks go to the 8B model
  classification: { model: "llama-3.1-8b-instant", inputCostPer1M: 0.05, outputCostPer1M: 0.08 },
  extraction: { model: "llama-3.1-8b-instant", inputCostPer1M: 0.05, outputCostPer1M: 0.08 },
  summarization: { model: "llama-3.1-8b-instant", inputCostPer1M: 0.05, outputCostPer1M: 0.08 },
  // Tasks that need stronger reasoning go to the 70B model
  reasoning: { model: "llama-3.3-70b-versatile", inputCostPer1M: 0.59, outputCostPer1M: 0.79 },
  codeReview: { model: "llama-3.3-70b-versatile", inputCostPer1M: 0.59, outputCostPer1M: 0.79 },
  ...
```
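The savings figure in the overview can be checked with a little arithmetic. The following is a minimal sketch and not part of the original skill: `PRICES`, `estimateCostUSD`, and the 500/20 token counts are hypothetical, and the prices are the approximate figures from the pricing table above, so treat the result as an estimate only.

```typescript
// Hypothetical helper for illustration; prices are the approximate
// per-1M-token figures quoted in the pricing table (USD).
interface Price {
  inputPer1M: number;
  outputPer1M: number;
}

const PRICES: Record<string, Price> = {
  "llama-3.1-8b-instant": { inputPer1M: 0.05, outputPer1M: 0.08 },
  "llama-3.3-70b-versatile": { inputPer1M: 0.59, outputPer1M: 0.79 },
};

// Estimate the cost of one request from its input and output token counts.
function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) {
    throw new Error(`No price configured for model: ${model}`);
  }
  return (inputTokens * price.inputPer1M + outputTokens * price.outputPer1M) / 1e6;
}

// Assumed workload: a classification call with 500 prompt tokens
// and 20 completion tokens.
const small = estimateCostUSD("llama-3.1-8b-instant", 500, 20);
const large = estimateCostUSD("llama-3.3-70b-versatile", 500, 20);
const ratio = large / small;

console.log(`8B:  $${small.toFixed(7)} per request`);
console.log(`70B: $${large.toFixed(7)} per request`);
console.log(`70B / 8B cost ratio: ${ratio.toFixed(1)}x`);
```

For this input-heavy request the ratio works out to roughly 11.7x, close to the input-price ratio. Requests that generate more output tokens drift toward the output-price ratio of about 10x.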

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT


Similar Skills

Semantically similar based on skill content — not just same category


groq-observability

Set up observability for Groq integrations: latency histograms, token throughput, rate limit gauges, cost tracking, and Prometheus alerts. Trigger with phrases like "groq monitoring", "groq metrics", "groq observability", "monitor groq", "groq alerts", "groq dashboard".


groq-performance-tuning

Optimize Groq API performance with model selection, caching, streaming, and parallel requests. Use when experiencing slow responses, implementing caching strategies, or optimizing request throughput for Groq integrations. Trigger with phrases like "groq performance", "optimize groq", "groq latency", "groq caching", "groq slow", "groq speed".


clade-cost-tuning

Optimize Anthropic API costs: model selection, prompt caching, batches, token reduction, and usage monitoring. Use when working with cost-tuning patterns. Trigger with "anthropic pricing", "claude cost", "reduce anthropic spend", "anthropic billing", "claude cheaper".


langchain-cost-tuning

Optimize LangChain API costs with token tracking, model tiering, caching, prompt compression, and budget enforcement. Trigger: "langchain cost", "langchain tokens", "reduce langchain cost", "langchain billing", "langchain budget", "token optimization".


perplexity-cost-tuning

Optimize Perplexity costs through model routing, caching, token limits, and budget monitoring. Use when analyzing Perplexity billing, reducing API costs, or implementing budget alerts for Perplexity Sonar API. Trigger with phrases like "perplexity cost", "perplexity billing", "reduce perplexity costs", "perplexity pricing", "perplexity budget".
