groq-cost-tuning

Optimize Groq costs through model routing, token management, and usage monitoring. Use when analyzing Groq billing, reducing API costs, or implementing usage monitoring and budget alerts. Trigger with phrases like "groq cost", "groq billing", "reduce groq costs", "groq pricing", "groq expensive", "groq budget".

AI & Automation 2,266 stars 315 forks Updated today MIT

Skill Content

# Groq Cost Tuning

## Overview

Optimize Groq inference costs through smart model routing, token minimization, and caching. Groq pricing is already competitive, but at high volume, routing classification to the 8B model instead of the 70B model cuts per-request cost roughly 10-12x at the prices listed below (about 12x on input tokens, about 10x on output tokens).

## Groq Pricing (per million tokens)

| Model | Input | Output |
|-------|-------|--------|
| `llama-3.1-8b-instant` | ~$0.05 | ~$0.08 |
| `llama-3.3-70b-versatile` | ~$0.59 | ~$0.79 |
| `llama-3.3-70b-specdec` | ~$0.59 | ~$0.99 |
| `meta-llama/llama-4-scout-17b-16e-instruct` | ~$0.11 | ~$0.34 |
| `whisper-large-v3-turbo` | ~$0.04/hr of audio | n/a |

Check current pricing at [groq.com/pricing](https://groq.com/pricing).

## Instructions

### Step 1: Smart Model Routing

```typescript
import Groq from "groq-sdk";

const groq = new Groq();

// Route to cheapest model that meets quality requirements
interface ModelConfig {
  model: string;
  inputCostPer1M: number;
  outputCostPer1M: number;
}

const ROUTING: Record<string, ModelConfig> = {
  classification: { model: "llama-3.1-8b-instant", inputCostPer1M: 0.05, outputCostPer1M: 0.08 },
  extraction: { model: "llama-3.1-8b-instant", inputCostPer1M: 0.05, outputCostPer1M: 0.08 },
  summarization: { model: "llama-3.1-8b-instant", inputCostPer1M: 0.05, outputCostPer1M: 0.08 },
  reasoning: { model: "llama-3.3-70b-versatile", inputCostPer1M: 0.59, outputCostPer1M: 0.79 },
  codeReview: { model: "llama-3.3-70b-versatile", inputCostPer1M: 0.59, outputCostPer1M: 0.79 },
  ...
```
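To make the routing savings concrete, the approximate prices in the table above can be turned into a per-request cost estimate. The following is a minimal sketch, not part of the skill's own code: `PRICES`, `estimateCostUSD`, and the example workload (500 input and 20 output tokens per classification request) are hypothetical names and numbers chosen for illustration, and the prices should be refreshed from groq.com/pricing before relying on the result.

```typescript
// Hypothetical helper (an assumption, not from the skill): estimates request
// cost from approximate per-million-token prices.
interface Price {
  inputCostPer1M: number;  // USD per 1M input tokens
  outputCostPer1M: number; // USD per 1M output tokens
}

// Approximate prices copied from the pricing table; verify before use.
const PRICES: Record<string, Price> = {
  "llama-3.1-8b-instant": { inputCostPer1M: 0.05, outputCostPer1M: 0.08 },
  "llama-3.3-70b-versatile": { inputCostPer1M: 0.59, outputCostPer1M: 0.79 },
};

// Returns the estimated cost in USD of a single request.
function estimateCostUSD(model: string, inputTokens: number, outputTokens: number): number {
  const price = PRICES[model];
  if (!price) {
    throw new Error(`No price configured for model: ${model}`);
  }
  return (
    (inputTokens * price.inputCostPer1M + outputTokens * price.outputCostPer1M) / 1_000_000
  );
}

// Assumed workload: one million classification requests,
// each with 500 input tokens and 20 output tokens.
const requests = 1_000_000;
const smallTotal = estimateCostUSD("llama-3.1-8b-instant", 500, 20) * requests;
const largeTotal = estimateCostUSD("llama-3.3-70b-versatile", 500, 20) * requests;

console.log(`8B total:  $${smallTotal.toFixed(2)}`);
console.log(`70B total: $${largeTotal.toFixed(2)}`);
console.log(`Ratio:     ${(largeTotal / smallTotal).toFixed(1)}x`);
```

Under these assumptions the 8B model costs about $27 for the whole workload and the 70B model about $311, a ratio of roughly 11.7x. Because input tokens dominate short classification requests, the ratio sits close to the input-price ratio; output-heavy tasks would land nearer 10x.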

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT
