together-cost-tuning


Together AI cost tuning for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together cost tuning".


Skill Content

# Together AI Cost Tuning

## Overview

Optimize Together AI costs with model selection, batching, and caching.

## Instructions

### Together AI Pricing Model

| Model Category | Price (per 1M tokens) | Example Models |
|---------------|----------------------|----------------|
| Small (< 10B) | $0.10-0.30 | Llama-3.2-3B, Qwen-2.5-7B |
| Medium (10-40B) | $0.60-1.20 | Mixtral-8x7B, Llama-3.3-70B-Turbo |
| Large (40B+) | $2.00-5.00 | Llama-3.1-405B, DeepSeek-V3 |
| Image gen | $0.003-0.05/image | FLUX.1-schnell, SDXL |
| Embeddings | $0.008/1M tokens | M2-BERT |
| Fine-tuning | ~$5-25/hour | Depends on model + GPU |
| Batch inference | 50% off | Same models, async |

### Cost Reduction Strategies

```python
from functools import lru_cache

from together import Together

# Reads the API key from the TOGETHER_API_KEY environment variable.
client = Together()

# 1. Use Turbo variants (faster, cheaper, similar quality)
# meta-llama/Llama-3.3-70B-Instruct-Turbo vs Llama-3.1-70B-Instruct

# 2. Batch inference (50% cost reduction)
# file_id is the ID of a batch input file that has already been uploaded.
batch_response = client.batch.create(
    input_file_id=file_id,
    model="meta-llama/Llama-3.3-70B-Instruct-Turbo",
    completion_window="24h",
)

# 3. Cache responses for identical prompts
# Only worthwhile when a repeated prompt should return the same answer.
@lru_cache(maxsize=1000)
def cached_completion(prompt: str, model: str) -> str:
    response = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

# 4. Use smallest model that works
# Test with 3B first, upgrade to 70B only if quality insufficient
```

## Error Handling

| Issue | Cau...
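
To make the pricing table and the 50% batch discount concrete, here is a minimal sketch of a cost estimator. The price ranges are copied from the table in the skill content; the names `PRICE_PER_MILLION`, `BATCH_DISCOUNT`, and `estimate_cost` are hypothetical helpers for illustration, not part of the Together SDK, and the figures should be checked against current pricing before budgeting.

```python
# Illustrative price ranges in USD per 1M tokens, taken from the pricing table
# in the skill content. These are assumptions, not live prices.
PRICE_PER_MILLION = {
    "small": (0.10, 0.30),
    "medium": (0.60, 1.20),
    "large": (2.00, 5.00),
}

# Batch inference is listed as 50% off in the same table.
BATCH_DISCOUNT = 0.50


def estimate_cost(tokens: int, tier: str, batch: bool = False) -> tuple[float, float]:
    """Return the (low, high) estimated cost in USD for a token volume."""
    low, high = PRICE_PER_MILLION[tier]
    factor = (1 - BATCH_DISCOUNT) if batch else 1.0
    millions = tokens / 1_000_000
    return (round(low * millions * factor, 2), round(high * millions * factor, 2))


if __name__ == "__main__":
    monthly_tokens = 50_000_000
    for tier in PRICE_PER_MILLION:
        print(tier, estimate_cost(monthly_tokens, tier),
              "batch:", estimate_cost(monthly_tokens, tier, batch=True))
```

Running an estimate like this for each tier before committing to a model shows how much of the budget depends on tier choice versus batching. Under these table figures, moving a workload from the large tier to the medium tier saves more than batching alone.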

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT