together-rate-limits

Solid

Together AI rate limits for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together rate limits".

AI & Automation 2,266 stars 315 forks Updated today MIT

Install

View on GitHub

Quality Score: 99/100

Stars 20%
100
Recency 20%
100
Frontmatter 20%
70
Documentation 15%
100
Issue Health 10%
50
License 10%
100
Description 5%
100

Skill Content

# Together AI Rate Limits ## Overview Together AI's OpenAI-compatible inference API enforces per-key rate limits that vary by model tier and operation type. Chat completions and embeddings share a global request quota, while fine-tuning jobs and batch inference have separate concurrency caps. High-throughput workloads like embedding entire document corpora or running evaluations across 100+ prompts require client-side token bucket limiting. Together's batch inference endpoint offers 50% cost savings but has its own queue depth limits that differ from real-time inference. ## Rate Limit Reference | Endpoint | Limit | Window | Scope | |----------|-------|--------|-------| | Chat completions | 600 req | 1 minute | Per API key | | Embeddings | 300 req | 1 minute | Per API key | | Image generation (FLUX) | 60 req | 1 minute | Per API key | | Fine-tune jobs (concurrent) | 3 jobs | Rolling | Per API key | | Batch inference | 100 req/batch, 10 batches | Rolling | Per API key | ## Rate Limiter Implementation ```typescript class TogetherRateLimiter { private tokens: number; private lastRefill: number; private readonly max: number; private readonly refillRate: number; private queue: Array<{ resolve: () => void }> = []; constructor(maxPerMinute: number) { this.max = maxPerMinute; this.tokens = maxPerMinute; this.lastRefill = Date.now(); this.refillRate = maxPerMinute / 60_000; } async acquire(): Promise<void> { this.refill(); if (this.tokens...

Details

Author
jeremylongshore
Repository
jeremylongshore/claude-code-plugins-plus-skills
Created
7 months ago
Last Updated
today
Language
Python
License
MIT

Integrates with

Similar Skills

Semantically similar based on skill content — not just same category