together-rate-limits

Solid

Together AI rate limits for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together rate limits".

AI & Automation 2,266 stars 315 forks Updated today MIT

Install

View on GitHub

Quality Score: 99/100

Stars 20%

100

Recency 20%

100

Frontmatter 20%

Documentation 15%

100

Issue Health 10%

License 10%

100

Description 5%

100

Skill Content

# Together AI Rate Limits ## Overview Together AI's OpenAI-compatible inference API enforces per-key rate limits that vary by model tier and operation type. Chat completions and embeddings share a global request quota, while fine-tuning jobs and batch inference have separate concurrency caps. High-throughput workloads like embedding entire document corpora or running evaluations across 100+ prompts require client-side token bucket limiting. Together's batch inference endpoint offers 50% cost savings but has its own queue depth limits that differ from real-time inference. ## Rate Limit Reference | Endpoint | Limit | Window | Scope | |----------|-------|--------|-------| | Chat completions | 600 req | 1 minute | Per API key | | Embeddings | 300 req | 1 minute | Per API key | | Image generation (FLUX) | 60 req | 1 minute | Per API key | | Fine-tune jobs (concurrent) | 3 jobs | Rolling | Per API key | | Batch inference | 100 req/batch, 10 batches | Rolling | Per API key | ## Rate Limiter Implementation ```typescript class TogetherRateLimiter { private tokens: number; private lastRefill: number; private readonly max: number; private readonly refillRate: number; private queue: Array<{ resolve: () => void }> = []; constructor(maxPerMinute: number) { this.max = maxPerMinute; this.tokens = maxPerMinute; this.lastRefill = Date.now(); this.refillRate = maxPerMinute / 60_000; } async acquire(): Promise<void> { this.refill(); if (this.tokens...

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT

juicebox-rate-limits

Implement Juicebox rate limiting. Trigger: "juicebox rate limit", "juicebox 429", "juicebox throttle".

2,266 Updated today

jeremylongshore