together-reference-architecture

Together AI reference architecture for inference, fine-tuning, and model deployment. Use when working with Together AI's OpenAI-compatible API. Trigger: "together reference architecture".

AI & Automation | 2,266 stars | 315 forks | Updated today | MIT license

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Together AI Reference Architecture

## Overview

Production architecture for AI inference, fine-tuning, and batch processing with Together AI's OpenAI-compatible API. Designed for teams routing requests across 100+ open-source models (Llama, Mixtral, Qwen, FLUX) with intelligent model selection, response caching, fine-tune pipeline management, and cost optimization via batch inference at a 50% discount.

Key design drivers: model routing for cost/quality tradeoffs, inference caching for repeated queries, fine-tune lifecycle management, and graceful degradation across model providers.

## Architecture Diagram

```
Application ──→ Model Router ──→ Cache (Redis) ──→ Together API (v1)
                    ↓                              /chat/completions
                Queue (Bull) ──→ Batch Worker      /completions
                    ↓                              /images/generations
                Fine-Tune Manager ──→ Together API /fine-tunes
                    ↓                              /models
                Cost Tracker ──→ Analytics Dashboard
```

## Service Layer

```typescript
class InferenceService {
  constructor(private together: TogetherClient, private cache: CacheLayer, private router: ModelRouter) {}

  async complete(request: InferenceRequest): Promise<InferenceResponse> {
    const model = this.router.selectModel(request.task, request.priority);
    const cacheKey = `inference:${model}:${this.hashPrompt(request.prompt)}`;
    const cached = await thi...
```
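The request path in the Service Layer (pick a model, derive a cache key, then either return a cached answer or call the API and store the result) can be sketched as follows. This is a self-contained illustration under stated assumptions, not the skill's actual implementation: `CompletionClient`, `ROUTES`, the `Task`/`Priority` values, and the free function `hashPrompt` (a 32-bit FNV-1a hash) are hypothetical names, an in-memory `Map` stands in for Redis, and the model IDs are illustrative entries that should be checked against Together's `/models` endpoint.

```typescript
// Hypothetical sketch of the router -> cache -> API flow. None of these names
// come from the skill itself; model IDs are illustrative and should be
// verified against Together's /models endpoint.

type Task = "chat" | "code";
type Priority = "quality" | "cost";

interface InferenceRequest {
  task: Task;
  priority: Priority;
  prompt: string;
}

interface InferenceResponse {
  model: string;
  text: string;
  cached: boolean; // true when served from the cache instead of the API
}

// Hypothetical client abstraction: production code would call Together's
// OpenAI-compatible API here; tests can pass a stub.
interface CompletionClient {
  complete(model: string, prompt: string): Promise<string>;
}

// Illustrative routing table: larger model for quality, smaller for cost.
const ROUTES: Record<Task, Record<Priority, string>> = {
  chat: {
    quality: "meta-llama/Llama-3.3-70B-Instruct-Turbo",
    cost: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
  },
  code: {
    quality: "Qwen/Qwen2.5-Coder-32B-Instruct",
    cost: "meta-llama/Meta-Llama-3.1-8B-Instruct-Turbo",
  },
};

class ModelRouter {
  selectModel(task: Task, priority: Priority): string {
    return ROUTES[task][priority];
  }
}

// 32-bit FNV-1a over UTF-16 code units; keeps cache keys short but is not
// collision-resistant.
function hashPrompt(prompt: string): string {
  let h = 0x811c9dc5;
  for (let i = 0; i < prompt.length; i++) {
    h ^= prompt.charCodeAt(i);
    h = Math.imul(h, 0x01000193) >>> 0;
  }
  return h.toString(16).padStart(8, "0");
}

class InferenceService {
  // In-memory stand-in for the Redis cache layer (no TTL or eviction).
  private cache = new Map<string, string>();

  constructor(private client: CompletionClient, private router: ModelRouter) {}

  async complete(request: InferenceRequest): Promise<InferenceResponse> {
    const model = this.router.selectModel(request.task, request.priority);
    // The model ID is part of the key, so different models never share entries.
    const cacheKey = `inference:${model}:${hashPrompt(request.prompt)}`;
    const hit = this.cache.get(cacheKey);
    if (hit !== undefined) {
      return { model, text: hit, cached: true };
    }
    const text = await this.client.complete(model, request.prompt);
    this.cache.set(cacheKey, text);
    return { model, text, cached: false };
  }
}
```

Because the model ID is part of the cache key, the same prompt routed to a different model misses the cache instead of returning another model's answer. In production, `CompletionClient` would wrap a call to Together's OpenAI-compatible `/v1/chat/completions` endpoint, the `Map` would be replaced by Redis entries with a TTL, and a collision-resistant hash such as SHA-256 would replace FNV-1a.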

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT
