mistral-performance-tuning

Optimize Mistral AI performance with caching, batching, and latency reduction. Use when experiencing slow API responses, implementing caching strategies, or optimizing request throughput for Mistral AI integrations. Trigger with phrases like "mistral performance", "optimize mistral", "mistral latency", "mistral caching", "mistral slow".

AI & Automation 2,266 stars 315 forks Updated today MIT

Skill Content

# Mistral AI Performance Tuning

## Overview

Optimize Mistral AI API response times and throughput. Key levers: model selection (Mistral Small ~200ms TTFT vs Large ~500ms), prompt length (fewer tokens = faster), streaming (perceived speed), caching (zero-latency repeats), and concurrent request management.

## Prerequisites

- Mistral API integration in production
- Understanding of RPM/TPM limits for your tier
- Application architecture supporting streaming

## Instructions

### Step 1: Model Selection by Latency Budget

```typescript
const MODELS_BY_USE_CASE: Record<string, { model: string; ttftMs: string; note: string }> = {
  realtime_chat:   { model: 'mistral-small-latest', ttftMs: '~200ms', note: '256k ctx, cheapest' },
  code_completion: { model: 'codestral-latest',     ttftMs: '~150ms', note: 'Optimized for code + FIM' },
  code_agents:     { model: 'devstral-latest',      ttftMs: '~300ms', note: 'Agentic coding tasks' },
  reasoning:       { model: 'mistral-large-latest', ttftMs: '~500ms', note: '256k ctx, strongest' },
  vision:          { model: 'pixtral-large-latest', ttftMs: '~600ms', note: 'Image + text multimodal' },
  embeddings:      { model: 'mistral-embed',        ttftMs: '~50ms',  note: '1024-dim, batch-friendly' },
  edge_devices:    { model: 'ministral-latest',     ttftMs: '~100ms', note: '3B-14B, fastest' },
};
```

### Step 2: Streaming for User-Facing Responses

Streaming reduces perceived latency from 1-2s (full response) t...
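The Overview lists caching as a lever for "zero-latency repeats". As a rough illustration of that idea, and not the skill's own implementation, the sketch below wraps any completion function in an in-memory cache with a time-to-live. The names `ChatRequest`, `Completer`, and `withCache` are hypothetical and do not come from the Mistral SDK. It assumes that only requests with `temperature: 0` are safe to reuse.

```typescript
// Hypothetical request shape for illustration; not the Mistral SDK's types.
type ChatMessage = { role: "system" | "user" | "assistant"; content: string };
type ChatRequest = { model: string; messages: ChatMessage[]; temperature?: number };

// Any function that turns a request into completion text (e.g. a thin SDK wrapper).
type Completer = (req: ChatRequest) => Promise<string>;

interface CacheEntry {
  value: Promise<string>;
  expiresAt: number;
}

// Wraps a completion function with an in-memory TTL cache.
// `now` is injectable so expiry can be tested without real timers.
function withCache(
  complete: Completer,
  ttlMs: number = 60000,
  now: () => number = Date.now,
): Completer {
  const cache = new Map<string, CacheEntry>();

  return (req) => {
    // Assumption: only temperature-0 requests are deterministic enough to cache.
    if (req.temperature !== 0) return complete(req);

    const key = JSON.stringify([req.model, req.messages]);
    const hit = cache.get(key);
    if (hit && hit.expiresAt > now()) return hit.value;

    // Cache the promise itself so concurrent identical requests share one call.
    const value = complete(req);
    cache.set(key, { value, expiresAt: now() + ttlMs });

    // Evict failed calls so an error is never served from the cache.
    value.catch(() => {
      if (cache.get(key)?.value === value) cache.delete(key);
    });
    return value;
  };
}
```

In use, a real client call would be passed in as `complete`, for example `const cachedChat = withCache(myChatFn, 5 * 60 * 1000)`, where `myChatFn` is whatever wrapper the application already has. Because the cache stores the promise and not the resolved string, identical requests in flight at the same time also count against RPM/TPM limits only once. The map here is unbounded, so a production version would likely need a size cap or an external store.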

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT
