mistral-performance-tuning

Optimize Mistral AI performance with caching, batching, and latency reduction. Use when experiencing slow API responses, implementing caching strategies, or optimizing request throughput for Mistral AI integrations. Trigger with phrases like "mistral performance", "optimize mistral", "mistral latency", "mistral caching", "mistral slow".

AI & Automation 2,266 stars 315 forks Updated today MIT

Skill Content

# Mistral AI Performance Tuning

## Overview

Optimize Mistral AI API response times and throughput. Key levers: model selection (Mistral Small ~200ms TTFT vs Large ~500ms), prompt length (fewer tokens = faster), streaming (perceived speed), caching (zero-latency repeats), and concurrent request management.

## Prerequisites

- Mistral API integration in production
- Understanding of RPM/TPM limits for your tier
- Application architecture supporting streaming

## Instructions

### Step 1: Model Selection by Latency Budget

```typescript
const MODELS_BY_USE_CASE: Record<string, { model: string; ttftMs: string; note: string }> = {
  realtime_chat:   { model: 'mistral-small-latest', ttftMs: '~200ms', note: '256k ctx, cheapest' },
  code_completion: { model: 'codestral-latest',     ttftMs: '~150ms', note: 'Optimized for code + FIM' },
  code_agents:     { model: 'devstral-latest',      ttftMs: '~300ms', note: 'Agentic coding tasks' },
  reasoning:       { model: 'mistral-large-latest', ttftMs: '~500ms', note: '256k ctx, strongest' },
  vision:          { model: 'pixtral-large-latest', ttftMs: '~600ms', note: 'Image + text multimodal' },
  embeddings:      { model: 'mistral-embed',        ttftMs: '~50ms',  note: '1024-dim, batch-friendly' },
  edge_devices:    { model: 'ministral-latest',     ttftMs: '~100ms', note: '3B-14B, fastest' },
};
```

### Step 2: Streaming for User-Facing Responses

Streaming reduces perceived latency from 1-2s (full response) t...
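The overview names caching and concurrent request management as levers. The sketch below is one possible way to combine them, not the skill's own implementation. The names `cacheKey`, `TtlCache`, `createLimiter`, `cachedCall`, and the `send` callback are hypothetical helpers introduced for illustration; `send` stands in for whatever function wraps your Mistral client call. It assumes identical requests can safely share a response, which is most plausible for deterministic settings such as a temperature of 0.

```typescript
// Build a deterministic cache key: object keys are sorted so that
// property order in the request does not change the key.
function cacheKey(value: unknown): string {
  if (Array.isArray(value)) return `[${value.map(cacheKey).join(',')}]`;
  if (value !== null && typeof value === 'object') {
    const obj = value as Record<string, unknown>;
    const parts = Object.keys(obj)
      .filter((k) => obj[k] !== undefined)
      .sort()
      .map((k) => `${JSON.stringify(k)}:${cacheKey(obj[k])}`);
    return `{${parts.join(',')}}`;
  }
  return JSON.stringify(value) ?? 'null';
}

// In-memory cache with a fixed time-to-live. The clock is injectable
// so expiry can be exercised without waiting.
class TtlCache<V> {
  private entries = new Map<string, { value: V; expiresAt: number }>();
  private ttlMs: number;
  private now: () => number;

  constructor(ttlMs: number, now: () => number = () => Date.now()) {
    this.ttlMs = ttlMs;
    this.now = now;
  }

  get(key: string): V | undefined {
    const hit = this.entries.get(key);
    if (!hit) return undefined;
    if (hit.expiresAt <= this.now()) {
      this.entries.delete(key); // expired: drop it and report a miss
      return undefined;
    }
    return hit.value;
  }

  set(key: string, value: V): void {
    this.entries.set(key, { value, expiresAt: this.now() + this.ttlMs });
  }
}

type Limiter = <T>(task: () => Promise<T>) => Promise<T>;

// Allow at most `maxConcurrent` tasks in flight; extra callers wait in FIFO order.
function createLimiter(maxConcurrent: number): Limiter {
  let active = 0;
  const waiting: (() => void)[] = [];

  const release = (): void => {
    const next = waiting.shift();
    if (next) next(); // hand the slot straight to the next waiter
    else active--;    // nobody waiting: free the slot
  };

  return async function run<T>(task: () => Promise<T>): Promise<T> {
    if (active < maxConcurrent) active++;
    else await new Promise<void>((resolve) => waiting.push(resolve));
    try {
      return await task();
    } finally {
      release();
    }
  };
}

// Serve repeats from the cache; send misses through the limiter.
// `send` is a placeholder for your own API call wrapper.
async function cachedCall<Req, Res>(
  request: Req,
  cache: TtlCache<Res>,
  limit: Limiter,
  send: (req: Req) => Promise<Res>,
): Promise<Res> {
  const key = cacheKey(request);
  const hit = cache.get(key);
  if (hit !== undefined) return hit;
  const result = await limit(() => send(request));
  cache.set(key, result);
  return result;
}
```

A caller would typically create one `TtlCache` and one limiter at startup and route every request through `cachedCall`. The concurrency cap is a guess to tune against your tier's RPM/TPM limits. Note that this sketch does not deduplicate identical requests that are in flight at the same time; both would reach the API.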

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT
