openrouter-performance-tuning


Optimize OpenRouter request latency and throughput. Use when building real-time applications, reducing TTFT, or scaling request volume. Triggers: 'openrouter performance', 'openrouter latency', 'openrouter speed', 'optimize openrouter throughput'.

AI & Automation 2,266 stars 315 forks Updated today MIT


Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# OpenRouter Performance Tuning

## Overview

OpenRouter adds minimal overhead (~50-100ms) to direct provider calls. Most latency comes from the upstream model. Key levers: model selection (smaller = faster), streaming (lower TTFT), parallel requests, prompt size reduction, and provider routing to faster infrastructure. This skill covers benchmarking, streaming optimization, concurrent processing, and connection tuning.

## Benchmark Latency

```python
import os, time, statistics
from openai import OpenAI

client = OpenAI(
    base_url="https://openrouter.ai/api/v1",
    api_key=os.environ["OPENROUTER_API_KEY"],
    default_headers={"HTTP-Referer": "https://my-app.com", "X-Title": "my-app"},
)

def benchmark_model(model: str, prompt: str = "Say hello", n: int = 5) -> dict:
    """Benchmark a model's latency over N requests."""
    latencies = []
    for _ in range(n):
        start = time.monotonic()
        response = client.chat.completions.create(
            model=model,
            messages=[{"role": "user", "content": prompt}],
            max_tokens=50,
        )
        latencies.append((time.monotonic() - start) * 1000)
    return {
        "model": model,
        "p50_ms": round(statistics.median(latencies)),
        "p95_ms": round(sorted(latencies)[int(len(latencies) * 0.95)]),
        "avg_ms": round(statistics.mean(latencies)),
        "min_ms": round(min(latencies)),
        "max_ms": round(max(latencies)),
    }

# Compare fast vs slow models
for model in ["o...
```
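
The benchmark above times whole requests, so it cannot show the TTFT gain that the overview attributes to streaming. A minimal sketch of one way to measure it follows, assuming the OpenAI SDK's `stream=True` option works through the OpenRouter base URL as it does for non-streamed calls. The names `stream_text` and `measure_ttft` are hypothetical helpers, not part of the original skill.

```python
import time
from typing import Any, Callable, Iterable, Iterator, Optional


def stream_text(client: Any, model: str, prompt: str, max_tokens: int = 50) -> Iterator[str]:
    """Yield text pieces from a streamed chat completion.

    Assumption: `client` is an OpenAI-SDK client configured like the one in
    the benchmark section. Because this is a generator, the request is only
    sent when iteration starts.
    """
    stream = client.chat.completions.create(
        model=model,
        messages=[{"role": "user", "content": prompt}],
        max_tokens=max_tokens,
        stream=True,
    )
    for chunk in stream:
        # Skip chunks that carry no choices or no text (e.g. role-only deltas).
        if chunk.choices and chunk.choices[0].delta.content:
            yield chunk.choices[0].delta.content


def measure_ttft(pieces: Iterable[str], clock: Callable[[], float] = time.monotonic) -> dict:
    """Consume a stream of text pieces and report time-to-first-token.

    The clock starts before the first piece is pulled, so when `pieces` is a
    lazy generator such as `stream_text(...)` the request time is included.
    """
    start = clock()
    first_at: Optional[float] = None
    parts = []
    for piece in pieces:
        if piece and first_at is None:
            first_at = clock()  # first non-empty piece has arrived
        parts.append(piece)
    end = clock()
    return {
        "ttft_ms": None if first_at is None else round((first_at - start) * 1000),
        "total_ms": round((end - start) * 1000),
        "text": "".join(parts),
    }
```

With a real client this would be called as `measure_ttft(stream_text(client, model_id, "Say hello"))`, where `model_id` is whichever model is under test. Keeping the timing logic separate from the API call means it can be checked against a fake stream without spending tokens.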

Details

Author
jeremylongshore
Repository
jeremylongshore/claude-code-plugins-plus-skills
Created
7 months ago
Last Updated
today
Language
Python
License
MIT

Similar Skills

Semantically similar based on skill content — not just same category

openrouter-model-routing

Implement intelligent model routing to optimize cost, quality, and latency on OpenRouter. Use when building multi-model systems or optimizing spend across task types. Triggers: 'openrouter routing', 'model routing', 'route to model', 'model selection openrouter'.

openrouter-pricing-basics

Understand OpenRouter pricing, calculate costs, and optimize spend. Use when budgeting, comparing model costs, or tracking spend. Triggers: 'openrouter pricing', 'openrouter cost', 'model pricing', 'openrouter budget', 'how much does openrouter cost'.

openrouter-caching-strategy

Implement caching for OpenRouter API responses to reduce cost and latency. Use when optimizing repeat queries, building RAG systems, or reducing API spend. Triggers: 'openrouter cache', 'cache llm responses', 'openrouter caching', 'reduce openrouter cost'.

openrouter-reference-architecture

Design production architectures using OpenRouter as the LLM gateway. Use when planning system design, reviewing architecture, or scaling AI applications. Triggers: 'openrouter architecture', 'openrouter system design', 'openrouter at scale', 'llm gateway architecture'.

openrouter-multi-provider

Use multiple AI providers (OpenAI, Anthropic, Google, Meta) through OpenRouter's unified API. Use when comparing providers, building cross-provider workflows, or maximizing availability. Triggers: 'openrouter providers', 'multi provider', 'openrouter openai anthropic', 'compare models openrouter'.