clade-performance-tuning

Featured

Optimize Anthropic API latency: streaming, prompt caching, model selection, connection reuse, and parallel requests. Use when working with performance-tuning patterns. Trigger with "anthropic slow", "claude latency", "speed up anthropic", "anthropic performance", "claude response time".

AI & Automation 2,266 stars 315 forks Updated today MIT

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Anthropic Performance Tuning

## Overview

Claude latency has two components: **time to first token (TTFT)** and **tokens per second (TPS)**. Different strategies target each.

## Latency Benchmarks (approximate)

| Model | TTFT (p50) | TTFT (p95) | Output TPS |
|-------|-----------|-----------|------------|
| Claude Haiku 4.5 | 200ms | 600ms | ~150 |
| Claude Sonnet 4 | 400ms | 1.2s | ~90 |
| Claude Opus 4 | 800ms | 2.5s | ~40 |

## Optimization Strategies

## Instructions

### Step 1: Always Stream

```typescript
import Anthropic from '@anthropic-ai/sdk';

const client = new Anthropic(); // reads ANTHROPIC_API_KEY from the environment

// Streaming delivers the first token as soon as it is generated, so the user
// sees output right away instead of waiting for the full response.
async function* streamText(messages: Anthropic.MessageParam[]) {
  const stream = client.messages.stream({
    model: 'claude-sonnet-4-20250514',
    max_tokens: 1024,
    messages,
  });

  // First token arrives in ~400ms (Sonnet).
  // Full response may take 5-10s, but the user sees progress immediately.
  for await (const event of stream) {
    // Only text deltas carry a `text` field
    if (event.type === 'content_block_delta' && event.delta.type === 'text_delta') {
      yield event.delta.text;
    }
  }
}
```

### Step 2: Prompt Caching for Faster TTFT

```typescript
// Cached prompts skip re-processing, which lowers TTFT for large system prompts
const message = await client.messages.create({
  model: 'claude-sonnet-4-20250514',
  max_tokens: 1024,
  system: [{
    type: 'text',
    text: largeSystemPrompt, // 10K+ tokens
    cache_control: { type: 'ephemeral' },
  }],
  messages,
}, {
  // Beta opt-in header; it may no longer be required now that prompt caching is generally available
  headers: { 'anthropic-beta': 'prompt-caching-2024-07-31' },
});
// TTFT drops from ~2s to ~500ms on cache hit with la...
```
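
The two components named in the Overview combine into a rough end-to-end estimate: total time is approximately TTFT plus output tokens divided by TPS. The sketch below works that arithmetic through with the approximate p50 figures from the benchmark table. The `ModelLatency` type, the `BENCHMARKS` keys, and the `estimateTotalMs` function are hypothetical names introduced for illustration, and the 450-token response length is an assumed example, not a measured value.

```typescript
// Rough latency model: total ≈ TTFT + outputTokens / TPS.
// Numbers are the approximate p50 figures from the benchmark table;
// all names here are hypothetical and exist only for this sketch.
interface ModelLatency {
  ttftMs: number;    // time to first token (p50), in milliseconds
  outputTps: number; // output tokens generated per second
}

const BENCHMARKS: Record<string, ModelLatency> = {
  'haiku-4.5': { ttftMs: 200, outputTps: 150 },
  'sonnet-4': { ttftMs: 400, outputTps: 90 },
  'opus-4': { ttftMs: 800, outputTps: 40 },
};

function estimateTotalMs(model: string, outputTokens: number): number {
  const b = BENCHMARKS[model];
  if (!b) throw new Error(`Unknown model key: ${model}`);
  // Generation time in ms = tokens / (tokens per second) * 1000
  return b.ttftMs + (outputTokens / b.outputTps) * 1000;
}

// Assumed example: a 450-token response on each model
for (const model of Object.keys(BENCHMARKS)) {
  console.log(model, Math.round(estimateTotalMs(model, 450)), 'ms');
}
// haiku-4.5 3200 ms
// sonnet-4 5400 ms
// opus-4 12050 ms
```

Under these assumptions, streaming does not change the total, but the user starts seeing output after the TTFT term alone (about 400ms on Sonnet) rather than after the full 5.4s. Prompt caching targets the TTFT term, while model selection affects both terms.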

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Featured

anth-performance-tuning

Optimize Claude API performance with prompt caching, model selection, streaming, and latency reduction techniques. Use when experiencing slow responses, optimizing token usage, or reducing time-to-first-token in production. Trigger with phrases like "anthropic performance", "claude speed", "optimize claude latency", "anthropic caching", "faster claude responses".

2,266 Updated today
jeremylongshore
AI & Automation Featured

clade-cost-tuning

Optimize Anthropic API costs: model selection, prompt caching, batches, token reduction, and usage monitoring. Use when working with cost-tuning patterns. Trigger with "anthropic pricing", "claude cost", "reduce anthropic spend", "anthropic billing", "claude cheaper".

2,266 Updated today
jeremylongshore
AI & Automation Featured

anth-cost-tuning

Optimize Anthropic Claude API costs with model routing, prompt caching, batching, and spend monitoring. Use when analyzing Claude API billing, reducing costs, or implementing cost controls and budget alerts. Trigger with phrases like "anthropic cost", "claude billing", "reduce claude spend", "anthropic budget", "claude pricing optimize".

2,266 Updated today
jeremylongshore
AI & Automation Featured

cohere-performance-tuning

Optimize Cohere API performance with caching, batching, model selection, and streaming. Use when experiencing slow API responses, implementing caching strategies, or optimizing request throughput for Cohere Chat, Embed, and Rerank. Trigger with phrases like "cohere performance", "optimize cohere", "cohere latency", "cohere caching", "cohere slow", "cohere batch".

2,266 Updated today
jeremylongshore
AI & Automation Listed

claude-api

Anthropic Claude API patterns for Python and TypeScript. Covers Messages API, streaming, tool use, vision, extended thinking, batches, prompt caching, and Claude Agent SDK. Use when building applications with the Claude API or Anthropic SDKs.

0 Updated today
CodeWithBehnam