coreweave-performance-tuning

Optimize CoreWeave GPU inference latency and throughput. Use when reducing inference latency, maximizing GPU utilization, or tuning batch sizes and concurrency. Trigger with phrases like "coreweave performance", "coreweave latency", "coreweave throughput", "optimize coreweave inference".

AI & Automation 2,266 stars 315 forks Updated today MIT

Skill Content

# CoreWeave Performance Tuning

## GPU Selection by Workload

| Workload | Recommended GPU | Why |
|----------|----------------|-----|
| LLM inference (7-13B) | A100 80GB | Good balance of memory and cost |
| LLM inference (70B+) | 8xH100 | NVLink for tensor parallelism |
| Image generation | L40 | Good for diffusion models |
| Training (large models) | 8xH100 SXM5 | Fastest interconnect |
| Batch processing | A100 40GB | Cost-effective |

## Inference Optimization

```yaml
# Continuous batching with vLLM
containers:
  - name: vllm
    args:
      - "--model=meta-llama/Llama-3.1-8B-Instruct"
      - "--max-num-batched-tokens=8192"
      - "--max-num-seqs=256"
      - "--gpu-memory-utilization=0.90"
      - "--enable-prefix-caching"
      - "--dtype=float16"
```

## Autoscaling Tuning

```yaml
# HPA based on GPU utilization
apiVersion: autoscaling/v2
kind: HorizontalPodAutoscaler
metadata:
  name: inference-hpa
spec:
  scaleTargetRef:
    apiVersion: apps/v1
    kind: Deployment
    name: inference-server
  minReplicas: 2
  maxReplicas: 10
  metrics:
    - type: Pods
      pods:
        metric:
          name: DCGM_FI_DEV_GPU_UTIL
        target:
          type: AverageValue
          averageValue: "70"
```

## Performance Benchmarks

| Metric | A100-80GB | H100-80GB |
|--------|-----------|-----------|
| Llama-8B tokens/sec | ~2,000 | ~4,500 |
| Llama-70B tokens/sec | ~200 (4x) | ~500 (4x) |
| Cold start (vLLM) | 30-60s | 20-40s |

## Resources

- [CoreWeave Inference](https:...
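
To get a feel for how the HPA settings in the Autoscaling Tuning section behave, the sketch below applies the replica formula from the Kubernetes HPA documentation, `ceil(currentReplicas * currentMetric / targetMetric)`, using the manifest's target of 70 and its 2 to 10 replica bounds. The function name `desired_replicas` and the 10% tolerance default are assumptions for illustration. The real controller also handles stabilization windows, pod readiness, and missing metrics, which this deliberately ignores.

```python
import math


def desired_replicas(current_replicas, current_avg_util, target_avg_util=70.0,
                     min_replicas=2, max_replicas=10, tolerance=0.1):
    """Approximate the HPA replica calculation for an AverageValue target.

    Hypothetical helper: a simplified model of the documented formula,
    not the controller's actual implementation.
    """
    ratio = current_avg_util / target_avg_util
    if abs(ratio - 1.0) <= tolerance:
        # Within tolerance of the target: the HPA skips scaling.
        desired = current_replicas
    else:
        desired = math.ceil(current_replicas * current_avg_util / target_avg_util)
    # Clamp to the minReplicas/maxReplicas bounds from the manifest.
    return max(min_replicas, min(max_replicas, desired))


if __name__ == "__main__":
    for replicas, util in [(4, 91), (4, 72), (2, 10), (8, 95)]:
        print(f"{replicas} pods at {util}% avg GPU util -> "
              f"{desired_replicas(replicas, util)} pods")
```

Under these assumptions, four pods averaging 91% GPU utilization scale to six, while a reading of 72% stays inside the tolerance band and leaves the replica count unchanged.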

Details

Author
jeremylongshore
Repository
jeremylongshore/claude-code-plugins-plus-skills
Created
7 months ago
Last Updated
today
Language
Python
License
MIT
