coreweave-cost-tuning

Solid

Optimize CoreWeave GPU cloud costs with right-sizing and scheduling. Use when reducing GPU spend, selecting cost-effective instances, or implementing scale-to-zero for dev workloads. Trigger with phrases like "coreweave cost", "coreweave pricing", "reduce coreweave spend", "coreweave budget".

AI & Automation 2,266 stars 315 forks Updated today MIT

Install

View on GitHub

Quality Score: 97/100

Stars 20%
100
Recency 20%
100
Frontmatter 20%
70
Documentation 15%
73
Issue Health 10%
50
License 10%
100
Description 5%
100

Skill Content

# CoreWeave Cost Tuning ## GPU Pricing Reference (approximate) | GPU | Per GPU/hour | Best For | |-----|-------------|----------| | A100 40GB PCIe | ~$1.50 | Development, smaller models | | A100 80GB PCIe | ~$2.21 | Production inference | | H100 80GB PCIe | ~$4.76 | High-throughput inference | | H100 SXM5 (8x) | ~$6.15/GPU | Training, multi-GPU | | L40 | ~$1.10 | Image generation, light inference | ## Cost Optimization Strategies ### Scale-to-Zero for Dev/Staging ```yaml autoscaling.knative.dev/minScale: "0" autoscaling.knative.dev/scaleDownDelay: "5m" ``` ### Right-Size GPU Selection ```python def recommend_gpu(model_size_b: float, inference_only: bool = True) -> str: if model_size_b <= 7: return "L40" if inference_only else "A100_PCIE_80GB" elif model_size_b <= 13: return "A100_PCIE_80GB" elif model_size_b <= 70: return "A100_PCIE_80GB (4x tensor parallel)" else: return "H100_SXM5 (8x tensor parallel)" ``` ### Quantization to Use Smaller GPUs Use AWQ or GPTQ quantization to fit larger models on smaller GPUs: ```bash # 70B model at 4-bit fits on single A100-80GB instead of 4x vllm serve meta-llama/Llama-3.1-70B-Instruct-AWQ --quantization awq ``` ## Resources - [CoreWeave Pricing](https://www.coreweave.com/pricing) - [CoreWeave GPU Instances](https://docs.coreweave.com/docs/platform/instances/gpu-instances) ## Next Steps For architecture patterns, see `coreweave-reference-architecture`.

Details

Author
jeremylongshore
Repository
jeremylongshore/claude-code-plugins-plus-skills
Created
7 months ago
Last Updated
today
Language
Python
License
MIT

Integrates with

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

coreweave-performance-tuning

Optimize CoreWeave GPU inference latency and throughput. Use when reducing inference latency, maximizing GPU utilization, or tuning batch sizes and concurrency. Trigger with phrases like "coreweave performance", "coreweave latency", "coreweave throughput", "optimize coreweave inference".

2,266 Updated today
jeremylongshore
AI & Automation Solid

coreweave-migration-deep-dive

Migrate ML workloads from AWS/GCP/Azure to CoreWeave GPU cloud. Use when moving inference services from hyperscaler GPU instances, migrating training pipelines, or evaluating CoreWeave vs cloud GPU costs. Trigger with phrases like "migrate to coreweave", "coreweave migration", "move from aws to coreweave", "coreweave vs aws gpu".

2,266 Updated today
jeremylongshore
AI & Automation Solid

coreweave-rate-limits

Handle CoreWeave API and GPU quota limits. Use when hitting quota limits, managing GPU resource allocation, or implementing request queuing for inference endpoints. Trigger with phrases like "coreweave quota", "coreweave limits", "coreweave gpu allocation", "coreweave throttle".

2,266 Updated today
jeremylongshore
AI & Automation Featured

coreweave-sdk-patterns

Production-ready patterns for CoreWeave GPU workload management with kubectl and Python. Use when building inference clients, managing GPU deployments programmatically, or creating reusable CoreWeave deployment templates. Trigger with phrases like "coreweave patterns", "coreweave client", "coreweave Python", "coreweave deployment template".

2,266 Updated today
jeremylongshore
AI & Automation Solid

coreweave-common-errors

Diagnose and fix CoreWeave GPU scheduling, pod, and networking errors. Use when pods are stuck Pending, GPUs are not allocated, or experiencing CUDA and NCCL errors. Trigger with phrases like "coreweave error", "coreweave pod pending", "coreweave gpu not found", "coreweave debug", "fix coreweave".

2,266 Updated today
jeremylongshore