vastai-performance-tuning


Optimize Vast.ai GPU instance selection, startup time, and training throughput. Use when optimizing instance selection, reducing startup latency, or maximizing GPU utilization on rented hardware. Trigger with phrases like "vastai performance", "optimize vastai", "vastai slow", "vastai gpu utilization", "vastai throughput".

AI & Automation 2,266 stars 315 forks Updated today MIT


Skill Content

# Vast.ai Performance Tuning

## Overview

Optimize GPU instance selection, startup time, and training throughput on Vast.ai. Key levers: Docker image caching, GPU selection by dlperf score, data pipeline optimization, and multi-GPU scaling.

## Prerequisites

- Vast.ai account with active or planned instances
- Understanding of GPU compute bottlenecks
- Profiling tools (nvidia-smi, torch.profiler)

## Instructions

### Step 1: Optimize Instance Selection by Performance

```bash
# Sort by dlperf (deep learning performance benchmark) instead of price
vastai search offers 'num_gpus=1 gpu_ram>=24 reliability>0.95' \
  --order 'dlperf-' --limit 10

# The dlperf field measures actual GPU compute throughput
# Higher dlperf = faster training even at same GPU model
# Variance within same GPU model can be 20-30%
```

```python
def select_by_performance_per_dollar(offers):
    """Select the offer with best performance per dollar."""
    for o in offers:
        o["perf_per_dollar"] = o.get("dlperf", 0) / max(o["dph_total"], 0.01)
    return max(offers, key=lambda o: o["perf_per_dollar"])
```

### Step 2: Reduce Instance Startup Time

```bash
# Use smaller, pre-cached Docker images
# FAST:   nvidia/cuda:12.1.1-runtime-ubuntu22.04 (~2GB, widely cached)
# MEDIUM: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime (~4GB)
# SLOW:   custom-image:latest with pip install at build (~10GB+)

# Pre-install deps in the image, not in onstart
# BAD (slow startup):
vastai create instance $ID --image pytorch/pyt...
```
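The selection logic in Step 1 can be extended in a small way. The sketch below is illustrative only: it assumes offers are plain dictionaries carrying the `dlperf` and `dph_total` fields used above, plus `gpu_name` and `reliability` keys that are assumptions made for this example (the real offer payload may name them differently). The helpers `rank_offers` and `dlperf_spread` and the `SAMPLE_OFFERS` data are hypothetical, not part of the Vast.ai CLI or API. It ranks offers by dlperf per dollar-hour without mutating the input, and reports how much dlperf varies within each GPU model.

```python
from collections import defaultdict


def rank_offers(offers, min_reliability=0.95, top_n=3):
    """Return the top_n offers by dlperf per dollar-hour.

    Offers below min_reliability are skipped. Input dicts are copied,
    so the caller's list is left unchanged.
    """
    ranked = []
    for offer in offers:
        # "reliability" is an assumed key name for this sketch.
        if offer.get("reliability", 0.0) < min_reliability:
            continue
        # Floor the price to avoid dividing by zero on malformed offers.
        price = max(offer.get("dph_total", 0.0), 0.01)
        ranked.append({**offer, "perf_per_dollar": offer.get("dlperf", 0.0) / price})
    ranked.sort(key=lambda o: o["perf_per_dollar"], reverse=True)
    return ranked[:top_n]


def dlperf_spread(offers):
    """Relative dlperf spread, (max - min) / max, for each GPU model."""
    by_model = defaultdict(list)
    for offer in offers:
        # "gpu_name" is an assumed key name for this sketch.
        by_model[offer["gpu_name"]].append(offer["dlperf"])
    return {
        model: (max(scores) - min(scores)) / max(scores)
        for model, scores in by_model.items()
        if max(scores) > 0
    }


# Made-up sample data, not real marketplace offers.
SAMPLE_OFFERS = [
    {"id": 1, "gpu_name": "RTX 4090", "dlperf": 100.0, "dph_total": 0.50, "reliability": 0.99},
    {"id": 2, "gpu_name": "RTX 4090", "dlperf": 80.0, "dph_total": 0.32, "reliability": 0.97},
    {"id": 3, "gpu_name": "RTX 3090", "dlperf": 60.0, "dph_total": 0.20, "reliability": 0.90},
    {"id": 4, "gpu_name": "RTX 3090", "dlperf": 45.0, "dph_total": 0.25, "reliability": 0.98},
]

if __name__ == "__main__":
    for offer in rank_offers(SAMPLE_OFFERS):
        print(offer["id"], offer["gpu_name"], round(offer["perf_per_dollar"], 1))
    print(dlperf_spread(SAMPLE_OFFERS))
```

Returning copies rather than annotating offers in place keeps the ranking safe to call repeatedly on the same list. The spread figure is one way to check whether the within-model variance mentioned in Step 1 shows up in a given set of search results.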

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT
