vastai-performance-tuning

Solid

Optimize Vast.ai GPU instance selection, startup time, and training throughput. Use when optimizing instance selection, reducing startup latency, or maximizing GPU utilization on rented hardware. Trigger with phrases like "vastai performance", "optimize vastai", "vastai slow", "vastai gpu utilization", "vastai throughput".

AI & Automation, 2,266 stars, 315 forks, updated today, MIT license

Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Vast.ai Performance Tuning

## Overview

Optimize GPU instance selection, startup time, and training throughput on Vast.ai. Key levers: Docker image caching, GPU selection by dlperf score, data pipeline optimization, and multi-GPU scaling.

## Prerequisites

- Vast.ai account with active or planned instances
- Understanding of GPU compute bottlenecks
- Profiling tools (nvidia-smi, torch.profiler)

## Instructions

### Step 1: Optimize Instance Selection by Performance

```bash
# Sort by dlperf (deep learning performance benchmark) instead of price
vastai search offers 'num_gpus=1 gpu_ram>=24 reliability>0.95' \
  --order 'dlperf-' --limit 10

# The dlperf field measures actual GPU compute throughput
# Higher dlperf = faster training, even between offers with the same GPU model
# dlperf can vary by 20-30% between hosts with the same GPU model
```

```python
def select_by_performance_per_dollar(offers):
    """Select the offer with the best performance per dollar (None if no offers)."""
    if not offers:
        return None  # max() would raise ValueError on an empty list
    for o in offers:
        # Clamp the hourly price to avoid dividing by zero on misreported offers
        o["perf_per_dollar"] = o.get("dlperf", 0) / max(o["dph_total"], 0.01)
    return max(offers, key=lambda o: o["perf_per_dollar"])
```

### Step 2: Reduce Instance Startup Time

```bash
# Use smaller, pre-cached Docker images
# FAST: nvidia/cuda:12.1.1-runtime-ubuntu22.04 (~2GB, widely cached)
# MEDIUM: pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime (~4GB)
# SLOW: custom-image:latest with pip install at build (~10GB+)

# Pre-install deps in the image, not in onstart
# BAD (slow startup):
vastai create instance $ID --image pytorch/pyt...
```
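As a sketch of the Step 1 selection logic, the snippet below ranks offers by dlperf per dollar-per-hour and keeps the top few instead of a single winner. It assumes the offers arrive as a JSON array (for example, the CLI's `--raw` output) whose objects carry the `id`, `dlperf`, and `dph_total` fields used above; `rank_offers` and the sample offers are hypothetical.

```python
import json
from typing import Any


def rank_offers(raw_json: str, top_n: int = 3) -> list[dict[str, Any]]:
    """Rank offers by dlperf per dollar-per-hour, best first.

    Assumes raw_json is a JSON array of offer objects that carry the
    'dlperf' and 'dph_total' fields used in Step 1.
    """
    ranked = []
    for offer in json.loads(raw_json):
        price = offer.get("dph_total")
        perf = offer.get("dlperf")
        # Skip offers with no dlperf score or a missing/zero price
        # (a zero price would otherwise divide by zero)
        if not price or perf is None:
            continue
        ranked.append({**offer, "perf_per_dollar": perf / price})
    ranked.sort(key=lambda o: o["perf_per_dollar"], reverse=True)
    return ranked[:top_n]


# Hypothetical offers in the assumed JSON shape
sample = json.dumps([
    {"id": 1, "dlperf": 30.0, "dph_total": 0.50},
    {"id": 2, "dlperf": 45.0, "dph_total": 1.00},
    {"id": 3, "dlperf": 20.0, "dph_total": 0.25},
    {"id": 4, "dph_total": 0.10},  # no dlperf score: skipped
])
for offer in rank_offers(sample):
    print(offer["id"], offer["perf_per_dollar"])
# 3 80.0
# 1 60.0
# 2 45.0
```

Keeping a short ranked list rather than one offer leaves fallbacks in case the top host is rented by someone else before `vastai create instance` runs. Note that the fastest card (offer 2) ranks last here: dlperf per dollar favors cheap, decent hosts, so sort by raw `dlperf` instead when wall-clock time matters more than cost.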
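To see why image size matters so much in Step 2, a back-of-envelope estimate helps: download time is roughly image size divided by the host's download bandwidth. This sketch assumes that bandwidth is known in Mbps (for example, from an offer's `inet_down` value), treats an image already cached on the host as free, and ignores layer extraction and registry limits, so its numbers are lower bounds. `estimate_pull_seconds` is a hypothetical helper, the 500 Mbps host speed is made up, and the image sizes are the approximate figures listed in Step 2.

```python
def estimate_pull_seconds(image_gb: float, inet_down_mbps: float,
                          cached: bool = False) -> float:
    """Lower-bound estimate of Docker image download time, in seconds.

    Ignores layer decompression, registry throttling and partially cached
    layers, so real pulls are usually slower than this.
    """
    if cached:
        return 0.0  # Image already on the host: no download needed
    if inet_down_mbps <= 0:
        raise ValueError("inet_down_mbps must be positive")
    megabits = image_gb * 8_000  # 1 GB = 8,000 megabits (decimal units)
    return megabits / inet_down_mbps


# Approximate sizes from Step 2; 500 Mbps is a hypothetical host speed.
images = {
    "nvidia/cuda:12.1.1-runtime-ubuntu22.04": 2,
    "pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime": 4,
    "custom-image:latest": 10,
}
for name, gb in images.items():
    print(f"{name}: ~{estimate_pull_seconds(gb, 500):.0f} s")
# nvidia/cuda:12.1.1-runtime-ubuntu22.04: ~32 s
# pytorch/pytorch:2.2.0-cuda12.1-cudnn8-runtime: ~64 s
# custom-image:latest: ~160 s
```

Dependencies installed by an onstart script add their own download and install time on top of this estimate, which is why Step 2 recommends baking them into the image instead.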

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT

Similar Skills

vastai-cost-tuning (AI & Automation, Solid; 2,266 stars; updated today; by jeremylongshore)

Optimize Vast.ai GPU cloud costs through smart instance selection and lifecycle management. Use when analyzing GPU spending, reducing training costs, or implementing budget controls for Vast.ai workloads. Trigger with phrases like "vastai cost", "vastai billing", "reduce vastai costs", "vastai pricing", "vastai budget".

vastai-core-workflow-a (AI & Automation, Solid; 2,266 stars; updated today; by jeremylongshore)

Execute Vast.ai primary workflow: GPU instance provisioning and job execution. Use when renting GPUs for training, searching offers by price and specs, or managing the full instance lifecycle from search to teardown. Trigger with phrases like "vastai rent gpu", "vastai training job", "vastai provision instance", "run job on vastai".

vastai-deploy-integration (AI & Automation, Solid; 2,266 stars; updated today; by jeremylongshore)

Deploy ML training jobs and inference services on Vast.ai GPU cloud. Use when deploying GPU workloads, configuring Docker images, or setting up automated deployment scripts. Trigger with phrases like "deploy vastai", "vastai deployment", "vastai docker", "vastai production deploy".

vastai-hello-world (AI & Automation, Solid; 2,266 stars; updated today; by jeremylongshore)

Rent your first GPU instance on Vast.ai and run a workload. Use when starting a new Vast.ai integration, testing your setup, or learning basic Vast.ai GPU rental patterns. Trigger with phrases like "vastai hello world", "vastai example", "vastai quick start", "rent first gpu", "vastai first instance".

vastai-core-workflow-b (AI & Automation, Solid; 2,266 stars; updated today; by jeremylongshore)

Execute Vast.ai secondary workflow: multi-instance orchestration, spot recovery, and cost optimization. Use when running distributed training, handling spot preemption, or optimizing GPU spend across multiple instances. Trigger with phrases like "vastai distributed training", "vastai spot recovery", "vastai multi-gpu", "vastai cost optimization".