vastai-reference-architecture

Implement a Vast.ai reference architecture for GPU compute workflows. Use when designing ML training pipelines, structuring GPU orchestration, or establishing architecture patterns for Vast.ai applications. Trigger with phrases like "vastai architecture", "vastai design pattern", "vastai project structure", or "vastai ml pipeline".

Category: AI & Automation · 2,359 stars · 334 forks · Updated today · License: MIT

Quality Score: 99/100

Stars (20% weight): 100
Recency (20% weight): 100
Frontmatter (20% weight): 70
Documentation (15% weight): 100
Issue Health (10% weight): 50
License (10% weight): 100
Description (5% weight): 100

Skill Content

# Vast.ai Reference Architecture

## Overview

Production architecture for GPU compute workflows on Vast.ai. Covers the three-tier pattern (orchestrator, GPU workers, artifact storage), job queue design, and fault-tolerant training pipelines.

## Prerequisites

- Vast.ai account with the `vastai` CLI installed
- Cloud storage (S3, GCS, or MinIO) for artifacts
- Familiarity with ML training pipelines

## Instructions

### Architecture: Three-Tier GPU Compute

```
┌────────────────────────────────────────────────────┐
│  ORCHESTRATOR (your server / CI / cloud function)  │
│    - Job queue management                          │
│    - Instance provisioning via Vast.ai API         │
│    - Status monitoring and auto-recovery           │
│    - Cost tracking and budget enforcement          │
└───────────────┬────────────────────────────────────┘
                │ Vast.ai REST API
┌───────────────▼────────────────────────────────────┐
│  GPU WORKERS (Vast.ai rented instances)            │
│    - Training / inference execution                │
│    - Checkpoint saving to cloud storage            │
│    - Health reporting back to orchestrator         │
│    - Graceful shutdown on SIGTERM (spot preemption)│
└───────────────┬────────────────────────────────────┘
                │ S3 / GCS / MinIO
┌───────────────▼────────────────────────────────────┐
│  ARTIFACT STORAGE (persistent)                     │
│    - Model checkpoints                             │
│    - Training logs and metrics                     │
│    - Dataset cache ...
```
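The GPU-worker tier pairs periodic checkpointing with graceful shutdown on SIGTERM. Below is a minimal sketch of that loop. It assumes preemption arrives as SIGTERM, as the diagram states, and it uses a local directory as a stand-in for S3, GCS, or MinIO. `GracefulWorker`, its methods, and the `latest.json` checkpoint layout are hypothetical names chosen for illustration. They are not part of any Vast.ai API.

```python
import json
import signal
from pathlib import Path
from typing import Callable


class GracefulWorker:
    """Hypothetical worker loop: checkpoint periodically, stop cleanly on SIGTERM.

    Assumption: a local directory stands in for S3/GCS/MinIO; replace
    save_checkpoint/load_step with a real object-store client in practice.
    """

    def __init__(self, checkpoint_dir: Path, checkpoint_every: int = 100):
        self.checkpoint_dir = Path(checkpoint_dir)
        self.checkpoint_dir.mkdir(parents=True, exist_ok=True)
        self.checkpoint_every = checkpoint_every
        self.stop_requested = False

    def install_signal_handler(self) -> None:
        # Assumed (per the diagram): preemption is signalled with SIGTERM.
        signal.signal(signal.SIGTERM, self._on_sigterm)

    def _on_sigterm(self, signum, frame) -> None:
        # Only set a flag; the training loop does the actual checkpoint I/O.
        self.stop_requested = True

    def load_step(self) -> int:
        # Resume point: the last saved step, or 0 for a fresh job.
        path = self.checkpoint_dir / "latest.json"
        if path.exists():
            return json.loads(path.read_text())["step"]
        return 0

    def save_checkpoint(self, step: int) -> None:
        # Write to a temp file, then rename, so a kill mid-write
        # never leaves a half-written latest.json behind.
        tmp = self.checkpoint_dir / "latest.json.tmp"
        tmp.write_text(json.dumps({"step": step}))
        tmp.replace(self.checkpoint_dir / "latest.json")

    def train(self, total_steps: int, train_step: Callable[[int], None]) -> int:
        step = self.load_step()
        while step < total_steps:
            train_step(step)
            step += 1
            if self.stop_requested:
                self.save_checkpoint(step)
                return step  # exit early; the orchestrator re-queues the job
            if step % self.checkpoint_every == 0:
                self.save_checkpoint(step)
        self.save_checkpoint(step)
        return step
```

The handler does no file I/O; it only sets a flag that the loop polls, which keeps signal handling simple. On POSIX filesystems, the rename in `save_checkpoint` means a hard kill during a write leaves the previous checkpoint intact.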
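On the orchestrator side, job queue management, budget enforcement, and auto-recovery can share a single loop. The sketch below is a simulation with several assumptions. `execute` is a hypothetical hook that would rent an instance (for example through the `vastai` CLI or the Vast.ai REST API), run the job, and return `False` if the job was preempted. `price_per_gpu_hour` is one assumed flat rate, whereas real Vast.ai offers are priced per machine. `Job` and `Orchestrator` are illustrative names only.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Callable, Deque, List


@dataclass
class Job:
    job_id: str
    est_gpu_hours: float
    attempts: int = 0


@dataclass
class Orchestrator:
    budget_usd: float
    price_per_gpu_hour: float  # hypothetical flat rate; real offers vary per machine
    max_attempts: int = 3
    spent_usd: float = 0.0
    queue: Deque[Job] = field(default_factory=deque)
    done: List[str] = field(default_factory=list)
    failed: List[str] = field(default_factory=list)
    skipped: List[str] = field(default_factory=list)

    def submit(self, job: Job) -> None:
        self.queue.append(job)

    def run(self, execute: Callable[[Job], bool]) -> None:
        """Drain the queue. `execute` is a hypothetical provisioning hook that
        returns True on success and False if the instance was preempted."""
        while self.queue:
            job = self.queue.popleft()
            cost = job.est_gpu_hours * self.price_per_gpu_hour
            # Budget enforcement: never start a job the remaining budget can't cover.
            if self.spent_usd + cost > self.budget_usd:
                self.skipped.append(job.job_id)
                continue
            job.attempts += 1
            self.spent_usd += cost  # pessimistic: charge the full estimate per attempt
            if execute(job):
                self.done.append(job.job_id)
            elif job.attempts < self.max_attempts:
                # Auto-recovery: re-queue; the worker resumes from its last checkpoint.
                self.queue.append(job)
            else:
                self.failed.append(job.job_id)
```

Charging the full estimate on every attempt overstates what a retry costs when it resumes from a checkpoint. That bias is deliberate, because it keeps the budget check conservative.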
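Status monitoring can be as simple as recording each worker's last heartbeat and flagging any worker that has been silent longer than a timeout. This sketch leaves the transport as an assumption: workers might call `beat` through a hypothetical HTTP endpoint on the orchestrator. `timeout_s` is a tunable you pick, and `HeartbeatMonitor` is an illustrative name. The clock is injectable, so the logic can be tested without sleeping.

```python
import time
from typing import Callable, Dict, List


class HeartbeatMonitor:
    """Hypothetical health tracker: workers report in, silent ones are flagged."""

    def __init__(self, timeout_s: float, clock: Callable[[], float] = time.monotonic):
        self.timeout_s = timeout_s
        self.clock = clock
        self.last_seen: Dict[str, float] = {}

    def beat(self, worker_id: str) -> None:
        # Record the time of the most recent health report from this worker.
        self.last_seen[worker_id] = self.clock()

    def stale_workers(self) -> List[str]:
        # Workers whose last report is older than the timeout, sorted for stable output.
        now = self.clock()
        return sorted(w for w, t in self.last_seen.items() if now - t > self.timeout_s)

    def forget(self, worker_id: str) -> None:
        # Stop tracking a worker once its instance has been destroyed.
        self.last_seen.pop(worker_id, None)
```

The IDs returned by `stale_workers` can feed whatever recovery path the orchestrator uses, such as destroying the instance and re-queuing its job.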

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 8 months ago
Last Updated: today
Language: Python
License: MIT