vastai-observability

Solid

Monitor Vast.ai GPU instance health, utilization, and costs. Use when setting up monitoring dashboards, configuring alerts, or tracking GPU utilization and spending. Trigger with phrases like "vastai monitoring", "vastai metrics", "vastai observability", "monitor vastai", "vastai alerts".

AI & Automation 2,266 stars 315 forks Updated today MIT

Install

View on GitHub

Quality Score: 99/100

Stars 20%
100
Recency 20%
100
Frontmatter 20%
70
Documentation 15%
100
Issue Health 10%
50
License 10%
100
Description 5%
100

Skill Content

# Vast.ai Observability ## Overview Monitor Vast.ai GPU instance health, utilization, and costs. Key metrics: GPU utilization (idle GPUs waste $0.20-$4.00/hr), instance uptime, training progress, cost accumulation, and spot preemption events. ## Prerequisites - Vast.ai account with active instances - `vastai` CLI installed - Optional: Prometheus, Grafana, or Datadog for dashboarding ## Instructions ### Step 1: Instance Metrics Collector ```python import subprocess, json, time from datetime import datetime class VastMetricsCollector: def __init__(self, output_file="vast_metrics.jsonl"): self.output_file = output_file def collect(self): result = subprocess.run( ["vastai", "show", "instances", "--raw"], capture_output=True, text=True) instances = json.loads(result.stdout) metrics = { "timestamp": datetime.utcnow().isoformat(), "total_instances": len(instances), "running": 0, "total_hourly_cost": 0, "instances": [], } for inst in instances: status = inst.get("actual_status", "unknown") dph = inst.get("dph_total", 0) if status == "running": metrics["running"] += 1 metrics["total_hourly_cost"] += dph metrics["instances"].append({ "id": inst["id"], "gpu": inst.get("gpu_name"), "status": status, "dph": dph,...

Details

Author
jeremylongshore
Repository
jeremylongshore/claude-code-plugins-plus-skills
Created
7 months ago
Last Updated
today
Language
Python
License
MIT

Integrates with

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Solid

vastai

Vast.ai CLI to manage GPU instances, volumes, serverless endpoints, and billing.

194 Updated yesterday
vast-ai
AI & Automation Solid

vast-gpu

Rent, manage, and destroy GPU instances on vast.ai. Use when user says "rent gpu", "vast.ai", "rent a server", "cloud gpu", or needs on-demand GPU without owning hardware.

11,051 Updated today
wanshuiyin
AI & Automation Solid

vastai-cost-tuning

Optimize Vast.ai GPU cloud costs through smart instance selection and lifecycle management. Use when analyzing GPU spending, reducing training costs, or implementing budget controls for Vast.ai workloads. Trigger with phrases like "vastai cost", "vastai billing", "reduce vastai costs", "vastai pricing", "vastai budget".

2,266 Updated today
jeremylongshore
AI & Automation Solid

vastai-migration-deep-dive

Migrate GPU workloads to or from Vast.ai, or between GPU providers. Use when switching from AWS/GCP/Azure GPU instances to Vast.ai, migrating between GPU types, or re-platforming ML infrastructure. Trigger with phrases like "migrate to vastai", "vastai migration", "switch to vastai", "vastai from aws", "vastai from lambda".

2,266 Updated today
jeremylongshore
AI & Automation Solid

vastai-webhooks-events

Build event-driven workflows around Vast.ai instance lifecycle events. Use when monitoring instance status changes, implementing auto-recovery, or building event-driven GPU orchestration. Trigger with phrases like "vastai events", "vastai instance monitoring", "vastai status changes", "vastai lifecycle events".

2,266 Updated today
jeremylongshore