api-vs-selfhost-skill

Decide API-vs-self-host LLM economics and fine-tuning ROI from any user context (code, PRDs, traffic logs, billing screenshots). Fetches live GPU prices from Runpod/Lambda/Modal, API prices from models.dev or vendor pages, and quality rank from lmarena.ai, then calls a deterministic local Python script for VRAM, billed-hours, and capex math. Use when the user asks "should I self-host", "API vs self-host", "fine-tune cost", "fine-tuning ROI", "what GPU do I need for <model>", "OpenAI bill too high", or pastes a billing screenshot / PRD comparing closed APIs to open-weight models.
artvandelay/api-vs-selfhost-skill · ★ 1 · AI & Automation · score 72

Install: claude install-skill artvandelay/api-vs-selfhost-skill

# API vs Self-Host

Decide API-vs-self-host LLM economics from whatever context the user gives you. Fetch live prices, run `scripts/calc.py` for math, write a short report.

## Trigger

- "should I self-host" / "API vs self-host" / "cost to self-host"
- "fine-tune cost" / "fine-tuning ROI"
- "what GPU do I need for \<model\>"
- "OpenAI/Anthropic bill too high" / "is open-source cheaper than \<API\>"
- User pastes a billing screenshot, PRD, or break-even question

Out of scope: pretraining from scratch, image/audio models, non-LLM workloads.

## Workflow

1. **Extract**: read the user's message, open files, and attachments. Map signals (volume, model, spend, traffic shape, quality bar) to fields in [`references/INPUTS.md`](references/INPUTS.md).
2. **Fetch live data**: GPU $/hr from <https://www.runpod.io/pricing> (or Lambda/Modal), API per-token prices from <https://models.dev/> or the vendor page, model quality Elo from <https://lmarena.ai/>. Cite URL + timestamp in the report.
3. **Clarify**: if volume, model, or spend are missing, ask. Don't guess silently. Batch related questions.
4. **Calculate**: `echo '<json>' | python3 scripts/calc.py inference` (or `finetune`). Run more scenarios (different traffic patterns, quants, GPU tiers) when they would change the answer.
5. **Report**: verdict + cost table + assumptions with sources + what would flip the answer.

## Rules

- All VRAM, GPU-hour, and dollar math goes through `scripts/calc.py`. Never compute it in-prompt.
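
For orientation only, here is a rough sketch of the kind of arithmetic the Calculate step delegates to a script. It is not the contents of `scripts/calc.py`, whose inputs and formulas are not shown here. Every function name, the flat 20% VRAM overhead factor, and all prices and volumes below are hypothetical placeholders. In line with the rule above, real figures should still come from `scripts/calc.py` fed with freshly fetched prices.

```python
# Illustrative only: hypothetical helpers, NOT the actual scripts/calc.py logic.

def api_monthly_cost(input_tokens, output_tokens, price_in_per_m, price_out_per_m):
    """Monthly API spend in dollars from token volumes and $ per 1M tokens."""
    return (input_tokens / 1e6) * price_in_per_m + (output_tokens / 1e6) * price_out_per_m


def selfhost_monthly_cost(gpu_hourly_rate, gpu_count, billed_hours):
    """Monthly GPU rental in dollars: hourly rate x GPUs x hours billed."""
    return gpu_hourly_rate * gpu_count * billed_hours


def weights_vram_gb(params_billion, bits_per_weight, overhead=1.2):
    """Rule-of-thumb VRAM in GB: weight bytes times an assumed overhead factor.

    The overhead (KV cache, activations) is a placeholder assumption; real
    requirements depend on context length, batch size, and serving stack.
    """
    return params_billion * (bits_per_weight / 8) * overhead


def monthly_savings(api_cost, selfhost_cost):
    """Positive when self-hosting is cheaper on raw compute alone."""
    return api_cost - selfhost_cost


# Placeholder scenario: 2B input + 0.5B output tokens per month,
# made-up API prices, and two always-on GPUs at a made-up hourly rate.
api = api_monthly_cost(2_000_000_000, 500_000_000, price_in_per_m=2.50, price_out_per_m=10.00)
gpu = selfhost_monthly_cost(gpu_hourly_rate=2.00, gpu_count=2, billed_hours=730)
vram = weights_vram_gb(params_billion=70, bits_per_weight=4)

print(f"API: ${api:,.0f}/mo, self-host: ${gpu:,.0f}/mo, savings: ${monthly_savings(api, gpu):,.0f}/mo")
print(f"Estimated VRAM for weights plus overhead: {vram:.1f} GB")
```

The sketch compares raw compute only. It leaves out engineering time, idle capacity, and quality differences, which is part of why the report step asks for stated assumptions and for what would flip the answer.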