vllm-deployment
Install: claude install-skill air-gapped/skills
# vLLM deployment (Kubernetes first, Docker lab, OpenShift sidebar)
Target audience: platform engineers bringing up vLLM on production Kubernetes (H100/H200/B200/B300 fleets), and individual researchers running 1-to-2-node Docker / Podman setups in a lab.
This skill is a **pointer map**. It points to the canonical sources (the vLLM repo, docs.vllm.ai, the ecosystem repos, and the load-bearing blog posts) rather than paraphrasing them. Paraphrase rots; pointers survive.
## Decision guide — pick the path
| Situation | Go to |
|---|---|
| Single node, 1 container, TP ≤ 8 | `references/docker-lab.md` |
| Single host, 2 containers for PD disagg lab | `references/docker-lab.md` (compose template) + `references/disagg.md` |
| k8s, single model fits 1 pod | `references/pod-shape.md` + in-tree helm chart |
| k8s, model needs multi-node TP/PP | `references/multi-node.md` (LWS + `multi-node-serving.sh`) |
| k8s fleet, router + LMCache + observability bundled | `vllm-production-stack` (Helm) — see `references/ecosystem.md` |
| k8s fleet, disagg P/D + KV-aware + GAIE + SLA scheduler | `llm-d` — see `references/ecosystem.md` |
| k8s fleet, ByteDance-scale multi-tenant LoRA + heterogeneous GPU | `AIBrix` — see `references/ecosystem.md` |
| NVIDIA reference stack on prem / EKS / AKS with NIXL | `NVIDIA Dynamo` — see `references/ecosystem.md` |
| OpenShift / RHOAI | `references/openshift.md` + RHAIIS images |
| Routing / load balancing across pods | `references/routing.md` (G