deploy-kimi-k26-blackwelllisted
Install: claude install-skill soulmachine/skills
# Deploy Kimi-K2.6 on 8× RTX PRO 6000 Blackwell (sm_120)
Serve **Kimi-K2.6** (1T MoE; native INT4/compressed-tensors → **Marlin** path; MLA; 256K; MoonViT
vision) natively in a **uv venv** with **SGLang**, OpenAI-compatible API on `:30000`, **TP=8**, all
weights in VRAM. Verified on Ubuntu 26.04 / driver 580 / CUDA 13.1 / 8× 96 GB.
> **The two fixes in step 4 are REQUIRED on bleeding-edge Ubuntu (glibc ≥2.41, gcc ≥15).** Without
> them the server loads all weights (~15 min) and *then* crashes. Apply them before the first launch.
## Workflow
1. **Verify host** — `nvidia-smi` shows 8 GPUs, 96 GB each; driver ≥570 / CUDA ≥12.8; `uv` installed;
~650 GB free on local NVMe. Enable persistence: `sudo nvidia-smi -pm 1`.
2. **Create venv + install engine** (Python 3.12; base `sglang` IS the runtime — do **not** use `[all]`):
```bash
uv venv --python 3.12 .venv
uv pip install --python .venv/bin/python --prerelease=allow "sglang==0.5.12.post1"
uv pip install --python .venv/bin/python "kernels<0.13" # transformers 5.6 needs kernels<0.13
```
Confirm sm_120 kernels: `.venv/bin/python -c "import torch;print(torch.cuda.get_device_capability(0),'sm_120' in torch.cuda.get_arch_list())"` → `(12, 0) True`.
3. **Download checkpoint** (~595 GB) to local NVMe, pinned to a commit. The `hf`/Xet client may
deadlock mid-transfer — if it stalls, switch to the parallel-curl fallback:
```bash
bash scripts/download.sh moonshotai/Kimi-K2.6 <commit-sha> /data/models/Kimi