deploy-kimi-k26-blackwelllisted

Deploy and serve Moonshot Kimi-K2.6 (1T MoE, INT4/Marlin, 256K context, vision) on an Ubuntu server with 8× NVIDIA RTX PRO 6000 Blackwell (sm_120) GPUs using a uv venv + SGLang, exposing an OpenAI-compatible API. Use when deploying or serving Kimi-K2.6 (or similar INT4 compressed-tensors MoE models) on RTX PRO 6000 Blackwell / sm_120 hardware, installing SGLang in a uv venv, or troubleshooting sm_120 startup crashes — CUDA JIT "rsqrt exception specification" / glibc errors, FlashInfer CuTe-DSL MLIR ICE (llvm.mlir.global_dtors), or a slow/hung MoE weight load.
soulmachine/skills · ★ 2 · DevOps & Infrastructure · score 75

Install: claude install-skill soulmachine/skills

# Deploy Kimi-K2.6 on 8× RTX PRO 6000 Blackwell (sm_120) Serve **Kimi-K2.6** (1T MoE; native INT4/compressed-tensors → **Marlin** path; MLA; 256K; MoonViT vision) natively in a **uv venv** with **SGLang**, OpenAI-compatible API on `:30000`, **TP=8**, all weights in VRAM. Verified on Ubuntu 26.04 / driver 580 / CUDA 13.1 / 8× 96 GB. > **The two fixes in step 4 are REQUIRED on bleeding-edge Ubuntu (glibc ≥2.41, gcc ≥15).** Without > them the server loads all weights (~15 min) and *then* crashes. Apply them before the first launch. ## Workflow 1. **Verify host** — `nvidia-smi` shows 8 GPUs, 96 GB each; driver ≥570 / CUDA ≥12.8; `uv` installed; ~650 GB free on local NVMe. Enable persistence: `sudo nvidia-smi -pm 1`. 2. **Create venv + install engine** (Python 3.12; base `sglang` IS the runtime — do **not** use `[all]`): ```bash uv venv --python 3.12 .venv uv pip install --python .venv/bin/python --prerelease=allow "sglang==0.5.12.post1" uv pip install --python .venv/bin/python "kernels<0.13" # transformers 5.6 needs kernels<0.13 ``` Confirm sm_120 kernels: `.venv/bin/python -c "import torch;print(torch.cuda.get_device_capability(0),'sm_120' in torch.cuda.get_arch_list())"` → `(12, 0) True`. 3. **Download checkpoint** (~595 GB) to local NVMe, pinned to a commit. The `hf`/Xet client may deadlock mid-transfer — if it stalls, switch to the parallel-curl fallback: ```bash bash scripts/download.sh moonshotai/Kimi-K2.6 <commit-sha> /data/models/Kimi