
gpu-host-tuning

Audit AND tune Linux/GPU inference hosts — read-only host snapshot (CPU power state, C-states, NUMA topology, PCIe link state, GPU settings, kernel boot params, sysctl, ulimits, IRQ affinity, container runtime), optional pinned-host↔GPU memcpy bench (torch + numactl), and per-lever cheat-sheets to flip settings (governor, EPP, cpuidle, persistence, ECC, hugepages, intel_iommu, NCCL env, tuned-adm profiles, Dell/Supermicro/HPE BIOS guidance). Sits beneath any inference framework (vLLM, sglang, TensorRT-LLM) — about the host, not the framework.
air-gapped/skills · ★ 2 · AI & Automation · score 78
Install: claude install-skill air-gapped/skills
# gpu-host-tuning

Host-side tuning + audit for Linux GPU inference servers. Sits *beneath* any inference framework (vLLM, sglang, TensorRT-LLM, llama.cpp). Three modes:

1. **Audit** — read-only snapshot
2. **Bench** — ground-truth pinned-host↔GPU memcpy ceiling
3. **Tune** — apply individual levers from the cheat-sheet

This file is a pointer map. The actual logic lives in `scripts/` and the authoritative references in `references/`.

## Quick start

```bash
# From the skill directory — typically ~/.claude/skills/gpu-host-tuning
# (personal) or .claude/skills/gpu-host-tuning (project install).

# Audit (read-only, ~60s)
./scripts/collect.sh

# Audit + pinned-memcpy bench (needs torch + CUDA, ~5 min)
./scripts/collect.sh --bench
```

The script prompts for the output parent dir on first interactive run and remembers the choice. Override via `--out <dir>` or `HOST_AUDIT_DIR=<dir>`. Default snapshot dirname is `gpu-host-tuning-<host>-<UTC>`.

## What the snapshot captures

One file per probe, numbered by section. See [`references/probe-interpretation.md`](references/probe-interpretation.md) for the full file-by-file decoder.

| Section | What |
|---|---|
| `00-09` meta | collector version, run timestamp, args |
| `10-19` system + firmware | dmidecode (BIOS, CPU, memory DIMMs), lshw, /sys/class/dmi |
| `20-29` CPU + power + C-states | governor, EPP, intel_pstate / amd_pstate, cpuidle states + disable mask, turbostat 5s residency, microcode, vulnerabilities, thermal zones |
| `3