ubuntu-nvidia-gpu-enablement

Enable NVIDIA GPUs on an Ubuntu server for compute/inference serving — install the open-kernel-module driver (required for Blackwell/Hopper), CUDA toolkit, turn on IOMMU (intel_iommu=on iommu=pt), set up nvidia-persistenced, and install/wire a container runtime (Docker + nvidia-container-toolkit, or the minimal CLI), then verify all GPUs, IOMMU groups, P2P, nvcc, and GPU containers. Use when asked to enable or set up NVIDIA GPUs, install the NVIDIA driver + CUDA on Ubuntu, install Docker + nvidia-container-toolkit for GPU containers, configure GPU IOMMU/passthrough, prepare a host for GPU serving (vLLM/PyTorch/TensorRT/NIM), or troubleshoot nouveau, persistence mode, GPU-in-container, or driver/CUDA/glibc problems.
soulmachine/skills · ★ 2 · AI & Automation · score 75
Install: claude install-skill soulmachine/skills
# Ubuntu NVIDIA GPU Enablement

Bring a fresh UEFI Ubuntu server with NVIDIA GPUs to a serving-ready state. Order matters: **driver → IOMMU cmdline → CUDA → persistence → container access → verify.** Driver and cmdline changes each need a reboot — batch them (Steps 1 + 2 reboot together). Run as a sudo user (over SSH is fine). [REFERENCE.md](REFERENCE.md) holds the *why*, GRUB/AMD variants, and troubleshooting. Boot-affecting steps are deliberate — keep BMC/console access as a fallback.

## Pre-flight

- UEFI: `[ -d /sys/firmware/efi ]`.
- Secure Boot: `mokutil --sb-state`. **If enabled**, DKMS modules need MOK enrollment at the console — plan for it or disable SB. (Off is typical with a custom bootloader; verify, don't assume.)
- GPUs present: `sudo apt-get install -y pciutils && lspci -nn | grep -i nvidia`. Note the architecture.
- Bootloader = GRUB or ZFSBootMenu? (decides where the cmdline lives — Step 2.)
- Egress to `archive.ubuntu.com` (+ `nvidia.github.io` for the container repo).

## Step 1 — Driver (OPEN kernel module)

```bash
sudo apt-get update
sudo apt-get install -y nvidia-driver-580-server-open   # example version
```

- **Blackwell requires the open modules** (proprietary won't support it); Hopper too. Pick the right branch with `ubuntu-drivers devices`. Datacenter / "Server Edition" cards → `nvidia-driver-<ver>-server-open`.
- Auto-blacklists nouveau; needs a reboot (do it with Step 2).
- Fabric Manager is **only** for NVLink/NVSwitch systems — skip it i