ubuntu-lxd-gpu-server

Install LXD on an Ubuntu server and pass all NVIDIA GPUs into LXD system containers via CDI — install snapd+LXD (snap), run `lxd init` with a ZFS or dir storage pool, set up a host CDI spec at /etc/cdi and wire the nvidia-container-toolkit auto-refresh units so it stays fresh across driver upgrades, and grant every GPU to every instance through the default profile, then verify nvidia-smi inside a container. Use when asked to install or set up LXD/lxc on a GPU host, give LXD containers GPU access, do LXD NVIDIA GPU passthrough, share all GPUs across LXD instances, when `nvidia.runtime=true` fails with "driver rpc error: timed out" (use CDI instead), or when LXD GPU containers break after a host driver upgrade (stale or duplicate CDI spec). Assumes the host NVIDIA driver + nvidia-container-toolkit are already installed (see ubuntu-nvidia-gpu-enablement).
soulmachine/skills · ★ 3 · DevOps & Infrastructure · score 74

Install: claude install-skill soulmachine/skills

# Ubuntu LXD GPU Server

Install LXD on an Ubuntu host and expose **all** NVIDIA GPUs to LXD system containers via **CDI**, granted through the `default` profile so every instance inherits them. Assumes the host driver + `nvidia-container-toolkit` (`nvidia-ctk`) are already in place; if not, run the **`ubuntu-nvidia-gpu-enablement`** skill first.

⚠️ **Use CDI, not `nvidia.runtime=true`.** LXD's legacy libnvidia-container hook hangs at container start with `nvidia-container-cli: initialization error: driver rpc error: timed out` on recent kernels / Blackwell GPUs. CDI uses the host's `nvidia-ctk` and a static spec: no driver RPC, no timeout. (Why: [REFERENCE.md](REFERENCE.md) §4.)

## Quick start

```bash
# 1. install LXD + wire all GPUs into the default profile. Storage: zfs:<pool>/lxd | dir | zfs-loop:50GiB
sudo LXD_STORAGE=zfs:rpool/lxd bash scripts/install-lxd.sh

# 2. verify a fresh container sees every GPU (launches a throwaway container, asserts the count, cleans up)
bash scripts/verify-gpu.sh
```

## Pre-flight

- Sudo user (SSH fine). `nvidia-smi -L` lists the GPUs on the **host**.
- `nvidia-ctk --version` works (host CDI toolkit). Missing → `ubuntu-nvidia-gpu-enablement` Step 5.
- Egress to snap + the image server (`images.lxd.canonical.com`).
- Storage decision: a ZFS pool (redundant root mirror, or a data pool) is ideal; otherwise `dir` works anywhere. Redundant pool → containers survive a disk loss; big stripe → more space. See REFERENCE §2.

## Steps (what `i