Install: claude install-skill soulmachine/skills
# Ubuntu LXD GPU Server
Install LXD on an Ubuntu host and expose **all** NVIDIA GPUs to LXD system containers via **CDI**, granted
through the `default` profile so every instance inherits them. Assumes the host driver + `nvidia-container-toolkit`
(`nvidia-ctk`) are already in place — if not, run the **`ubuntu-nvidia-gpu-enablement`** skill first.
⚠️ **Use CDI, not `nvidia.runtime=true`.** LXD's legacy libnvidia-container hook hangs at container start with
`nvidia-container-cli: initialization error: driver rpc error: timed out` on recent kernels / Blackwell GPUs.
CDI uses the host's `nvidia-ctk` and a static spec — no driver RPC, no timeout. (Why: [REFERENCE.md](REFERENCE.md) §4.)
## Quick start
```bash
# 1. install LXD + wire all GPUs into the default profile. Storage: zfs:<pool>/lxd | dir | zfs-loop:50GiB
sudo LXD_STORAGE=zfs:rpool/lxd bash scripts/install-lxd.sh
# 2. verify a fresh container sees every GPU (launches a throwaway container, asserts the count, cleans up)
bash scripts/verify-gpu.sh
```
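The count assertion in step 2 can be pictured roughly as follows. This is a minimal sketch, not the contents of `scripts/verify-gpu.sh`: the helper names `count_gpus` and `assert_gpu_count` are hypothetical, and it assumes `nvidia-smi -L` prints one `GPU <index>: ...` line per device.

```bash
#!/usr/bin/env bash
# Sketch only: count_gpus and assert_gpu_count are hypothetical helpers,
# not functions taken from scripts/verify-gpu.sh.

# Count GPUs in `nvidia-smi -L` output read from stdin.
# Each GPU is assumed to appear on a line starting with "GPU <index>:".
count_gpus() {
  # grep -c exits non-zero when the count is 0, so keep the exit status clean.
  grep -c '^GPU [0-9][0-9]*:' || true
}

# Succeed only if the host has at least one GPU and the container sees the same number.
assert_gpu_count() {
  local host="$1" container="$2"
  if [ "$host" -gt 0 ] && [ "$host" -eq "$container" ]; then
    echo "OK: container sees all $host GPU(s)"
  else
    echo "FAIL: host has $host GPU(s), container sees $container" >&2
    return 1
  fi
}

# Example use (the container name is a placeholder):
#   host=$(nvidia-smi -L | count_gpus)
#   ctr=$(lxc exec <container> -- nvidia-smi -L | count_gpus)
#   assert_gpu_count "$host" "$ctr"
```

Comparing against the host count, rather than a hard-coded number, keeps the check valid on machines with any number of GPUs.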
## Pre-flight
- Sudo user (SSH fine). `nvidia-smi -L` lists the GPUs on the **host**.
- `nvidia-ctk --version` works (host CDI toolkit). Missing → `ubuntu-nvidia-gpu-enablement` Step 5.
- Egress to snap + the image server (`images.lxd.canonical.com`).
- Storage decision: a ZFS pool (a redundant root mirror or a separate data pool) is ideal; otherwise `dir` works anywhere.
  A redundant pool lets containers survive a disk loss; a large striped pool gives more space but no redundancy. See [REFERENCE.md](REFERENCE.md) §2.
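
The tool checks in this list could be scripted along these lines. It is a sketch under assumptions: `missing_tools` is a hypothetical helper that is not part of the skill's scripts, and the commented `curl` probe is only one possible way to test egress to the image server.

```bash
#!/usr/bin/env bash
# Sketch only: missing_tools is a hypothetical helper, not part of the skill's scripts.

# Print each named command that is not on PATH, one per line.
missing_tools() {
  local tool
  for tool in "$@"; do
    command -v "$tool" >/dev/null 2>&1 || printf '%s\n' "$tool"
  done
  return 0
}

# Example use on the host:
#   missing=$(missing_tools sudo snap nvidia-smi nvidia-ctk)
#   [ -z "$missing" ] || { echo "missing: $missing" >&2; exit 1; }
#   nvidia-smi -L          # should list every GPU on the host
#   nvidia-ctk --version   # confirms the host CDI toolkit is installed
#   curl -fsSI https://images.lxd.canonical.com >/dev/null   # assumed egress probe
```

If `nvidia-ctk` shows up as missing, the fix is the `ubuntu-nvidia-gpu-enablement` skill, as noted above.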
## Steps (what `i