nvidia-nixl

Solid

NVIDIA Inference Xfer Library (NIXL) operator + developer reference. Point-to-point KV-cache and tensor transport for distributed inference (Dynamo, vLLM, SGLang). Covers the agent API (full Python reference; C++/Rust via upstream pointers), all 15 backend plugins (UCX, GDS, GDS_MT, libfabric, mooncake, posix, hf3fs, obj/S3, azure_blob, infinia/DDN, gusli, uccl, gpunetio/DOCA, telemetry, tracing), AMD ROCm/HIP support, build paths (pip nixl-cu12/cu13, meson+ninja from source), ETCD vs side-channel metadata, telemetry (Prometheus + cyclic shared-memory), NIXL-EP elastic MoE device kernels, and Dynamo / vLLM NixlConnector / SGLang integration patterns.

DevOps & Infrastructure · 3 stars · 1 fork · Updated 2 days ago · MIT

Quality Score: 79/100

Score components (weight, with the component score where the page shows one): Stars 20% · Recency 20% (100) · Frontmatter 20% · Documentation 15% (100) · Issue Health 10% · License 10% (100) · Description 5% (100)

Skill Content

# NVIDIA Inference Xfer Library (NIXL)

Target audience: operators wiring NIXL into Dynamo/vLLM/SGLang clusters, plugin authors writing new backends, developers using the agent API directly from Python (`references/python-api.md`). C++/Rust developers: consult `src/api/cpp/` headers and `examples/{cpp,rust}/` upstream directly; this skill does not carry a C++/Rust API reference.

Assumes datacenter-class GPUs (H100/H200/B200/B300) with NVIDIA driver, CUDA 12.8+, RDMA NIC (Mellanox/EFA) for cross-node, and Linux (Ubuntu 22.04/24.04 or Fedora). macOS and Windows are not supported.

## What NIXL is: one paragraph

NIXL is the *transport*, not a cache. If the goal is a KV cache that outlives a single vLLM process, the thing being configured is **`lmcache-mp`** (same `inference-cache` plugin), a standalone LMCache server that can use NIXL as one of its backends. Come here for the agent API, plugins, and wire-level behaviour; go there for the cache server's deployment, sizing, and ZMQ wiring.

NIXL is a thin abstraction over heterogeneous transport backends. A `nixlAgent` registers memory regions (DRAM, VRAM, FILE, BLOCK, OBJ), exchanges metadata with peer agents via either ETCD or socket side-channel, then issues asynchronous one-sided `READ`/`WRITE` transfers between local and remote registered memory. The agent picks the best backend (UCX for network, GDS for storage, etc.) based on memory types and what both sides have loaded. Same-process loopback, intra-node GPU-to-GPU, and...
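The register, exchange-metadata, then one-sided transfer flow described above can be pictured with a toy in-process model. This is a sketch only: it does not import or imitate the real NIXL Python API. `ToyAgent`, `FABRIC`, and every method name below are hypothetical names invented for illustration, and the transfer here completes synchronously, whereas the text above describes NIXL transfers as asynchronous.

```python
from dataclasses import dataclass, field

# Hypothetical stand-in for the network: agent name -> agent object.
FABRIC = {}


@dataclass
class ToyAgent:
    """Toy model of an agent; NOT the NIXL API."""
    name: str
    regions: dict = field(default_factory=dict)  # region id -> bytearray
    peers: dict = field(default_factory=dict)    # peer name -> {region id: size}

    def __post_init__(self):
        FABRIC[self.name] = self

    def register(self, region_id, size):
        # Step 1: make a local buffer visible for transfers.
        self.regions[region_id] = bytearray(size)

    def metadata(self):
        # Step 2a: describe registered regions so a peer can address them.
        return {"name": self.name,
                "regions": {rid: len(buf) for rid, buf in self.regions.items()}}

    def add_peer(self, meta):
        # Step 2b: load a peer's metadata (stands in for ETCD or a side-channel).
        self.peers[meta["name"]] = meta["regions"]

    def transfer(self, op, local_id, peer, remote_id):
        # Step 3: one-sided copy; the remote agent runs no code of its own.
        if peer not in self.peers or remote_id not in self.peers[peer]:
            raise KeyError("remote region unknown; exchange metadata first")
        local = self.regions[local_id]
        remote = FABRIC[peer].regions[remote_id]
        if len(local) != len(remote):
            raise ValueError("region sizes differ")
        if op == "READ":
            local[:] = remote      # pull remote bytes into local memory
        elif op == "WRITE":
            remote[:] = local      # push local bytes into remote memory
        else:
            raise ValueError(f"unsupported op: {op}")


prefill = ToyAgent("prefill")
decode = ToyAgent("decode")
prefill.register("kv0", 4)
decode.register("kv0", 4)
prefill.regions["kv0"][:] = b"\x01\x02\x03\x04"

decode.add_peer(prefill.metadata())
decode.transfer("READ", "kv0", "prefill", "kv0")
print(bytes(decode.regions["kv0"]))
```

In this sketch only `decode` calls `transfer`; `prefill` does nothing after registering its buffer, which is the sense in which the transfer is one-sided. Calling `transfer` before `add_peer` raises `KeyError`, mirroring the requirement that metadata be exchanged before any transfer is issued.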

Details

Author: air-gapped
Repository: air-gapped/skills
Created: 3 months ago
Last Updated: 2 days ago
Language: Python
License: MIT

Integrates with

Anthropic · AI Kubernetes · Infrastructure

Bundled in these plugins

skills

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

nix

Nix language, flakes, NixOS, Home Manager, and agent-skills packaging. TRIGGER when: working with .nix files, flake.nix, flake.lock, Nargo.toml (Nix packaging context), NixOS configuration, Home Manager modules, nix-agent MCP tools, agent-skills-nix deployment, or rigup.nix riglets. DO NOT TRIGGER when: only using nix PATH (chezmoi handles that), or working on ZK circuits (use noir skill), or Nix language is incidental to another domain.

1 Updated 1 week ago

DROOdotFOO

AI & Automation Solid

vllm-caching

vLLM tiered KV cache configuration for production H100/H200 clusters. Native CPU offload, LMCache (CPU+NVMe+GDS), NixlConnector (disaggregated prefill), MooncakeConnector (RDMA), MultiConnector composition. Version gates, sizing math (flag total across TP, not per-GPU — opposite of SGLang), KV-vs-weights offload distinction operators most often get wrong.

3 Updated 2 days ago

air-gapped

AI & Automation Solid

nix-agent

Use when a user wants to change NixOS packages, options, modules, or local configuration and the host exposes the nix-agent MCP server (usually alongside mcp-nixos).

15 Updated today

JEFF7712