lmcache-mp
Install: claude install-skill air-gapped/skills
# LMCache multiprocess (MP) mode
Target audience: operators running vLLM on H100/H200/B200-class GPUs in production who need to extend the KV cache beyond HBM and have outgrown the in-process LMCache path. Assumes a Kubernetes or bare-container deployment.
## Why this exists separately from `vllm-caching`
`vllm-caching` covers vLLM's **native** CPU-offload (`--kv-offloading-size`, `OffloadingConnector`) and the **in-process** `LMCacheConnectorV1` (LMCache linked into the vLLM worker). MP mode is structurally different:
- LMCache runs in its **own process / container / pod** with its own CPU and memory budget.
- vLLM talks to it over **ZMQ** (DEALER/ROUTER pattern, default port 5555).
- One LMCache server can serve **multiple vLLM pods on the same node** — they share the L1 cache.
- The L2 cascade (NVMe, S3, Mooncake, HF3FS) is configured on the LMCache side, not on the vLLM side.
Different image pair, different deployment shape, different troubleshooting surface. Hence its own skill.
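Because vLLM and LMCache now sit in separate processes, a first troubleshooting step is often just confirming that something is listening on the ZMQ port. The sketch below is a minimal, stdlib-only reachability probe. `port_is_open` is a hypothetical helper name, not part of LMCache or vLLM, and the host you pass is whatever address your LMCache pod or container exposes. A successful TCP connect only shows that a listener exists on that port; it does not prove the listener is LMCache or that the ZMQ handshake will succeed.

```python
import socket

# Default ZMQ port for MP mode as stated above; override if your
# deployment binds the LMCache server elsewhere.
LMCACHE_DEFAULT_PORT = 5555


def port_is_open(host: str, port: int = LMCACHE_DEFAULT_PORT,
                 timeout: float = 1.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout.

    Hypothetical helper for illustration: it checks TCP reachability only
    and does not speak the ZMQ protocol.
    """
    try:
        # create_connection resolves the host and performs the TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and resolution failures.
        return False
```

From inside a vLLM pod, calling `port_is_open("<lmcache-host>")` with your own address in place of the placeholder gives a quick yes/no before digging into connector configuration.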
## Decision tree — pick a path
Ask in order:
1. **Single vLLM pod, only need CPU DRAM tier, no node-shared cache?**
→ Native offload (`--kv-offloading-size N --kv-offloading-backend native --disable-hybrid-kv-cache-manager`). Zero extra pods. Use the `vllm-caching` skill, not this one.
2. **Single vLLM pod, need NVMe as a third tier, but no other pod will share the cache?**
→ In-process `LMCacheConnectorV1` (`--kv-transfer-config '{"kv_connector":"LMCacheConnectorV1","kv_role":"kv_both"}'` + `LMCAC