vllm-chat-templates
Install: claude install-skill air-gapped/skills
# vLLM chat templates — operator triage
Target audience: operators deploying vLLM in production. Assumes an OpenAI-API-compatible frontend (`/v1/chat/completions` or `/v1/responses`), multi-GPU serving, and model families released from mid-2024 through 2026.
## Scope — three sibling skills, three layers
Chat-template bugs span three layers. This skill owns the **prompt-rendering** layer; route to a sibling when the problem is output-side.
| Layer | Direction | Skill | What it covers |
|---|---|---|---|
| **Chat template (Jinja)** | messages → prompt string | **this skill** | template precedence, `--chat-template`, `chat_template_kwargs`, `examples/tool_chat_template_*.jinja`, bundled fallbacks |
| **Reasoning parser** | model output → `reasoning_content` + `content` | `vllm-reasoning-parsers` | `--reasoning-parser`, `extract_reasoning`, `is_reasoning_end`, `<think>` splitting |
| **Tool parser** | model output → `tool_calls[]` | `vllm-tool-parsers` | `--tool-call-parser`, streaming state machines, partial-JSON parsing |
When in doubt: if the operator complains about what the **server receives**, it's this skill; if the complaint is about what the **client receives**, it's one of the parser skills. Template and parser bugs often present the same symptom ("reasoning_content is null"), so all three skills name each other in diagnostics. This file stays on the Jinja side.
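The routing rule above can be written down as a small lookup. This is a minimal sketch, not part of vLLM: the `LAYERS` dict and the `candidate_skills` helper are hypothetical names, the `"input"`/`"output"` labels are an assumed shorthand for the two directions in the table, and the name `vllm-chat-templates` for this skill is inferred from the page title.

```python
# Hypothetical triage helper; none of these names come from vLLM itself.
# Each entry mirrors one row of the layer table above.
LAYERS = {
    "chat-template": {
        "direction": "messages -> prompt string",
        "skill": "vllm-chat-templates",  # assumed name of this skill
    },
    "reasoning-parser": {
        "direction": "model output -> reasoning_content + content",
        "skill": "vllm-reasoning-parsers",
    },
    "tool-parser": {
        "direction": "model output -> tool_calls[]",
        "skill": "vllm-tool-parsers",
    },
}


def candidate_skills(side: str) -> list[str]:
    """Return the skills to consult for a complaint.

    side is "input" when the complaint is about the rendered prompt,
    or "output" when it is about the response the client receives.
    """
    if side == "input":
        return [LAYERS["chat-template"]["skill"]]
    if side == "output":
        # Output-side problems belong to one of the two parser skills.
        return [
            LAYERS["reasoning-parser"]["skill"],
            LAYERS["tool-parser"]["skill"],
        ]
    raise ValueError(f"unknown side: {side!r}")
```

Because template and parser bugs can share a symptom, an output-side result is a starting point for diagnosis rather than a verdict: a null `reasoning_content` may still trace back to the template.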
## Why this matters
Chat templates are the **Jinja layer between structured messages and the raw prompt string the model sees**. Tool-calling and re