← ClaudeAtlas

vllm-chat-templateslisted

vLLM chat-template (prompt-side Jinja) operator reference. Template resolution precedence (`--chat-template` → AutoProcessor → tokenizer default → bundled fallback), `chat_template_kwargs` allowlist silently dropping `add_generation_prompt`/`enable_thinking`/custom kwargs (PR 27622 fix), 27 shipped `tool_chat_template_*.jinja` files, known template-layer bugs for Qwen3/Qwen3-Coder, DeepSeek-R1/V3/V3.1/V3.2, GPT-OSS, Kimi-K2, Llama-4, Mistral (HF vs mistral mode), Gemma-3/4, Phi-4, GLM. Prompt side only — output parsing lives in sibling skills.
air-gapped/skills · ★ 2 · AI & Automation · score 78
Install: claude install-skill air-gapped/skills
# vLLM chat templates — operator triage Target audience: operators deploying vLLM in production. Assumes OpenAI-API-compatible frontend (`/v1/chat/completions` or `/v1/responses`), multi-GPU, mid-2024 through 2026 model families. ## Scope — three sibling skills, three layers Chat-template bugs span three layers. This skill owns the **prompt-rendering** layer; route to a sibling when the problem is output-side. | Layer | Direction | Skill | What it covers | |---|---|---|---| | **Chat template (Jinja)** | messages → prompt string | **this skill** | template precedence, `--chat-template`, `chat_template_kwargs`, `examples/tool_chat_template_*.jinja`, bundled fallbacks | | **Reasoning parser** | model output → `reasoning_content` + `content` | `vllm-reasoning-parsers` | `--reasoning-parser`, `extract_reasoning`, `is_reasoning_end`, `<think>` splitting | | **Tool parser** | model output → `tool_calls[]` | `vllm-tool-parsers` | `--tool-call-parser`, streaming state machines, partial-JSON parsing | When in doubt: if the operator complains about what the **server receives**, it's this skill; if about what the **client receives**, it's one of the parser skills. Template and parser bugs often present the same symptom ("reasoning_content is null"), so all three skills name each other in diagnostics. This file stays on the Jinja side. ## Why this matters Chat templates are the **Jinja layer between structured messages and the raw prompt string the model sees**. Tool-calling and re