vllm-input-modalities
Install: claude install-skill air-gapped/skills
# vLLM — embeddings, reranking, speech-to-text, OCR
Target audience: operators who need vLLM's non-chat-completion surfaces. Four
capabilities are bundled here because they share operator-facing concepts
(`--runner` flag, pooling configuration, scoring API, multimodal preprocessing),
even though two run on the pooling runner (embedding, reranking) and two run
on the generate runner (STT, OCR).
## The mental model — one flag rules the surface
vLLM decides what a model *does* from the combination of three flags:
```
--runner {auto|generate|pooling|draft} # what kind of workload
--convert {auto|none|embed|classify} # adapt a generative LM to a pooler
--pooler-config '{...}' # override pool type, dimensions, etc.
```
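As an illustration of how these three flags compose, here is a minimal Python sketch that assembles a `vllm serve` command line. The helper name `build_serve_argv` and the model name are hypothetical, and the choice sets are copied from the flag summary above, so an installed vLLM build may accept other values. Nothing here calls vLLM itself.

```python
import json
import shlex

# Choice sets copied from the flag summary above (assumption: the installed
# vLLM version may accept additional or different values).
RUNNERS = ("auto", "generate", "pooling", "draft")
CONVERTS = ("auto", "none", "embed", "classify")


def build_serve_argv(model, runner="auto", convert="auto", pooler_config=None):
    """Hypothetical helper: build an argv list for `vllm serve`.

    Flags left at "auto" are omitted so vLLM falls back to its own detection.
    """
    if runner not in RUNNERS:
        raise ValueError(f"unknown runner: {runner!r}")
    if convert not in CONVERTS:
        raise ValueError(f"unknown convert: {convert!r}")

    argv = ["vllm", "serve", model]
    if runner != "auto":
        argv += ["--runner", runner]
    if convert != "auto":
        argv += ["--convert", convert]
    if pooler_config is not None:
        # --pooler-config takes a JSON object; its keys are not validated here.
        argv += ["--pooler-config", json.dumps(pooler_config)]
    return argv


if __name__ == "__main__":
    # "my-org/my-model" is a placeholder, not a real checkpoint.
    argv = build_serve_argv("my-org/my-model", runner="pooling", convert="embed")
    print(shlex.join(argv))
```

Building the command as a list rather than a string keeps the JSON passed to `--pooler-config` as a single argument, which avoids shell-quoting mistakes when the command is handed to `subprocess.run`.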
The pair `(runner, convert)` replaces the old
`--task {generate|embed|score|classify|reward|...}` flag. `--task` is
**deprecated**: it still works in current releases but emits a deprecation
warning and is scheduled for full removal. Canonical usage today:
| Workload | Command | Runner | Notes |
|---|---|---|---|
| Chat / completion | `vllm serve <model>` | `generate` (auto) | default |
| Embedding | `vllm serve <model> --runner pooling` | `pooling` | auto-detects CLS/LAST/MEAN from config |
| Embedding from a causal LM | `vllm serve <model> --runner pooling --convert embed` | `pooling` | adapts `*ForCausalLM` checkpoints |
| Classification | `vllm serve <model> --runner pooling --convert classify` | `pooling` | also how `scor