Install: claude install-skill air-gapped/skills
# Open WebUI embeddings + reranking — operator reference
Target: operators wiring Open WebUI's RAG pipeline to HuggingFace Text Embeddings Inference (TEI) via LiteLLM. Three hops, each with its own wire-shape quirks. Most failure modes degrade silently into "answer quality dropped" rather than raising visible errors, so this skill is a triage guide for catching them at config time.
## The architecture in 30 seconds
```
Open WebUI → LiteLLM proxy → TEI (GPU)
└ embed: openai-driver → /v1/embeddings
└ rerank: huggingface-driver → /rerank (Cohere↔TEI translation)
```
Why proxy through LiteLLM rather than point Open WebUI at TEI directly?
- **Embed:** TEI exposes `/v1/embeddings` natively (OpenAI-compat) — direct path works. LiteLLM adds: virtual-key auth, per-model rate limits, request logging, optional caching.
- **Rerank:** TEI's native `/rerank` is `{query, texts}` → `[{index, score}]`. Open WebUI's `ExternalReranker` sends Cohere shape `{query, documents, top_n}` → `{results: [{index, relevance_score}]}`. **Direct path fails with HTTP 422** — wire shapes do not match. LiteLLM's HuggingFace rerank handler translates between the two.
Skipping LiteLLM is therefore feasible only for embed. Rerank requires LiteLLM or another Cohere↔TEI shim, unless Open WebUI itself is patched.
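To make the rerank mismatch concrete, here is a minimal sketch of the translation a Cohere↔TEI shim has to perform, using only the two shapes described above. It is not LiteLLM's actual handler: the function names are hypothetical, `documents` is assumed to be a list of plain strings, and applying `top_n` to the sorted response (rather than forwarding it to TEI) is an assumption of this sketch.

```python
from typing import Any, Dict, List, Optional


def cohere_to_tei_request(body: Dict[str, Any]) -> Dict[str, Any]:
    """Map a Cohere-style rerank request onto TEI's native /rerank body.

    Hypothetical helper for illustration only.
    """
    # Cohere names the candidate list "documents"; TEI names it "texts".
    # top_n has no counterpart in the {query, texts} shape, so it is
    # dropped here and applied to the response instead.
    return {"query": body["query"], "texts": list(body["documents"])}


def tei_to_cohere_response(
    tei_results: List[Dict[str, Any]],
    top_n: Optional[int] = None,
) -> Dict[str, Any]:
    """Map TEI's [{index, score}] list onto Cohere's {results: [...]} shape.

    Hypothetical helper for illustration only.
    """
    # Sort by score, highest first, so truncating to top_n keeps the best hits.
    ranked = sorted(tei_results, key=lambda r: r["score"], reverse=True)
    if top_n is not None:
        ranked = ranked[:top_n]
    return {
        "results": [
            {"index": r["index"], "relevance_score": r["score"]}
            for r in ranked
        ]
    }
```

Without a step like this between the two services, Open WebUI's `documents` field never reaches TEI as `texts`, which is the shape mismatch behind the direct-path failure.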
## Wire shapes (exact)
### Embed — Open WebUI code path
`backend/open_webui/retrieval/utils.py:677` (`generate_openai_batch_embeddings`):
```http
POST {RAG_OPENAI_API_BASE_URL}/embed