
open-webui-embeddings

Wire HuggingFace embedding + reranker models (BGE-M3, BGE-Reranker-v2-m3, etc.) into Open WebUI's RAG pipeline via LiteLLM as a proxy in front of HuggingFace Text Embeddings Inference (TEI). Covers the exact wire shapes Open WebUI sends (URL auto-append on embed but NOT rerank; payload + response shapes for both modes), the LiteLLM ↔ TEI gotchas (encoding_format=null trap, HF-driver task_type misdetection, openai-driver vs huggingface-driver tradeoffs), TEI configuration cliffs (max-client-batch-size 422 under hybrid search, max-batch-tokens AS the auto-truncate boundary, arch-specific Docker images), and the end-to-end production-grade config. BGE-M3 and BGE-Reranker-v2-m3 are the worked examples; the patterns generalise to any TEI-served encoder.
air-gapped/skills · ★ 2 · AI & Automation · score 78
Install: claude install-skill air-gapped/skills
# Open WebUI embeddings + reranking: operator reference

Target: operators wiring Open WebUI's RAG pipeline to HuggingFace Text Embeddings Inference (TEI) via LiteLLM. Three hops, each with its own wire-shape quirks. Most failure modes degrade silently to "answer quality dropped" rather than surfacing as visible errors; this skill is a triage guide for catching them at config time.

## The architecture in 30 seconds

```
Open WebUI → LiteLLM proxy → TEI (GPU)
               └ embed:  openai-driver      → /v1/embeddings
               └ rerank: huggingface-driver → /rerank (Cohere↔TEI translation)
```

Why proxy through LiteLLM rather than point Open WebUI at TEI directly?

- **Embed:** TEI exposes `/v1/embeddings` natively (OpenAI-compatible), so the direct path works. LiteLLM adds virtual-key auth, per-model rate limits, request logging, and optional caching.
- **Rerank:** TEI's native `/rerank` takes `{query, texts}` and returns `[{index, score}]`. Open WebUI's `ExternalReranker` sends the Cohere shape `{query, documents, top_n}` and expects `{results: [{index, relevance_score}]}`. **The direct path fails with HTTP 422** because the wire shapes do not match. LiteLLM's HuggingFace rerank handler translates between the two.

Skipping LiteLLM is therefore feasible only for embed. Rerank requires LiteLLM or another Cohere↔TEI shim, unless Open WebUI itself is patched.

## Wire shapes (exact)

### Embed

Open WebUI code path `backend/open_webui/retrieval/utils.py:677` (`generate_openai_batch_embeddings`):

```http
POST {RAG_OPENAI_API_BASE_URL}/embed