multi-model-routing
Install: `claude install-skill mickolasjae/mick-applied-ai-toolkit`
# Multi-Model Routing
Scaffold a cost-aware LLM router that picks the cheapest model that clears each task type's quality bar, then logs per-call cost so you can keep optimizing.
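A minimal sketch of that loop in Python, assuming you already have per-task quality scores from your own evals. `ModelSpec`, `Router`, the model names, prices, and scores are hypothetical placeholders. They are not real rate cards and not the code this skill generates. The router filters to models that clear the bar for the task type, takes the cheapest survivor, and appends one cost record per call.

```python
from dataclasses import dataclass, field


@dataclass
class ModelSpec:
    name: str                   # hypothetical model id
    usd_per_1k_input: float     # placeholder price, not a real rate card
    usd_per_1k_output: float    # placeholder price, not a real rate card
    quality: dict[str, float]   # eval score per task type (0..1), from your own evals


@dataclass
class Router:
    models: list[ModelSpec]
    quality_bar: dict[str, float]                    # minimum acceptable score per task type
    cost_log: list[dict] = field(default_factory=list)

    def pick(self, task_type: str) -> ModelSpec:
        """Return the cheapest model whose eval score clears the bar for task_type."""
        bar = self.quality_bar[task_type]
        eligible = [m for m in self.models if m.quality.get(task_type, 0.0) >= bar]
        if not eligible:
            raise LookupError(f"no model clears the quality bar for {task_type!r}")
        # Crude blended price; weight by your real input/output token mix if it is skewed.
        return min(eligible, key=lambda m: m.usd_per_1k_input + m.usd_per_1k_output)

    def record(self, task_type: str, model: ModelSpec,
               input_tokens: int, output_tokens: int) -> float:
        """Compute and log the cost of one call so routing can be re-tuned later."""
        cost = ((input_tokens / 1000) * model.usd_per_1k_input
                + (output_tokens / 1000) * model.usd_per_1k_output)
        self.cost_log.append({
            "task_type": task_type,
            "model": model.name,
            "input_tokens": input_tokens,
            "output_tokens": output_tokens,
            "usd": cost,
        })
        return cost


# Hypothetical two-model fleet: a cheap small model and a pricier large one.
small = ModelSpec("small-model", 0.0002, 0.0008, {"classify": 0.93, "plan": 0.55})
large = ModelSpec("large-model", 0.003, 0.015, {"classify": 0.96, "plan": 0.88})
router = Router([small, large], quality_bar={"classify": 0.90, "plan": 0.85})

chosen = router.pick("classify")   # both clear 0.90, so the cheaper small-model wins
cost = router.record("classify", chosen, input_tokens=1200, output_tokens=300)
planner = router.pick("plan")      # small-model scores 0.55 < 0.85, so large-model is chosen
```

Because the call site names a task type rather than a model, re-running evals or changing prices only touches the `ModelSpec` data, and `cost_log` gives you the per-call numbers to decide what to change next.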
## 1. When to use this skill
Trigger any time the application has — or will soon have — more than one model in play and you want to stop hand-coding which one gets called where.
Phrases that should fire this skill:
- "route between Claude and GPT"
- "cheapest model for [task]"
- "multi-model dispatch"
- "LLM cost telemetry"
- "which model should I use for vocab extraction / classification / RAG / X"
- "reduce our LLM bill"
- "switch this call to a cheaper model"
This is the LLM analog of the "right tool for the job" rule. Do not use Opus to classify a string. Do not use Haiku to plan a 12-step agent run. The router enforces that discipline at the call site.
## 2. The task-taxonomy approach
The verified pattern (from Mercor's scoring module, README §5/9/10) is:
> "We choose the model based on the type of rubric item — not all items need the same LLM."
Concretely, they route:
- **Forms (text-only)** → `o4-mini` — cheap, fast, text-only
- **Interviews (video + audio + transcript)** → `gemini-2.5-flash` — multimodal capable
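The split above can be expressed as a lookup table keyed by item type rather than by endpoint. A minimal sketch, assuming each rubric item carries the set of input modalities it needs; `ItemType`, `RubricItem`, `item_type`, and `model_for` are hypothetical names for illustration, not Mercor's code. Only the two model ids come from the README quoted above.

```python
from dataclasses import dataclass
from enum import Enum


class ItemType(Enum):
    FORM = "form"            # text-only rubric item
    INTERVIEW = "interview"  # video + audio + transcript


# One route per item type; model ids are the strings quoted from the README.
ROUTES: dict[ItemType, str] = {
    ItemType.FORM: "o4-mini",
    ItemType.INTERVIEW: "gemini-2.5-flash",
}


@dataclass(frozen=True)
class RubricItem:
    """Hypothetical rubric-item shape, for illustration only."""
    item_id: str
    modalities: frozenset[str]


def item_type(item: RubricItem) -> ItemType:
    # Any modality beyond plain text needs the multimodal route.
    return ItemType.FORM if item.modalities <= {"text"} else ItemType.INTERVIEW


def model_for(item: RubricItem) -> str:
    return ROUTES[item_type(item)]


form_q = RubricItem("q1", frozenset({"text"}))
interview_q = RubricItem("q2", frozenset({"video", "audio", "text"}))
```

Keeping the table as data means swapping the model for one item type is a one-line change, and the scoring code that calls `model_for` never names a model directly.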
That's the whole idea, generalized:
1. **Enumerate the task types** in your application. Not endpoints, not services — *task types*. e.g. "classify intent", "extract vocab pairs", "synthesize RAG answer", "plan multi-step agent action", "transcribe voice memo", "reason over