omniroute-cli-eval

Solid

Run and manage OmniRoute eval suites from the CLI — create suites, run benchmarks, watch live results, view scorecards, and compare model performance. Use when the user wants to benchmark models, validate quality regressions, or automate LLM evals in CI.

AI & Automation | 6,067 stars | 1,058 forks | Updated today | MIT

Quality Score: 93/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# OmniRoute — CLI Evals

Requires the `omniroute` CLI. See [CLI entry-point skill](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute-cli/SKILL.md) for install + global flags.

## What are evals?

Evals are automated test suites that score LLM outputs against expected answers or rubrics. OmniRoute stores suites and run results in its local database.

## Eval suites

```bash
omniroute eval suites list              # List all eval suites
omniroute eval suites list --json       # JSON output
omniroute eval suites get <suiteId>     # Full suite definition
```

### Create a suite

```bash
omniroute eval suites create \
  --name "code-quality" \
  --rubric "exact-match" \
  --samples-file ./samples.jsonl        # JSONL: {input, expected_output}
```

Rubric options: `exact-match`, `contains`, `llm-judge`, `regex`.

`--samples-file` format (one JSON object per line):

```jsonl
{"input": "What is 2+2?", "expected_output": "4"}
{"input": "Translate 'hello' to Spanish", "expected_output": "hola"}
```

## Run an eval

```bash
omniroute eval suites run <suiteId> \
  --model claude-sonnet-4-6             # Run suite against a specific model

omniroute eval suites run <suiteId> \
  --model gpt-4o \
  --watch                               # Live TUI progress (EvalWatch)
```

The run is asynchronous. Use `--watch` for a live terminal dashboard or poll manually:

```bash
RUN_ID=$(omniroute eval suites run <suite...
```
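The samples-file format and the `eval suites create` flags described above can be combined into a small pre-flight script. The following is a minimal sketch, not part of the skill itself: the `samples_file`, `total`, and `valid` variable names are illustrative, and the `grep` check is a cheap heuristic (it assumes the `input` key precedes `expected_output` on each line), not a real JSON validator. The `omniroute` call uses only the flags shown above and is skipped when the CLI is not installed.

```bash
#!/usr/bin/env bash
set -euo pipefail

# Write a two-sample JSONL file: one JSON object per line.
samples_file="./samples.jsonl"   # hypothetical path, matching the example above
cat > "$samples_file" <<'EOF'
{"input": "What is 2+2?", "expected_output": "4"}
{"input": "Translate 'hello' to Spanish", "expected_output": "hola"}
EOF

# Count non-empty lines, then lines that mention both expected keys.
# "|| true" keeps "set -e" from aborting when grep finds zero matches.
total=$(grep -c . "$samples_file" || true)
valid=$(grep -c '"input":.*"expected_output":' "$samples_file" || true)

if [ "$total" -ne "$valid" ]; then
  echo "samples file has $((total - valid)) line(s) missing a key" >&2
  exit 1
fi
echo "samples ok: $valid"

# Create the suite only when the CLI is available on PATH.
if command -v omniroute >/dev/null 2>&1; then
  omniroute eval suites create \
    --name "code-quality" \
    --rubric "exact-match" \
    --samples-file "$samples_file"
fi
```

Running the check before `create` catches a malformed samples file locally, before anything is written to the OmniRoute database.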

Details

Author: diegosouzapw
Repository: diegosouzapw/OmniRoute
Created: 3 months ago
Last Updated: today
Language: TypeScript
License: MIT
