omniroute-cli-eval

Solid

Run and manage OmniRoute eval suites from the CLI — create suites, run benchmarks, watch live results, view scorecards, and compare model performance. Use when the user wants to benchmark models, validate quality regressions, or automate LLM evals in CI.

AI & Automation · 6,067 stars · 1,058 forks · Updated today · MIT

Quality Score: 93/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%)
Documentation (15%): 100
Issue Health (10%)
License (10%): 100
Description (5%): 100

Skill Content

# OmniRoute — CLI Evals

Requires the `omniroute` CLI. See [CLI entry-point skill](https://raw.githubusercontent.com/diegosouzapw/OmniRoute/main/skills/omniroute-cli/SKILL.md) for install + global flags.

## What are evals?

Evals are automated test suites that score LLM outputs against expected answers or rubrics. OmniRoute stores suites and run results in its local database.

## Eval suites

```bash
omniroute eval suites list              # List all eval suites
omniroute eval suites list --json       # JSON output
omniroute eval suites get <suiteId>     # Full suite definition
```

### Create a suite

```bash
omniroute eval suites create \
  --name "code-quality" \
  --rubric "exact-match" \
  --samples-file ./samples.jsonl        # JSONL: {input, expected_output}
```

Rubric options: `exact-match`, `contains`, `llm-judge`, `regex`.

`--samples-file` format (one JSON object per line):

```jsonl
{"input": "What is 2+2?", "expected_output": "4"}
{"input": "Translate 'hello' to Spanish", "expected_output": "hola"}
```

## Run an eval

```bash
omniroute eval suites run <suiteId> \
  --model claude-sonnet-4-6             # Run suite against a specific model

omniroute eval suites run <suiteId> \
  --model gpt-4o \
  --watch                               # Live TUI progress (EvalWatch)
```

The run is asynchronous. Use `--watch` for a live terminal dashboard or poll manually:

```bash
RUN_ID=$(omniroute eval suites run <suite...
```
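`--samples-file` expects exactly one JSON object per line. A quick pre-flight check can therefore catch a malformed file before a run. The following is a minimal sketch and is not part of the `omniroute` CLI. It writes the two example samples shown above to `./samples.jsonl` and then checks each line with a hypothetical `validate_samples` helper. That helper only pattern-matches for the `input` and `expected_output` keys, so it is a rough sanity check, not a JSON parser.

```bash
#!/usr/bin/env bash
# Sketch: write a samples file in the documented JSONL shape
# ({"input": ..., "expected_output": ...}, one object per line) and run a
# rough check on it before passing it to --samples-file.
set -euo pipefail

samples_file="./samples.jsonl"

# One JSON object per line: no wrapping array, no trailing commas.
cat > "$samples_file" <<'EOF'
{"input": "What is 2+2?", "expected_output": "4"}
{"input": "Translate 'hello' to Spanish", "expected_output": "hola"}
EOF

# Hypothetical helper (not an omniroute command). It checks that every
# non-empty line looks like a single-line object containing both keys.
# It does NOT parse JSON; use a real parser for strict validation.
validate_samples() {
  local file="$1" line n=0
  # "|| [[ -n ... ]]" also handles a last line without a trailing newline.
  while IFS= read -r line || [[ -n "$line" ]]; do
    n=$((n + 1))
    [[ -z "$line" ]] && continue
    if [[ "$line" != \{*\} ]]; then
      echo "line $n: not a single-line JSON object" >&2
      return 1
    fi
    if [[ "$line" != *'"input"'* || "$line" != *'"expected_output"'* ]]; then
      echo "line $n: missing \"input\" or \"expected_output\"" >&2
      return 1
    fi
  done < "$file"
  echo "ok: $n line(s) checked in $file"
}

validate_samples "$samples_file"
```

A check like this fits naturally in a CI step that runs before the `omniroute eval suites create` command. A failing sample file then stops the job early instead of producing a partially scored run.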
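To make the rubric options more concrete, here is a hypothetical local approximation of three of them in Bash. It is only an illustration and is not OmniRoute's scoring code. The `score_sample` function name is invented here, and OmniRoute's real handling of whitespace, letter case, and regex flavor may differ. `llm-judge` is left out because it needs a call to a judge model.

```bash
#!/usr/bin/env bash
# Illustrative only: hypothetical local versions of the exact-match,
# contains, and regex rubrics. OmniRoute's actual rules may differ.
set -euo pipefail

# score_sample RUBRIC OUTPUT EXPECTED -> prints 1 (pass) or 0 (fail)
score_sample() {
  local rubric="$1" output="$2" expected="$3"
  case "$rubric" in
    exact-match)
      # Pass only if the output equals the expected string verbatim.
      [[ "$output" == "$expected" ]] && echo 1 || echo 0 ;;
    contains)
      # Pass if the expected string appears anywhere in the output.
      # Quoting "$expected" makes it a literal substring, not a glob.
      [[ "$output" == *"$expected"* ]] && echo 1 || echo 0 ;;
    regex)
      # Treat the expected value as a POSIX extended regular expression.
      [[ "$output" =~ $expected ]] && echo 1 || echo 0 ;;
    llm-judge)
      echo "llm-judge needs a judge model; not sketched here" >&2
      return 2 ;;
    *)
      echo "unknown rubric: $rubric" >&2
      return 2 ;;
  esac
}

# Example: score a few (output, answer) pairs and report a pass count.
outputs=("4" "The answer is 4" "hola")
answers=("4" "4" "hola")
passed=0
for i in "${!outputs[@]}"; do
  passed=$(( passed + $(score_sample exact-match "${outputs[i]}" "${answers[i]}") ))
done
echo "exact-match: $passed/${#outputs[@]} passed"
```

Under `exact-match`, the verbose answer "The answer is 4" fails. Under `contains`, the same answer would pass. This difference is why the choice of rubric matters as much as the samples. Treating the expected value as a pattern only under `regex` keeps literal answers containing characters such as `+` or `?` from being misread.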

Details

Author: diegosouzapw
Repository: diegosouzapw/OmniRoute
Created: 3 months ago
Last Updated: today
Language: TypeScript
License: MIT

Integrates with

OpenAI (AI) · Anthropic (AI)

Similar Skills

Semantically similar based on skill content, not just the same category

AI & Automation Solid

cli-eval

Create and run evaluation suites, watch live benchmark progress, view scorecards, compare model performance, and integrate eval runs with CI workflows from the CLI.

6,067 Updated today

diegosouzapw

AI & Automation Listed

run-evals

Run the Nomos eval suite -- recall@5, per-user isolation, and the end-to-end agent eval with the Opus-4.8 DB-content audit + the spec-driven feature-manifest audit. Use /run-evals when asked to run the evals, verify the memory system, check tenant isolation, or audit that features are actually wired and their DB effects land.

22 Updated today

project-nomos

AI & Automation Listed

eval-runner

Run eval scenarios to benchmark Mycelium effectiveness. Execute tasks using reflexion loop, validate against success criteria, record metrics.

33 Updated today

haabe