nemo-evaluator-sdk

Evaluates LLMs across 100+ benchmarks from 18+ harnesses (e.g., MMLU, HumanEval, GSM8K, plus safety and VLM suites) with multi-backend execution. Use it when you need scalable evaluation on local Docker, Slurm HPC, or cloud platforms. It is NVIDIA's enterprise-grade evaluation platform, with a container-first architecture for reproducible benchmarking.

DevOps & Infrastructure | 27,632 stars | 2,848 forks | Updated today | MIT license

Quality Score: 99/100

Stars (weight 20%): 100
Recency (weight 20%): 100
Frontmatter (weight 20%): 70
Documentation (weight 15%): 100
Issue Health (weight 10%): 50
License (weight 10%): 100
Description (weight 5%): 100

Skill Content

# NeMo Evaluator SDK - Enterprise LLM Benchmarking

## Quick Start

NeMo Evaluator SDK evaluates LLMs across 100+ benchmarks from 18+ harnesses using containerized, reproducible evaluation with multi-backend execution (local Docker, Slurm HPC, Lepton cloud).

**Installation**:

```bash
pip install nemo-evaluator-launcher
```

**Set API key and run evaluation**:

```bash
export NGC_API_KEY=nvapi-your-key-here

# Create minimal config
cat > config.yaml << 'EOF'
defaults:
  - execution: local
  - deployment: none
  - _self_

execution:
  output_dir: ./results

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://integrate.api.nvidia.com/v1/chat/completions
    api_key_name: NGC_API_KEY

evaluation:
  tasks:
    - name: ifeval
EOF

# Run evaluation
nemo-evaluator-launcher run --config-dir . --config-name config
```

**View available tasks**:

```bash
nemo-evaluator-launcher ls tasks
```

## Common Workflows

### Workflow 1: Evaluate Model on Standard Benchmarks

Run core academic benchmarks (MMLU, GSM8K, IFEval) on any OpenAI-compatible endpoint.

**Checklist**:

```
Standard Evaluation:
- [ ] Step 1: Configure API endpoint
- [ ] Step 2: Select benchmarks
- [ ] Step 3: Run evaluation
- [ ] Step 4: Check results
```

**Step 1: Configure API endpoint**

```yaml
# config.yaml
defaults:
  - execution: local
  - deployment: none
  - _self_

execution:
  output_dir: ./results

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://int...
```
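The Quick Start writes `config.yaml` by hand with a heredoc and a single task. The following minimal shell sketch wraps the same setup in reusable functions so you can script several runs. Treat it as an illustration under these assumptions: `write_eval_config` and `preflight` are hypothetical helpers, not part of `nemo-evaluator-launcher`. The YAML keys are copied from the Quick Start config. Any task name other than `ifeval` should be confirmed with `nemo-evaluator-launcher ls tasks` before use.

```bash
#!/usr/bin/env bash
# Sketch only: write_eval_config and preflight are hypothetical helpers,
# not part of nemo-evaluator-launcher. YAML keys mirror the Quick Start config.
set -eu

# Write a launcher config for one OpenAI-compatible endpoint and N tasks.
# Usage: write_eval_config MODEL_ID URL OUTPUT_FILE TASK [TASK...]
write_eval_config() {
  local model_id="$1" url="$2" out="$3" task
  shift 3
  {
    cat <<EOF
defaults:
  - execution: local
  - deployment: none
  - _self_

execution:
  output_dir: ./results

target:
  api_endpoint:
    model_id: ${model_id}
    url: ${url}
    api_key_name: NGC_API_KEY

evaluation:
  tasks:
EOF
    # Assumption: each task name matches an entry listed by "ls tasks".
    for task in "$@"; do
      printf '    - name: %s\n' "$task"
    done
  } > "$out"
}

# Fail early if the API key or the launcher CLI is missing.
preflight() {
  if [ -z "${NGC_API_KEY:-}" ]; then
    echo "NGC_API_KEY is not set" >&2
    return 1
  fi
  if ! command -v nemo-evaluator-launcher >/dev/null 2>&1; then
    echo "nemo-evaluator-launcher not found; install it with pip" >&2
    return 1
  fi
}

# Reproduce the Quick Start config (single ifeval task).
write_eval_config meta/llama-3.1-8b-instruct \
  https://integrate.api.nvidia.com/v1/chat/completions \
  config.yaml ifeval
```

To run the evaluation with the generated file, append `preflight && nemo-evaluator-launcher run --config-dir . --config-name config` to the end of the script. This is the same launch command the Quick Start uses. The generated YAML matches the hand-written file except for the task list, so switching between the two approaches does not change how the launcher resolves the config.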

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 10 months ago
Last Updated: today
Language: Python
License: MIT