nemo-evaluator-sdk


Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# NeMo Evaluator SDK - Enterprise LLM Benchmarking

## Quick Start

NeMo Evaluator SDK evaluates LLMs across 100+ benchmarks from 18+ harnesses using containerized, reproducible evaluation with multi-backend execution (local Docker, Slurm HPC, Lepton cloud).

**Installation**:

```bash
pip install nemo-evaluator-launcher
```

**Set API key and run evaluation**:

```bash
export NGC_API_KEY=nvapi-your-key-here

# Create minimal config
cat > config.yaml << 'EOF'
defaults:
  - execution: local
  - deployment: none
  - _self_

execution:
  output_dir: ./results

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://integrate.api.nvidia.com/v1/chat/completions
    api_key_name: NGC_API_KEY

evaluation:
  tasks:
    - name: ifeval
EOF

# Run evaluation
nemo-evaluator-launcher run --config-dir . --config-name config
```

**View available tasks**:

```bash
nemo-evaluator-launcher ls tasks
```

## Common Workflows

### Workflow 1: Evaluate Model on Standard Benchmarks

Run core academic benchmarks (MMLU, GSM8K, IFEval) on any OpenAI-compatible endpoint.

**Checklist**:

```
Standard Evaluation:
- [ ] Step 1: Configure API endpoint
- [ ] Step 2: Select benchmarks
- [ ] Step 3: Run evaluation
- [ ] Step 4: Check results
```

**Step 1: Configure API endpoint**

```yaml
# config.yaml
defaults:
  - execution: local
  - deployment: none
  - _self_

execution:
  output_dir: ./results

target:
  api_endpoint:
    model_id: meta/llama-3.1-8b-instruct
    url: https://int...
```
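The Quick Start configuration can also be generated programmatically, which may help when the same endpoint is evaluated on several task lists. The following is a minimal sketch, not part of the SDK: `render_config` and `api_key_missing` are hypothetical helper names, and the YAML nesting is an assumption reconstructed from the Quick Start example. Only the keys and values shown in that example are used, and task names should be taken from `nemo-evaluator-launcher ls tasks`.

```python
import os


def render_config(model_id, url, api_key_name, tasks, output_dir="./results"):
    """Render launcher config text with the same keys as the Quick Start example.

    Hypothetical helper: the nesting below is assumed from the Quick Start YAML.
    """
    # One "- name: <task>" entry per requested task, indented under evaluation.tasks.
    task_lines = "\n".join(f"    - name: {task}" for task in tasks)
    return (
        "defaults:\n"
        "  - execution: local\n"
        "  - deployment: none\n"
        "  - _self_\n"
        "\n"
        "execution:\n"
        f"  output_dir: {output_dir}\n"
        "\n"
        "target:\n"
        "  api_endpoint:\n"
        f"    model_id: {model_id}\n"
        f"    url: {url}\n"
        f"    api_key_name: {api_key_name}\n"
        "\n"
        "evaluation:\n"
        "  tasks:\n"
        f"{task_lines}\n"
    )


def api_key_missing(api_key_name, env=None):
    """Return True when the variable named by api_key_name is unset or blank."""
    env = os.environ if env is None else env
    return not env.get(api_key_name, "").strip()
```

A typical use would be to write the rendered text to `config.yaml`, confirm that `api_key_missing("NGC_API_KEY")` is false, and then run the launcher command from the Quick Start. Checking the key first avoids starting an evaluation that is likely to fail at the first API call.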

Details

Author: davila7
Repository: davila7/claude-code-templates
Created: 10 months ago
Last Updated: today
Language: Python
License: MIT



Related Skills

devops-deploy

multi-cloud-architecture

lambda-labs-gpu-cloud