hugging-face-evaluation

Solid

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
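As a rough illustration of the first method mentioned above, the sketch below pulls a score table out of README text and arranges it in the `model-index` layout used in model card metadata. It is not the skill's own implementation, which this excerpt does not show. The function names, the `Benchmark` and `Score` column names, the `text-generation` task type, the `accuracy` metric type, and the slug rule for the dataset `type` are all assumptions made for the example. It prints JSON only to stay within the standard library; real model cards store this structure as YAML front matter.

```python
import json
import re

# Matches one cell of a Markdown table separator row, e.g. "---", ":---", "---:".
SEPARATOR = re.compile(r":?-+:?")

# Hypothetical README content used only for this example.
SAMPLE_README = """\
# my-model

| Benchmark | Score |
|-----------|-------|
| MMLU      | 71.2  |
| GSM8K     | 58.0  |
| Notes     | n/a   |
"""


def parse_first_table(readme_text):
    """Return the rows of the first pipe table as a list of dicts keyed by header."""
    header, rows = None, []
    for raw in readme_text.splitlines():
        line = raw.strip()
        is_row = len(line) > 1 and line.startswith("|") and line.endswith("|")
        if not is_row:
            if header is not None:
                break  # the first table has ended
            continue
        cells = [c.strip() for c in line[1:-1].split("|")]
        if header is None:
            header = cells
        elif all(SEPARATOR.fullmatch(c) for c in cells):
            continue  # skip the |---|---| separator row
        else:
            rows.append(dict(zip(header, cells)))
    return rows


def to_model_index(model_name, rows, name_col="Benchmark", score_col="Score",
                   task_type="text-generation", metric_type="accuracy"):
    """Build a model-index style dict; column names and types are assumed defaults."""
    results = []
    for row in rows:
        try:
            value = float(row[score_col])
        except (KeyError, ValueError):
            continue  # skip rows without a numeric score
        dataset_name = row.get(name_col, "")
        if not dataset_name:
            continue
        results.append({
            "task": {"type": task_type},
            "dataset": {
                "name": dataset_name,
                # Assumed slug rule; real dataset identifiers may differ.
                "type": dataset_name.lower().replace(" ", "_"),
            },
            "metrics": [{"type": metric_type, "value": value}],
        })
    return {"model-index": [{"name": model_name, "results": results}]}


if __name__ == "__main__":
    table_rows = parse_first_table(SAMPLE_README)
    print(json.dumps(to_model_index("my-model", table_rows), indent=2))
```

Rows whose score is not numeric (the `Notes` row here) are dropped rather than guessed at, which keeps the generated metadata limited to values actually present in the README.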

AI & Automation · 40,440 stars · 6,528 forks · Updated today · MIT

Quality Score: 96/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): (score not shown)
Documentation (15%): 100
Issue Health (10%): (score not shown)
License (10%): 100
Description (5%): 100

Skill Content

# Overview

This skill provides tools to add structured evaluation results to Hugging Face model cards. It supports multiple methods for adding evaluation data:

- Extracting existing evaluation tables from README content
- Importing benchmark scores from Artificial Analysis
- Running custom model evaluations with vLLM or accelerate backends (lighteval/inspect-ai)

## When to Use

- You need to add structured evaluation results to a Hugging Face model card.
- You want to import benchmark data or run custom evaluations with vLLM, lighteval, or inspect-ai.
- You are preparing leaderboard-compatible `model-index` metadata for a model release.

## Integration with HF Ecosystem

- **Model Cards**: Updates model-index metadata for leaderboard integration
- **Artificial Analysis**: Direct API integration for benchmark imports
- **Papers with Code**: Compatible with their model-index specification
- **Jobs**: Run evaluations directly on Hugging Face Jobs with `uv` integration
- **vLLM**: Efficient GPU inference for custom model evaluation
- **lighteval**: HuggingFace's evaluation library with vLLM/accelerate backends
- **inspect-ai**: UK AI Safety Institute's evaluation framework

# Version

1.3.0

# Dependencies

## Core Dependencies

- huggingface_hub>=0.26.0
- markdown-it-py>=3.0.0
- python-dotenv>=1.2.1
- pyyaml>=6.0.3
- requests>=2.32.5
- re (built-in)

## Inference Provider Evaluation

- inspect-ai>=0.3.0
- inspect-evals
- openai

## vLLM Custom Model Evaluation (GPU required)

- ligh...

Details

Author: sickn33
Repository: sickn33/antigravity-awesome-skills
Created: 4 months ago
Last Updated: today
Language: Python
License: MIT

Integrates with

OpenAI · AI
Hugging Face · AI

Similar Skills

Semantically similar based on skill content — not just same category

AI & Automation Listed

hugging-face-evaluation

3 Updated today

tayyabexe

AI & Automation Solid

hugging-face-community-evals

Run local evaluations for Hugging Face Hub models with inspect-ai or lighteval.

40,440 Updated today

sickn33

AI & Automation Listed

evaluating-llms-harness

Evaluates LLMs across 60+ academic benchmarks (MMLU, HumanEval, GSM8K, TruthfulQA, HellaSwag). Use when benchmarking model quality, comparing models, reporting academic results, or tracking training progress. Industry standard used by EleutherAI, HuggingFace, and major labs. Supports HuggingFace, vLLM, APIs.

4 Updated today

immacualate