hugging-face-evaluation

Add and manage evaluation results in Hugging Face model cards. Supports extracting eval tables from README content, importing scores from the Artificial Analysis API, and running custom model evaluations with vLLM/lighteval. Works with the model-index metadata format.
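The model-index format stores evaluation results in the YAML front matter of a model card: a list of model entries, each with `results` that pair a `task`, a `dataset`, and a list of `metrics`. The sketch below illustrates the README-extraction path under simple assumptions: the README contains one GitHub-style pipe table, each row is a benchmark, and scores are plain numbers (optionally followed by `%`). `parse_markdown_table` and `to_model_index` are hypothetical helpers written for illustration, not the skill's own code (which lists markdown-it-py as a dependency), and the `task_type`, `metric_type`, and `dataset.type` values are placeholders.

```python
import re
from typing import Dict, List


def parse_markdown_table(markdown: str) -> List[Dict[str, str]]:
    """Return the first pipe table found in `markdown` as a list of row dicts.

    Simplified sketch: expects a header row, a |---| separator row, and
    body rows that each start with "|". Escaped pipes are not handled.
    """
    lines = [line.strip() for line in markdown.splitlines()]
    # A separator row contains only pipes, dashes, colons and whitespace.
    separator = re.compile(r"\|?[\s:|-]+\|?")
    for i in range(len(lines) - 1):
        header, sep = lines[i], lines[i + 1]
        if header.startswith("|") and "-" in sep and separator.fullmatch(sep):
            columns = [cell.strip() for cell in header.strip("|").split("|")]
            rows = []
            for row_line in lines[i + 2:]:
                if not row_line.startswith("|"):
                    break  # the table ends at the first non-table line
                cells = [cell.strip() for cell in row_line.strip("|").split("|")]
                rows.append(dict(zip(columns, cells)))
            return rows
    return []


def to_model_index(
    model_name: str,
    rows: List[Dict[str, str]],
    benchmark_col: str,
    score_col: str,
    task_type: str = "text-generation",  # placeholder: set the real task type
    metric_type: str = "accuracy",  # placeholder: set the real metric type
) -> List[dict]:
    """Convert parsed table rows into a model-index list with one model entry."""
    results = []
    for row in rows:
        benchmark = row[benchmark_col]
        results.append({
            "task": {"type": task_type},
            # dataset.type should be the Hub dataset ID; the lowercased
            # benchmark name is only a stand-in for this sketch.
            "dataset": {"name": benchmark, "type": benchmark.lower()},
            "metrics": [{"type": metric_type, "value": float(row[score_col].rstrip("%"))}],
        })
    return [{"name": model_name, "results": results}]


if __name__ == "__main__":
    readme = "| Benchmark | Score |\n|---|---|\n| MMLU | 72.5 |\n| GSM8K | 81.0 |\n"
    model_index = to_model_index("my-model", parse_markdown_table(readme), "Benchmark", "Score")
    print({"model-index": model_index})
```

The returned list belongs under the `model-index` key of the card's front matter (for example serialized with `yaml.safe_dump`). A real card would replace the placeholder `dataset.type` with the actual Hub dataset ID and use the correct task and metric types.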
tayyabexe/skills · ★ 3 · AI & Automation · score 76

Install: claude install-skill tayyabexe/skills

# Overview

This skill provides tools to add structured evaluation results to Hugging Face model cards. It supports multiple methods for adding evaluation data:

- Extracting existing evaluation tables from README content
- Importing benchmark scores from Artificial Analysis
- Running custom model evaluations with vLLM or accelerate backends (lighteval/inspect-ai)

## Integration with HF Ecosystem

- **Model Cards**: Updates model-index metadata for leaderboard integration
- **Artificial Analysis**: Direct API integration for benchmark imports
- **Papers with Code**: Compatible with their model-index specification
- **Jobs**: Run evaluations directly on Hugging Face Jobs with `uv` integration
- **vLLM**: Efficient GPU inference for custom model evaluation
- **lighteval**: Hugging Face's evaluation library with vLLM/accelerate backends
- **inspect-ai**: UK AI Safety Institute's evaluation framework

# Version

1.3.0

# Dependencies

## Core Dependencies

- huggingface_hub>=0.26.0
- markdown-it-py>=3.0.0
- python-dotenv>=1.2.1
- pyyaml>=6.0.3
- requests>=2.32.5
- re (built-in)

## Inference Provider Evaluation

- inspect-ai>=0.3.0
- inspect-evals
- openai

## vLLM Custom Model Evaluation (GPU required)

- lighteval[accelerate,vllm]>=0.6.0
- vllm>=0.4.0
- torch>=2.0.0
- transformers>=4.40.0
- accelerate>=0.30.0

Note: vLLM dependencies are installed automatically via PEP 723 script headers when using `uv run`.

# IMPORTANT: Using This Skill

## ⚠️ CRITICAL: Check for Existing PRs Before