cortex-eval


Evaluate model performance — check for accuracy drops, data drift, and error patterns. Use when asked about "model accuracy dropped", "evaluate the model", "check for drift", or "model performance".

AI & Automation | 2,274 stars | 319 forks | Updated today | MIT


Quality Score: 99/100

Stars (20%): 100
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 100
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Evaluate Model Performance

You are Cortex, the ML/AI engineer on the Engineering Team.

Follow the output format defined in docs/output-kit.md: 40-line CLI max, box-drawing skeleton, unified severity indicators, compressed prose.

## Steps

### Step 0: Run Static Analysis

Before any LLM-based evaluation, run the static analysis scanner to find LLM usage anti-patterns and prompt quality issues:

```bash
# From the project root (or team/cortex/scripts/)
python team/cortex/scripts/cortex_agent/eval_scan.py . --out .reports/cortex-eval-latest.json
```

Or with selective scans:

```bash
# LLM usage only (finds missing error handling, unbounded costs, hardcoded models)
python team/cortex/scripts/cortex_agent/eval_scan.py . --skip-prompts

# Prompt evaluation only (finds injection risks, length issues, missing format instructions)
python team/cortex/scripts/cortex_agent/eval_scan.py . --skip-usage
```

Review the JSON report at `.reports/cortex-eval-<ts>.json`. Exit code 2 means HIGH or CRITICAL findings exist; these should be addressed before continuing.

### Step 1: Detect ML Environment

Scan the project to understand the ML stack and current model:

```bash
# Check for model artifacts, training scripts, metrics logs
ls -la model* *.pkl *.joblib *.onnx *.pt *.h5 2>/dev/null
ls -la train* evaluate* metrics* 2>/dev/null
cat requirements.txt 2>/dev/null | grep -iE "sklearn|torch|tensorflow|xgboost|lightgbm|mlflow|wandb"
cat pyproject.toml 2>/dev/null | grep -iE "sklearn|torch...
```
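The exit-code rule from Step 0 (exit code 2 means HIGH or CRITICAL findings) can be wrapped in a small gate so later steps only run when the scan is clean. This is a minimal sketch, not the skill's own implementation: the helper names `has_blocking_findings`, `run_scan`, and `scan_project` are hypothetical, and treating every exit code other than 2 as non-blocking is an assumption, since 2 is the only code the skill text documents.

```python
import subprocess
import sys
from pathlib import Path

# Exit code 2 is the only value documented in Step 0 (HIGH or CRITICAL findings).
# Assumption: any other exit code is treated as "no blocking findings".
BLOCKING_EXIT_CODE = 2


def has_blocking_findings(returncode: int) -> bool:
    """Return True when the scanner signalled HIGH or CRITICAL findings."""
    return returncode == BLOCKING_EXIT_CODE


def run_scan(command: list[str]) -> bool:
    """Run a scan command; return True if evaluation may continue."""
    completed = subprocess.run(command, check=False)
    return not has_blocking_findings(completed.returncode)


def scan_project(
    script: str = "team/cortex/scripts/cortex_agent/eval_scan.py",
    report: str = ".reports/cortex-eval-latest.json",
) -> bool:
    """Invoke the scanner with the arguments shown in Step 0."""
    # Fail loudly if the script is absent, so an interpreter error code
    # is never mistaken for a scan result.
    if not Path(script).is_file():
        raise FileNotFoundError(f"scanner not found: {script}")
    return run_scan([sys.executable, script, ".", "--out", report])
```

Calling `scan_project()` from the project root returns `False` when the scanner exits with code 2, which is the signal to address findings before moving on to Step 1. Checking that the script exists first keeps a launch failure separate from a genuine scan verdict.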

Details

Author: jeremylongshore
Repository: jeremylongshore/claude-code-plugins-plus-skills
Created: 7 months ago
Last Updated: today
Language: Python
License: MIT
