bernstein-quality

Solid

Show quality metrics for Bernstein runs: success rates per model, lint/test pass rates, and completion time distributions. Use when the user asks about quality, reliability, which model performs best, or pass rates.

AI & Automation · 481 stars · 41 forks · Updated today · Apache-2.0

Quality Score: 89/100

Stars (20%): 89
Recency (20%): 100
Frontmatter (20%): 70
Documentation (15%): 64
Issue Health (10%): 50
License (10%): 100
Description (5%): 100

Skill Content

# Bernstein Quality Metrics

Analyze quality and reliability of agent-generated code.

## When to Use

- User asks "how reliable are the agents?" or "which model is best?"
- User wants success rates, pass rates, or completion time stats
- User asks about test failures or lint issues across models
- User says "show me quality metrics"

## Instructions

1. Run `scripts/quality.sh metrics` for overall quality metrics.
2. Run `scripts/quality.sh pass-rates` for lint/typecheck/test pass rates by model.
3. Run `scripts/quality.sh times` for completion time distributions.
4. Present a quality dashboard:

   ```
   ## Quality Dashboard

   ### Success Rate by Model
   | Model | Tasks | Success | Fail | Rate |
   |-------|-------|---------|------|------|
   | claude-sonnet-4 | 24 | 22 | 2 | 91.7% |
   | gpt-4.1 | 12 | 10 | 2 | 83.3% |

   ### Pass Rates
   | Check | Overall | claude-sonnet-4 | gpt-4.1 |
   |-------|---------|-----------------|---------|
   | Lint | 96% | 98% | 92% |
   | Type-check | 88% | 91% | 83% |
   | Tests | 85% | 89% | 75% |

   ### Completion Times
   | Percentile | Time |
   |------------|------|
   | p50 | 3m 20s |
   | p90 | 8m 45s |
   | p99 | 15m 12s |
   ```

5. Highlight any models with significantly lower pass rates.
6. Recommend model routing adjustments if one model consistently underperforms.
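
The skill does not define what "significantly lower" means when highlighting models, so the following is only a minimal sketch of one possible rule: flag any model whose success rate trails the best model by more than a fixed number of percentage points. The names `ModelStats` and `flag_underperformers` and the 5-point `margin` are hypothetical and not part of Bernstein; the task counts are the sample values from the dashboard template above, not real output of `scripts/quality.sh`.

```python
from dataclasses import dataclass


@dataclass
class ModelStats:
    """Hypothetical container for one row of the success-rate table."""
    model: str
    tasks: int
    success: int

    @property
    def rate(self) -> float:
        # Success rate as a percentage; 0.0 when no tasks were run.
        return 100.0 * self.success / self.tasks if self.tasks else 0.0


def flag_underperformers(stats, margin=5.0):
    """Return models whose rate trails the best model by more than `margin` points.

    The 5-point default is an assumed threshold, not one the skill specifies.
    """
    if not stats:
        return []
    best = max(s.rate for s in stats)
    return [s.model for s in stats if best - s.rate > margin]


# Sample counts copied from the dashboard template.
stats = [
    ModelStats("claude-sonnet-4", tasks=24, success=22),
    ModelStats("gpt-4.1", tasks=12, success=10),
]

for s in stats:
    print(f"{s.model}: {s.rate:.1f}%")

print("Flagged:", flag_underperformers(stats))
```

With the sample counts, the rates come out to 91.7% and 83.3%, so `gpt-4.1` is flagged under the assumed 5-point margin. A fixed margin ignores sample size; with only 12 tasks a gap of this size may not be meaningful, so any routing recommendation should mention the task counts alongside the rates.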

Details

Author: sipyourdrink-ltd
Repository: sipyourdrink-ltd/bernstein
Created: 2 months ago
Last Updated: today
Language: Python
License: Apache-2.0
