debug-stuck-eval


Debug stuck Hawk/Inspect AI evaluations. Use when user mentions "stuck eval", "eval not progressing", "eval hanging", "samples not completing", "eval set frozen", "runner stuck", "500 errors in eval", "retry loop", "eval timeout", or asks why an evaluation isn't finishing.

Code & Development 18 stars 13 forks Updated today MIT



Skill Content

## Quick Checklist

1. **Verify auth**: `hawk auth access-token > /dev/null || echo "Run 'hawk login' first"`
2. **Get eval-set-id** from user
3. **Check status**: `hawk status <eval-set-id>` - JSON report with pod state, logs, metrics
4. **View logs**: `hawk logs <eval-set-id>` or `hawk logs -f` for follow mode
5. **List samples**: `hawk list samples <eval-set-id>` - see completion status
6. **Look for error patterns** (see below)
7. **Test API directly** if logs show retries without clear errors

## Error Patterns

| Log Pattern | Meaning | Resolution |
|-------------|---------|------------|
| `[uuid task/id/epoch model] Retrying request to /responses` | OpenAI SDK retry with sample context | Test API directly with curl to see real error |
| `[uuid task/id/epoch model] -> model retry N ... [ErrorType code]` | Inspect retry with error summary | Check error type; use curl for full details |
| `500 - Internal server error` | API issue | Download buffer, find failing request, test through middleman AND directly to provider |
| `400 - invalid_request_error` | Token/context limit exceeded | Check message count and model context window |
| `Pod UID mismatch` | Sandbox pod was killed and restarted | No fix needed; sample errored out, Inspect will retry |
| Empty output, `pending: true` | API returned malformed response | Restart eval (buffer resumes) |
| OOMKilled in pod status | Memory exhaustion | Increase pod memory limits |

## Key Techniques

1. **Retry messages have sample con...
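The Error Patterns table can be applied mechanically to saved log output. The following is a minimal sketch, not part of Hawk or Inspect: the function names `classify_line` and `summarize` are hypothetical, and the regexes are assumptions derived only from the pattern text in the table, so they may need adjusting against real logs. The "Empty output, `pending: true`" row is left out because the table does not say how it appears in a log line.

```python
import re

# (regex, meaning, resolution) triples transcribed from the Error Patterns table.
# The regexes are illustrative guesses at the log format, not an official spec.
ERROR_PATTERNS = [
    (re.compile(r"Retrying request to /responses"),
     "OpenAI SDK retry with sample context",
     "Test API directly with curl to see real error"),
    (re.compile(r"-> model retry \d+"),
     "Inspect retry with error summary",
     "Check error type; use curl for full details"),
    (re.compile(r"\b500 - Internal server error"),
     "API issue",
     "Download buffer, find failing request, test through middleman AND directly to provider"),
    (re.compile(r"\b400 - invalid_request_error"),
     "Token/context limit exceeded",
     "Check message count and model context window"),
    (re.compile(r"Pod UID mismatch"),
     "Sandbox pod was killed and restarted",
     "No fix needed; Inspect will retry the sample"),
    (re.compile(r"OOMKilled"),
     "Memory exhaustion",
     "Increase pod memory limits"),
]


def classify_line(line):
    """Return (meaning, resolution) for the first matching pattern, else None."""
    for pattern, meaning, resolution in ERROR_PATTERNS:
        if pattern.search(line):
            return meaning, resolution
    return None


def summarize(log_text):
    """Count how many log lines fall under each known meaning."""
    counts = {}
    for line in log_text.splitlines():
        hit = classify_line(line)
        if hit is not None:
            meaning = hit[0]
            counts[meaning] = counts.get(meaning, 0) + 1
    return counts
```

One way to use this is to save the output of `hawk logs <eval-set-id>` to a file and pass its contents to `summarize`; a meaning that dominates the counts points at the table row to act on first.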

Details

Author
METR
Repository
METR/inspect-action
Created
1 year ago
Last Updated
today
Language
Python
License
MIT
