debug-stuck-evallisted

Debug stuck Hawk/Inspect AI evaluations. Use when user mentions "stuck eval", "eval not progressing", "eval hanging", "samples not completing", "eval set frozen", "runner stuck", "500 errors in eval", "retry loop", "eval timeout", or asks why an evaluation isn't finishing.
METR/hawk · ★ 5 · Code & Development · score 58

Install: claude install-skill METR/hawk

## Quick Checklist 1. **Verify auth**: `hawk auth access-token > /dev/null || echo "Run 'hawk login' first"` 2. **Get eval-set-id** from user 3. **Check status**: `hawk status <eval-set-id>` - JSON report with pod state, logs, metrics 4. **View logs**: `hawk logs <eval-set-id>` or `hawk logs -f` for follow mode 5. **List samples**: `hawk list samples <eval-set-id>` - see completion status 6. **Look for error patterns** (see below) 7. **Test API directly** if logs show retries without clear errors ## Error Patterns | Log Pattern | Meaning | Resolution | |-------------|---------|------------| | `[uuid task/id/epoch model] Retrying request to /responses` | OpenAI SDK retry with sample context | Test API directly with curl to see real error | | `[uuid task/id/epoch model] -> model retry N ... [ErrorType code]` | Inspect retry with error summary | Check error type; use curl for full details | | `500 - Internal server error` | API issue | Download buffer, find failing request, test through middleman AND directly to provider | | `400 - invalid_request_error` | Token/context limit exceeded | Check message count and model context window | | `Pod UID mismatch` | Sandbox pod was killed and restarted | No fix needed—sample errored out, Inspect will retry | | Empty output, `pending: true` | API returned malformed response | Restart eval (buffer resumes) | | OOMKilled in pod status | Memory exhaustion | Increase pod memory limits | ## Key Techniques 1. **Retry messages have sample con