debug-stuck-eval

Debug stuck Hawk/Inspect AI evaluations. Use when user mentions "stuck eval", "eval not progressing", "eval hanging", "samples not completing", "eval set frozen", "runner stuck", "500 errors in eval", "retry loop", "eval timeout", or asks why an evaluation isn't finishing.

Skill Content

## Quick Checklist

1. **Verify auth**: `hawk auth access-token > /dev/null || echo "Run 'hawk login' first"`
2. **Get eval-set-id** from user
3. **Check status**: `hawk status <eval-set-id>` - JSON report with pod state, logs, metrics
4. **View logs**: `hawk logs <eval-set-id>` or `hawk logs -f` for follow mode
5. **List samples**: `hawk list samples <eval-set-id>` - see completion status
6. **Look for error patterns** (see below)
7. **Test API directly** if logs show retries without clear errors

## Error Patterns

| Log Pattern | Meaning | Resolution |
|-------------|---------|------------|
| `[uuid task/id/epoch model] Retrying request to /responses` | OpenAI SDK retry with sample context | Test API directly with curl to see real error |
| `[uuid task/id/epoch model] -> model retry N ... [ErrorType code]` | Inspect retry with error summary | Check error type; use curl for full details |
| `500 - Internal server error` | API issue | Download buffer, find failing request, test through middleman AND directly to provider |
| `400 - invalid_request_error` | Token/context limit exceeded | Check message count and model context window |
| `Pod UID mismatch` | Sandbox pod was killed and restarted | No fix needed; sample errored out, Inspect will retry |
| Empty output, `pending: true` | API returned malformed response | Restart eval (buffer resumes) |
| OOMKilled in pod status | Memory exhaustion | Increase pod memory limits |

## Key Techniques

1. **Retry messages have sample con...
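The Error Patterns table can be turned into a small triage helper for saved log text, for example output captured from `hawk logs <eval-set-id>`. The following is a minimal sketch, not part of the skill itself: the function name `triage`, the `ERROR_PATTERNS` list, and the exact regular expressions are assumptions about how these messages appear in raw logs. The `pending: true` row is omitted because it describes a sample state and not a log line.

```python
import re

# Hypothetical regexes derived from the Error Patterns table. The exact
# wording of real log lines may differ, so treat these as a starting point.
ERROR_PATTERNS = [
    (re.compile(r"Retrying request to /responses"),
     "OpenAI SDK retry with sample context",
     "Test API directly with curl to see real error"),
    (re.compile(r"-> model retry \d+"),
     "Inspect retry with error summary",
     "Check error type; use curl for full details"),
    (re.compile(r"500 - Internal server error"),
     "API issue",
     "Download buffer, find failing request, test through middleman and directly to provider"),
    (re.compile(r"400 - invalid_request_error"),
     "Token/context limit exceeded",
     "Check message count and model context window"),
    (re.compile(r"Pod UID mismatch"),
     "Sandbox pod was killed and restarted",
     "No fix needed; Inspect will retry the sample"),
    (re.compile(r"OOMKilled"),
     "Memory exhaustion",
     "Increase pod memory limits"),
]


def triage(log_text):
    """Return (line_number, meaning, resolution) for each matching log line."""
    findings = []
    for number, line in enumerate(log_text.splitlines(), start=1):
        for pattern, meaning, resolution in ERROR_PATTERNS:
            if pattern.search(line):
                findings.append((number, meaning, resolution))
                break  # report only the first matching pattern per line
    return findings


if __name__ == "__main__":
    # Made-up sample lines, for illustration only.
    sample = "\n".join([
        "[abc task/1/1 model] Retrying request to /responses",
        "sample completed",
        "Pod UID mismatch",
    ])
    for number, meaning, resolution in triage(sample):
        print(f"line {number}: {meaning} -> {resolution}")
```

Because each line reports only its first match, the order of `ERROR_PATTERNS` decides which row wins when a line contains more than one pattern.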

Details

Author: METR
Repository: METR/inspect-action
Created: 1 year ago
Last Updated: today
Language: Python
License: MIT
