gza-task-debug

Diagnose why a gza task failed — analyzes logs, detects loops, checks diffs, compares baselines, and suggests fixes
mhawthorne/gza · ★ 11 · Code & Development · score 78

Install: claude install-skill mhawthorne/gza

# Gza Task Debug

Diagnose why a gza task failed by analyzing logs, detecting agent loops, comparing against baselines, and providing actionable recommendations.

## Process

### Step 1: Get task ID

The user should provide a full prefixed task ID (for example, `gza-1234`). Extract it from the input.

### Step 2: Query task from database

Run a Python one-liner to get all task details as JSON:

```bash
uv run python -c "from gza.db import get_task; import json; print(json.dumps(get_task(<ID>), indent=2, default=str))"
```

Note the following fields for analysis:

- `status`: should be `failed` or `max_turns` (or possibly `completed` if user suspects partial failure)
- `num_turns`: number of agent turns used
- `duration_seconds`: total wall-clock time
- `cost_usd`: API cost
- `log_file`: provider conversation transcript path; also inspect the sibling `<stem>.ops.jsonl` file for runner lifecycle, preflight, command, outcome, and stats events
- `report_file`: path to the report (if any)
- `branch`: git branch the task worked on

### Step 3: Baseline comparison

Compare the failed task's metrics against the last 20 completed tasks:

```bash
uv run python -c "from gza.db import get_baseline_stats; import json; print(json.dumps(get_baseline_stats(20)))"
```

Calculate how far the failed task deviates:

- If `num_turns` is 2x+ the average → flag as high turns
- If `cost_usd` is 3x+ the average → flag as high cost
- Report the ratio (e.g., "3.2x more turns than average completed