pinchbench

Run PinchBench benchmarks to evaluate OpenClaw agent performance across real-world tasks. Use when testing model capabilities, comparing models, submitting benchmark results to the leaderboard, or checking how well your OpenClaw setup handles calendar, email, research, coding, and multi-step workflows.
aiskillstore/marketplace · ★ 329 · AI & Automation · score 82

Install: claude install-skill aiskillstore/marketplace

# PinchBench Benchmark Skill

PinchBench measures how well LLMs perform as the brain of an OpenClaw agent. Results are collected on a public leaderboard at [pinchbench.com](https://pinchbench.com).

## Prerequisites

- Python 3.10+
- [uv](https://docs.astral.sh/uv/) package manager
- OpenClaw instance (this agent)

## Quick Start

```bash
cd <skill_directory>

# Run benchmark with a specific model
uv run benchmark.py --model anthropic/claude-sonnet-4

# Run only automated tasks (faster)
uv run benchmark.py --model anthropic/claude-sonnet-4 --suite automated-only

# Run specific tasks
uv run benchmark.py --model anthropic/claude-sonnet-4 --suite task_01_calendar,task_02_stock

# Skip uploading results
uv run benchmark.py --model anthropic/claude-sonnet-4 --no-upload
```

## Available Tasks (23)

| Task | Category | Description |
|------|----------|-------------|
| `task_00_sanity` | Basic | Verify agent works |
| `task_01_calendar` | Productivity | Calendar event creation |
| `task_02_stock` | Research | Stock price lookup |
| `task_03_blog` | Writing | Blog post creation |
| `task_04_weather` | Coding | Weather script |
| `task_05_summary` | Analysis | Document summarization |
| `task_06_events` | Research | Conference research |
| `task_07_email` | Writing | Email drafting |
| `task_08_memory` | Memory | Context retrieval |
| `task_09_files` | Files | File structure creation |
| `task_10_workflow` | Integration | Multi-step API workflow |
| `task_11_clawdhub` | Skills