pinchbenchlisted
Install: claude install-skill aiskillstore/marketplace
# PinchBench Benchmark Skill
PinchBench measures how well LLM models perform as the brain of an OpenClaw agent. Results are collected on a public leaderboard at [pinchbench.com](https://pinchbench.com).
## Prerequisites
- Python 3.10+
- [uv](https://docs.astral.sh/uv/) package manager
- OpenClaw instance (this agent)
## Quick Start
```bash
cd <skill_directory>
# Run benchmark with a specific model
uv run benchmark.py --model anthropic/claude-sonnet-4
# Run only automated tasks (faster)
uv run benchmark.py --model anthropic/claude-sonnet-4 --suite automated-only
# Run specific tasks
uv run benchmark.py --model anthropic/claude-sonnet-4 --suite task_01_calendar,task_02_stock
# Skip uploading results
uv run benchmark.py --model anthropic/claude-sonnet-4 --no-upload
```
## Available Tasks (23)
| Task | Category | Description |
|------|----------|-------------|
| `task_00_sanity` | Basic | Verify agent works |
| `task_01_calendar` | Productivity | Calendar event creation |
| `task_02_stock` | Research | Stock price lookup |
| `task_03_blog` | Writing | Blog post creation |
| `task_04_weather` | Coding | Weather script |
| `task_05_summary` | Analysis | Document summarization |
| `task_06_events` | Research | Conference research |
| `task_07_email` | Writing | Email drafting |
| `task_08_memory` | Memory | Context retrieval |
| `task_09_files` | Files | File structure creation |
| `task_10_workflow` | Integration | Multi-step API workflow |
| `task_11_clawdhub` | Skills