aiperf

NVIDIA AIPerf — vendor-neutral generative-AI inference benchmarking (genai-perf successor). Covers `aiperf profile` with concurrency / request-rate / fixed-schedule trace replay / user-centric / multi-run confidence, 15 endpoint types (chat, completions, embeddings, rankings, responses, image-gen, video-gen, NIM, HF-TGI, template, etc.), 6 custom dataset formats (single_turn, multi_turn, mooncake_trace, bailian_trace, burst_gpt_trace, random_pool), 40+ public datasets, goodput SLOs, GPU + Prometheus telemetry, plot/analyze-trace/synthesize/service subcommands, plugin extensibility, and reasoning-token TTFT/TTFO split.
air-gapped/skills · ★ 2 · AI & Automation · score 78
Install: claude install-skill air-gapped/skills
# AIPerf: NVIDIA generative-AI inference benchmarking

Target audience: operators producing defensible latency/throughput/goodput numbers against any OpenAI-compatible inference server (vLLM, SGLang, TensorRT-LLM, NVIDIA Dynamo, NIM, Triton, HF TGI, Ollama), and developers extending AIPerf with custom endpoints, datasets, exporters, or metrics.

## Why this matters

`aiperf` is the open-source successor to `genai-perf`, written by NVIDIA's AI-Dynamo team. It is **the** vendor-neutral way to:

1. **Replay production traces** (Mooncake / Bailian / BurstGPT) at exact timestamps. Synthetic load lies about cache reuse and tail behavior.
2. **Measure goodput**, not just throughput: the percentage of requests that meet **all** SLOs simultaneously. A system at 1000 req/s throughput and 28% goodput is mis-provisioned by ~3.5×.
3. **Account for reasoning tokens correctly.** GPT-OSS / DeepSeek-R1 / Qwen3 emit `reasoning_content` before the answer. genai-perf ignored those; aiperf splits **TTFT** (any first token) from **TTFO** (first non-reasoning token). Numbers are not directly comparable across the two tools; see migration notes.
4. **Collect server + GPU telemetry alongside client-side timings** in one run: DCGM / pynvml GPU metrics + Prometheus `/metrics` scrape, all aligned in the same artifact dir.
5. **Extend cleanly**: 25 plugin categories with a YAML manifest + entry-point system. New endpoints, dataset formats, exporters, accuracy graders, plot types all go through the