rtl-area-timing

Use when optimizing RTL microarchitecture for area or clock frequency (Fmax), a design is too big to fit or too slow to meet timing, a wide multiply or barrel shifter is the critical path, or a "compute everything and select" datapath is too large
Midstall/claude-for-hardware · ★ 1 · Data & Documents · score 74

Install: claude install-skill Midstall/claude-for-hardware

# RTL Area and Timing Optimization

## Overview

Making RTL smaller or faster is a sequence of structural decisions, each justified by a measurement. The wins are rarely where intuition points: the giant is often a structure you didn't think of (a "ROM" that is really 94k flops), and the critical path is usually one specific primitive, not "logic depth" in general.

**Core principle:** Diagnose with data, change one structure, re-measure. Optimize the actual critical path or the actual giant, and stop the moment it stops being the bottleneck. Guessing wastes builds and can place worse.

## When to Use

- A design won't fit, or misses its timing constraint
- A wide multiply, barrel shifter, or big mux is suspected of dominating
- A microcoded or "compute all handlers and select" datapath is too large
- You're about to "optimize" something without having read the reports

This is the RTL-technique companion to `fpga-synthesis-fit` (the tool methodology for measuring). Measure there, transform here.

## Pipeline A Wide Multiply Internally

A single-cycle NxN multiply (64x64) maps to DSP tiles plus a long partial-product carry chain, and that chain is usually the critical path. Registering only the multiply's OUTPUT does not break the internal carry chain; the operands-to-output path is still essentially the whole multiply. You must pipeline INTERNALLY: decompose into smaller products (four 32x32), register the partial products, then sum the shifted partials in a second register
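
The decomposition described above can be checked with a small behavioral model. The following is a minimal Python sketch, not synthesizable RTL and not the skill's own implementation: it assumes unsigned 64-bit operands, and the names `stage1_partials`, `stage2_sum`, and `PipelinedMul64` are hypothetical, chosen only for illustration. It shows the four 32x32 partial products being registered first, and the shifted sum being registered one clock later.

```python
MASK32 = (1 << 32) - 1
MASK128 = (1 << 128) - 1


def stage1_partials(a, b):
    """Stage 1 logic: split each 64-bit operand and form four 32x32 products."""
    a_lo, a_hi = a & MASK32, (a >> 32) & MASK32
    b_lo, b_hi = b & MASK32, (b >> 32) & MASK32
    return (a_lo * b_lo, a_lo * b_hi, a_hi * b_lo, a_hi * b_hi)


def stage2_sum(partials):
    """Stage 2 logic: sum the partial products at their bit offsets."""
    ll, lh, hl, hh = partials
    return (ll + ((lh + hl) << 32) + (hh << 64)) & MASK128


class PipelinedMul64:
    """Two-stage model: partial-product register, then product register."""

    def __init__(self):
        self.partials_reg = (0, 0, 0, 0)  # register between stage 1 and stage 2
        self.product_reg = 0              # output register

    def clock(self, a, b):
        # Both registers update on the same edge, so stage 2 must consume
        # the partials captured on the PREVIOUS edge, not the new ones.
        next_product = stage2_sum(self.partials_reg)
        next_partials = stage1_partials(a, b)
        self.partials_reg = next_partials
        self.product_reg = next_product
        return self.product_reg


if __name__ == "__main__":
    mul = PipelinedMul64()
    a, b = 0xFFFF_FFFF_FFFF_FFFF, 0x1234_5678_9ABC_DEF0
    mul.clock(a, b)           # edge 1: partial products are registered
    result = mul.clock(0, 0)  # edge 2: the summed product is registered
    print(result == a * b)
```

In this model the longest register-to-register path is either one 32x32 multiply or one shifted add, rather than the whole 64x64 multiply. The cost is one extra cycle of latency, which the surrounding logic has to tolerate.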