The peak performance specs of NVIDIA’s H100 SXM are only 25% higher than those of the H100 PCIe, even though the SXM version has a thermal design power (TDP) of 700 W versus 350 W for the PCIe version. The SXM version has faster HBM DRAM (3 TB/s vs 2 TB/s) and faster NVLink (900 GB/s vs 600 GB/s), but that isn’t enough to explain the power difference.
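
To make the mismatch concrete, here is a back-of-the-envelope calculation using only the ratios quoted above. The variable names are mine, and the inputs are the datasheet-level figures from this post, not measurements.

```python
# Ratios of SXM to PCIe, taken from the figures quoted in this post.
# These are spec-sheet numbers, not measured values.
peak_perf_ratio = 1.25      # "25% higher" peak performance
tdp_ratio = 700 / 350       # 700 W vs 350 W
hbm_ratio = 3 / 2           # 3 TB/s vs 2 TB/s
nvlink_ratio = 900 / 600    # 900 GB/s vs 600 GB/s

# Peak performance per watt of the SXM part relative to the PCIe part.
perf_per_watt_ratio = peak_perf_ratio / tdp_ratio

print(f"TDP ratio (SXM / PCIe):           {tdp_ratio:.2f}x")
print(f"HBM bandwidth ratio:              {hbm_ratio:.2f}x")
print(f"NVLink bandwidth ratio:           {nvlink_ratio:.2f}x")
print(f"Peak perf per watt (SXM / PCIe):  {perf_per_watt_ratio:.3f}")
```

Judged on peak specs alone, the SXM part would deliver only 0.625 times the PCIe part's performance per watt.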

Why would NVIDIA double the thermal design power to get only 25% more peak performance?

The only explanation I can think of is that the SXM version’s sustained performance exceeds the PCIe version’s by much more than 25%. Can anyone think of a different explanation?

What is a meaningful way for vendors to specify the performance of power-limited chips for each operation type (such as FP64 matrix operations)? Should vendors provide both a peak spec and a sustained spec, or is there a better method? Under what conditions should the sustained spec be measured (HBM active, PCIe active, NVLink active, ...)? Should the sustained spec for each operation type have a minimum and a maximum value? Benchmarks are difficult to interpret and do not reveal the sustained performance of each operation type.
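
As one possible shape for a sustained spec with a minimum and a maximum, here is a minimal sketch. It assumes throughput for a single operation type has been sampled at regular intervals under a fixed set of conditions. The function `summarize_sustained`, the `warmup` parameter, and the sample values are all hypothetical; the numbers are made up for illustration and are not measurements of any GPU.

```python
from statistics import mean


def summarize_sustained(samples_tflops, warmup=0):
    """Return (min, mean, max) of throughput samples after dropping
    the first `warmup` samples, which may still reflect boost clocks."""
    steady = samples_tflops[warmup:]
    if not steady:
        raise ValueError("no samples left after discarding warm-up")
    return min(steady), mean(steady), max(steady)


# Made-up throughput samples (TFLOPS) for one operation type.
# The first two stand in for an initial burst before power limits apply.
samples = [60.0, 58.0, 50.0, 48.0, 49.0, 47.0]

sustained_min, sustained_mean, sustained_max = summarize_sustained(samples, warmup=2)
peak = max(samples)

print(f"peak:      {peak:.1f} TFLOPS")
print(f"sustained: min {sustained_min:.1f}, mean {sustained_mean:.1f}, "
      f"max {sustained_max:.1f} TFLOPS")
```

A vendor would still have to state the measurement conditions (which of HBM, PCIe, and NVLink are active, ambient temperature, and so on) for such a triple to be comparable across chips.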

The H100 datasheet is here:
