By: -.- (blarg.delete@this.mailinator.com), August 30, 2021 5:47 pm
Room: Moderated Discussions
Chester (lamchester.delete@this.gmail.com) on August 30, 2021 1:03 pm wrote:
> In BD's case, that's probably to hit high clock speeds on a pretty bad node. Integer SIMD
> ops are 1c latency on newer AMD CPUs, but probably 2c in Bulldozer because the units are
> half width. Piledriver could do a couple FPU ops (extrq, insertq) with 1c latency.
That was mostly in response to BD's high latency FPU claim.
The 2 cycle latency applies to 64b, 128b and 256b ops, so it's not solely due to 128b FPUs.
Agner's tables list the SSE4A ops as 1 cycle latency on PD, but InstLat measures them at 3 cycles on Excavator, so I'm not sold on them being 1 cycle latency.