Article: AMD's Mobile Strategy
By: Michael S (already5chosen.delete@this.yahoo.com), December 17, 2011 10:46 am
Room: Moderated Discussions
Exophase (exophase@gmail.com) on 12/16/11 wrote:
---------------------------
>
>store: 2 uops (this is seriously the friggin silver bullet against your "uops =
>RISC" argument, please tell me the RISC uarch that takes two cycles for stores)
Ignoring your stupid "2 uops=2 clocks" equivalence...
Power4 "cracks" all integer stores that use reg+reg addressing. That is roughly equivalent to Intel's generation of 2 uOps, although less helpful for OoO scheduling.
I haven't read sufficiently detailed docs on Power5/6/7 to know whether they crack stores in the same way. My personal guess: Power5 and Power7 do, Power6 doesn't.
The ARM Cortex-A9 uArch is rather poorly documented, even relative to Power, but it would surprise me if they don't "crack" integer stores with reg+reg addressing, or issue them simultaneously through 2 issue ports, which is almost the same thing.
When implementing an integer store on an OoO core, one has to find a way around the limit of 2 GPR inputs per uOp (a reg+reg store needs three: base, index and data). Intel's 2 uOps is just one such way: not the most economical, but certainly the most flexible as far as further scheduling is concerned.
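To make the 2-GPR-input point concrete, here is a rough sketch of cracking a reg+reg store into an address part and a data part. It is only a toy model: the `Uop` class, `crack_store` function and `MAX_GPR_INPUTS` constant are names made up for illustration, not anything taken from Intel, IBM or ARM documentation.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Assumed per-uop limit on GPR source operands (hypothetical constant).
MAX_GPR_INPUTS = 2


@dataclass(frozen=True)
class Uop:
    kind: str                 # "store_address" or "store_data"
    sources: Tuple[str, ...]  # GPRs this uop reads


def crack_store(base: str, index: str, data: str) -> List[Uop]:
    """Split `store data -> [base + index]` into two uops.

    The store as a whole reads three GPRs, which exceeds the assumed
    limit, so the address computation and the data read are separated.
    """
    uops = [
        Uop("store_address", (base, index)),  # reads 2 GPRs
        Uop("store_data", (data,)),           # reads 1 GPR
    ]
    # Every resulting uop must fit within the source-operand limit.
    assert all(len(u.sources) <= MAX_GPR_INPUTS for u in uops)
    return uops


uops = crack_store("r1", "r2", "r3")
for u in uops:
    print(u.kind, u.sources)
```

Since the two parts have no dependency on each other, in this model a scheduler could issue the address part as soon as base and index are ready, without waiting for the data register. That is the scheduling flexibility mentioned above.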