Article: AMD's Mobile Strategy
By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), December 17, 2011 11:33 am
Room: Moderated Discussions
Michael S (already5chosen@yahoo.com) on 12/17/11 wrote:
---------------------------
>Exophase (exophase@gmail.com) on 12/16/11 wrote:
>---------------------------
>>
>>store: 2 uops (this is seriously the friggin silver bullet against your "uops =
>>RISC" argument, please tell me the RISC uarch that takes two cycles for stores)
>
>Ignoring your stupid "2 uops=2 clocks" equivalence...
>Power4 "cracks" all integer stores that use reg+reg addressing. That is roughly
>equivalent to Intel's generation of 2 uOps, although less helpful for OoO scheduling.
>I didn't read sufficiently detailed docs on Power5/6/7 to know whether they crack
>stores in the same way. My personal guess: Power5 and Power7, Power6 don't.
>
>ARM Cortex-A9 uArch is rather poorly documented, even relative to Power, but
>it would surprise me if they don't "crack" integer stores with reg+reg addressing
>or don't issue them simultaneously through 2 issue ports, which is almost the same thing.
No, [reg, +reg] and [reg, +reg lsl #2] take a single AGU cycle like [reg, imm] addressing modes. More complex addressing modes do the shift in the integer ALU, then forward to the AGU (at least that is what the timings suggest).
See http://infocenter.arm.com/help/index.jsp?topic=/com.arm.doc.ddi0388e/CIAECBEB.html
The A9 never cracks instructions, but some, like shift+add, do spend more than one cycle in an execution unit.
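To make the addressing modes above concrete, here is a minimal C sketch of the effective address each form computes. The function names are made up for illustration, and this models only the address arithmetic, not the A9's pipeline or timings.

```c
#include <stdint.h>

/* Hypothetical helpers: each returns the 32-bit effective address
   that the corresponding ARM addressing mode would produce. */

/* [reg, imm]: base register plus a signed immediate offset */
static uint32_t ea_imm(uint32_t base, int32_t imm)
{
    return base + (uint32_t)imm;
}

/* [reg, +reg]: base register plus an index register */
static uint32_t ea_reg(uint32_t base, uint32_t index)
{
    return base + index;
}

/* [reg, +reg lsl #n]: base plus the index shifted left by n (n < 32) */
static uint32_t ea_reg_lsl(uint32_t base, uint32_t index, unsigned shift)
{
    return base + (index << shift);
}
```

For example, ea_reg_lsl(base, i, 2) is the address of element i in an array of 32-bit words starting at base, which is the [reg, +reg lsl #2] case.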
>When implementing integer store on OoO core, one should find a way around limitation
>of 2 GPR inputs per uOp and Intel's 2 uOps is just one of such ways, not the most
>economical, but certainly most flexible as far as further scheduling concerned.
The A9 has no such limit.
Wilco