Article: AMD's Mobile Strategy
By: gallier2 (gallier2.delete@this.gmx.de), December 22, 2011 1:25 am
Room: Moderated Discussions
Seni (seniike@hotmail.com) on 12/21/11 wrote:
---------------------------
>>There is simply no excuse to use 5 instructions for a 64-bit address constant,
>
>Isn't it faster to do 5 ALU ops instead of one Load?
>
>Loading a constant with ALU ops has no dependence on anything so can always be
>pulled way ahead by the OOO engine. It doesn't really cost anything except fetch/decode bandwidth.
It costs more I-cache misses, and it costs optimization opportunities (more spills) because of the higher register pressure. Granted, the problem was mainly in gcc 3 for SPARC. gcc 4 does a much better job and uses only 3 instructions to load the 64-bit constant.
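To make the trade-off concrete, here is a rough C model of the two approaches. It is only a sketch: the function names are made up, the 22-bit/10-bit split is an assumption modelled on SPARC-style sethi/or immediates, and the general six-step sequence shown is not a claim about what any particular gcc version emits.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: build a 64-bit constant from immediate-sized pieces.
   Each numbered statement stands for one ALU instruction. Note the second
   register (tmp) that the sequence needs while it runs. */
uint64_t build_const64(uint64_t value)
{
    /* Pieces that would be encoded as immediates in the instructions. */
    uint64_t hh = (value >> 42) & 0x3FFFFF; /* bits 63..42 (22 bits) */
    uint64_t hm = (value >> 32) & 0x3FF;    /* bits 41..32 (10 bits) */
    uint64_t lm = (value >> 10) & 0x3FFFFF; /* bits 31..10 (22 bits) */
    uint64_t lo = value & 0x3FF;            /* bits  9..0  (10 bits) */

    uint64_t tmp = hh << 10;  /* 1: sethi-like, upper 22 bits of high word */
    tmp |= hm;                /* 2: or in low 10 bits of high word */
    tmp <<= 32;               /* 3: shift high word into place */
    uint64_t dst = lm << 10;  /* 4: sethi-like, upper 22 bits of low word */
    dst |= lo;                /* 5: or in low 10 bits of low word */
    dst |= tmp;               /* 6: combine the two halves */
    return dst;
}

/* Hypothetical constant pool: the alternative is a single load, which
   takes one instruction slot but touches the data cache. */
static const uint64_t const_pool[] = { 0x123456789ABCDEF0ULL };

uint64_t load_const64(void)
{
    return const_pool[0];     /* 1: one load from memory */
}
```

Both functions return the same value for 0x123456789ABCDEF0; the difference is six instruction slots plus a temporary register on one side versus one instruction slot plus a memory access on the other.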
>
>But if you load a constant from a constant pool using a Load instruction then your
>OOO powers are limited by memory-ordering semantics, and it is likely that that
>load latency will become part of the program's critical path.
>