Article: AMD's Mobile Strategy
By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), December 17, 2011 5:45 pm
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 12/17/11 wrote:
---------------------------
>Wilco (Wilco.Dijkstra@ntlworld.com) on 12/17/11 wrote:
>---------------------------
>>Linus Torvalds (torvalds@linux-foundation.org) on 12/16/11 wrote:
>>---------------------------
>>
>>>Here is, for comparison,
>>>a rather interesting POWER comparison with a more modern
>>>Intel CPU (Woodcrest), which shows very close to 1.0:
>>>
>>>http://lca.ece.utexas.edu/pubs/spec09_ciji.pdf
>>>
>>>which is also interesting because their pathlengths are
>>>actually very comparable with POWER. It is possible (in
>>>fact likely) that at least part of that is simply
>>>differences in compilers too, of course.
>>
>>Very interesting paper indeed! So POWER and x86 have the same instruction counts
>>on average across Spec when going all out for performance. It's even more compelling
>>due to using very different compilers, so you can't blame it on using identical code generation strategies.
>
>>I can certainly try to get numbers to confirm this for ARM as well, but I think
>>the paper says it all: RISC and CISC nowadays have very similar instruction complexity
>>- in part because CISCs stopped using complex instructions and RISC instructions became more complex.
>
>Did you even read that paper? That's not even remotely what it says. First, they
>were using 64-bit mode for some of the benchmarks, which disables a number of optimizations.
Actually they chose 64-bit mode when it ran faster than 32-bit mode, which was the case for most of the benchmarks. What optimizations are disabled in 64-bit mode? They used the same compiler options.
>Second, if you look at GCC, nearly 30% of the instructions are 'complex' (i.e.
>using uop fusion or macro-op fusion). The integer average is 20%. On top of that
>the ESP tracker seems to yield around a 5-6% benefit for the more complex integer benchmarks.
Why does this matter? All those complex instructions, and yet Woodcrest still ends up executing just 8% fewer instructions than POWER.
>>So can we now agree 1 ARM decoder = 1 x86 decoder?
>
>Again, did you even read that paper?
Did you? You clearly missed this:
"The path length ratio is defined as the ratio of the instructions retired by
POWER5+ to the number of instructions retired by Woodcrest. The path length ratio
(instruction count ratio) ranges from 0.7 to 1.23 for integer programs and 0.73
to 1.83 for floating-point programs. The lack of bias is evident since the
geometric mean is about 1 for both integer and floating-point applications."
So while there are some outliers, the average number of executed instructions across SPEC is about the same for POWER and x86, i.e. on average their instructions have the same semantic complexity.
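
For anyone who wants to check the arithmetic behind "geometric mean is about 1", here is a minimal Python sketch of how a path length ratio and its geometric mean could be computed. The benchmark names and instruction counts are made up for illustration and are NOT taken from the paper; the helper names `path_length_ratio` and `geometric_mean` are hypothetical as well.

```python
import math


def path_length_ratio(power_retired, x86_retired):
    """Instructions retired on POWER divided by instructions retired on x86."""
    return power_retired / x86_retired


def geometric_mean(ratios):
    """Geometric mean, computed as exp of the mean of the logs."""
    if not ratios:
        raise ValueError("need at least one ratio")
    if any(r <= 0 for r in ratios):
        raise ValueError("ratios must be positive")
    return math.exp(sum(math.log(r) for r in ratios) / len(ratios))


# Hypothetical retired-instruction counts (POWER, x86) in billions.
# These numbers are invented for illustration, not measured data.
hypothetical_counts = {
    "bench_a": (700, 1000),
    "bench_b": (1230, 1000),
    "bench_c": (1000, 1000),
    "bench_d": (1100, 950),
}

ratios = [path_length_ratio(p, x) for p, x in hypothetical_counts.values()]
print("per-benchmark ratios:", [round(r, 3) for r in ratios])
print("geometric mean:", round(geometric_mean(ratios), 3))
```

The geometric mean is the natural choice for ratios because it is symmetric: if you flip the ratio to x86/POWER, the mean simply inverts, so a result near 1 means neither ISA is favoured regardless of which one is put in the numerator.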
Wilco