Article: AMD's Mobile Strategy
By: Wilco (Wilco.Dijkstra.delete@this.ntlworld.com), December 15, 2011 11:10 pm
Room: Moderated Discussions
Linus Torvalds (torvalds@linux-foundation.org) on 12/15/11 wrote:
---------------------------
>Wilco (Wilco.Dijkstra@ntlworld.com) on 12/15/11 wrote:
>>
>>I disagree. The impact on the high-end is smaller nowadays, even though it remains
>>non-trivial. Nobody would claim that 4-way x86 decode is easy! It has taken a very
>>long time for x86 to get there, when 3rd generation OoO ARM is already going to be 4-way.
>
>That's a total red herring.
>
>x86 instructions do more.
No, on average they do less. Nobody cares what the most complex instructions can do if they never get used by the compiler. And x86 instructions are just not a good fit for the code that most people write.
>Doing a two-way x86 decode is not rocket science, and has
>been done for a long time. And it's likely not all that
>different from four-way ARM that is not even done yet.
>
>Or look at the old-style Intel three-way instruction decoder
>(3-1-1) that could decode three simple instructions.
>
>No, generic 4-way x86 decode isn't simple, but it's a hell
>of a lot more than 4 ARM instructions. So your comparison
>simply makes no sense!
Correction: a lot more complex.
>Those x86 addressing modes are powerful and used. And they
>regularly replace two or more ARM instructions. Just
>do the math: ARM code isn't actually all that much denser
>even in Thumb, yet x86 instructions are rather longer on
>average.
Show an example where an x86 addressing mode replaces 2 or more ARM instructions.
>You can think of it this way: all those embedded constants
>and addressing modes are all just "simple instructions".
>On ARM, they are explicit instructions, on x86 they are
>"microinstructions" embedded in a "macroinstruction".
>
>So don't compare one ARM instruction to one x86 instruction.
>They are very different. An ARM instruction is closer to
>the old-style uops (and by "old-style" I mean the ones that
>Intel used to produce that didn't have read-modify-write
>versions: the uops in Core 2+ are rather closer to the
>real x86 instructions).
That's not true at all. How many x86 micro ops do you need to execute this single ARM instruction:
addeq r0,r1,r2,lsl #2
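For anyone not fluent in ARM assembly: that one instruction combines a condition check, a shift and an add. Here is a minimal C sketch of what it does, assuming the Z flag is passed in as a plain integer. The function name addeq_lsl2 and the parameter z_flag are made up for illustration; this models the semantics only and says nothing about how any core implements it.

```c
#include <stdint.h>

/* Hypothetical helper modelling `addeq r0,r1,r2,lsl #2`.
 * z_flag stands in for the Z bit of the status register (an assumption
 * of this sketch; real code would read it from the preceding compare). */
uint32_t addeq_lsl2(int z_flag, uint32_t r0, uint32_t r1, uint32_t r2)
{
    if (z_flag)                 /* EQ condition: execute only when Z is set */
        return r1 + (r2 << 2);  /* shift the second operand left by 2, then add */
    return r0;                  /* condition failed: r0 keeps its old value */
}
```

So the predicate, the shift and the add are three separate pieces of work carried by one 32-bit instruction.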
>And that's not even taking things like constants into
>account. Something that will only get worse for
>ARM as it starts going 64-bit.
Really? Immediates are not really a problem on ARM, and 64-bit doesn't make it any worse. In fact, without even having seen the ISA or the ABI, I expect the way global variables and constants are dealt with is much improved over how it works today.
>So comparing 4-way x86 to 4-way ARM is ridiculous.
Indeed, ARM instructions are so much more powerful that you need 64-way x86 decode to keep up with 4-way ARM...
Seriously, I'd expected better from you.
Wilco