Article: AMD's Mobile Strategy
By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 15, 2011 10:32 am
Room: Moderated Discussions
Wilco (Wilco.Dijkstra@ntlworld.com) on 12/15/11 wrote:
>
>I disagree. The impact on the high-end is smaller nowadays, even though it remains
>non-trivial. Nobody would claim that 4-way x86 decode is easy! It has taken a very
>long time for x86 to get there, when 3rd generation OoO ARM is already going to be 4-way.
That's a total red herring.
x86 instructions do more.
Doing a two-way x86 decode is not rocket science, and has
been done for a long time. And it's likely not all that
different from four-way ARM that is not even done yet.
Or look at the old-style Intel three-way instruction decoder
(3-1-1) that could decode three simple instructions.
No, generic 4-way x86 decode isn't simple, but it's a hell
of a lot more than 4 ARM instructions. So your comparison
simply makes no sense!
Those x86 addressing modes are powerful and used. And they
regularly replace two or more ARM instructions. Just
do the math: ARM code isn't actually all that much denser
even in Thumb, yet x86 instructions are rather longer on
average.
You can think of it this way: all those embedded constants
and addressing modes are all just "simple instructions".
On ARM, they are explicit instructions, on x86 they are
"microinstructions" embedded in a "macroinstruction".
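To make the "macroinstruction" picture concrete, here is a minimal sketch that expands one x86-style add with a memory operand into the simple operations a load/store machine would need. The `MemOperandAdd` class, the `expand_to_simple_ops` helper and the printed mnemonics are all hypothetical, and the model assumes loads that accept base plus scaled index but not a displacement at the same time. It is an illustration of the counting argument, not a model of any real decoder.

```python
from dataclasses import dataclass


@dataclass
class MemOperandAdd:
    """Hypothetical model of 'add dest, [base + index*scale + disp]'."""
    dest: str
    base: str
    index: str
    scale: int
    disp: int


def expand_to_simple_ops(insn):
    """Split one macroinstruction into load/store-style simple operations.

    Assumption: a load can use base + scaled index, but a non-zero
    displacement has to be folded into the base with a separate add.
    """
    ops = []
    base = insn.base
    if insn.disp != 0:
        ops.append(f"add tmp, {insn.base}, #{insn.disp}")
        base = "tmp"
    ops.append(f"load tmp, [{base} + {insn.index}*{insn.scale}]")
    ops.append(f"add {insn.dest}, {insn.dest}, tmp")
    return ops


insn = MemOperandAdd(dest="eax", base="rbx", index="rcx", scale=4, disp=16)
ops = expand_to_simple_ops(insn)
for op in ops:
    print(op)
print(len(ops), "simple operations for one macroinstruction")
```

Under these assumptions the single instruction counts as three simple operations, or two when the displacement is zero.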
So don't compare one ARM instruction to one x86 instruction.
They are very different. An ARM instruction is closer to
the old-style uops (and by "old-style" I mean the ones that
Intel used to produce that didn't have read-modify-write
versions: the uops in Core 2+ are rather closer to the
real x86 instructions).
The "uops per instruction" ratio on x86 (again, older x86)
tended to be in the 1.2-1.7 range on SPEC, according to some
papers (again, they are now much closer to 1:1, but that's
because the modern Core2+ uops are actually more powerful
than ARM instructions are).
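As rough arithmetic on those numbers: if one old-style uop is treated as about one ARM-like instruction (a simplification for illustration only), the 1.2-1.7 ratio turns an x86 decode width into an equivalent count of simple operations. The function name below is made up for this sketch.

```python
def simple_ops_per_cycle(decode_width, uops_per_instruction):
    """Decode width expressed in uop-sized (ARM-like) operations per cycle."""
    return decode_width * uops_per_instruction


# The 1.2-1.7 range is the one quoted above for older x86 on SPEC.
for width in (2, 4):
    low = simple_ops_per_cycle(width, 1.2)
    high = simple_ops_per_cycle(width, 1.7)
    print(f"{width}-way x86 decode ~ {low:.1f} to {high:.1f} simple ops/cycle")
```

With these inputs, 2-way x86 comes out at roughly 2.4 to 3.4 simple operations per cycle and 4-way at roughly 4.8 to 6.8.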
And that's not even taking things like constants into
account. Something that will only get worse for
ARM as it starts going 64-bit.
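A minimal sketch of why wider constants cost more on a fixed-width encoding, assuming each move-immediate instruction can carry one 16-bit slice of the value. The 16-bit chunk size and the `immediate_moves_needed` helper are assumptions for illustration; the model ignores alternatives such as literal pools or rotated immediates.

```python
def immediate_moves_needed(value, width_bits, chunk_bits=16):
    """Count move-immediate instructions needed to materialize `value`.

    Assumption: each instruction sets one chunk_bits-wide slice of the
    register, and all-zero slices cost nothing beyond the first move.
    """
    mask = (1 << chunk_bits) - 1
    chunks = [(value >> shift) & mask
              for shift in range(0, width_bits, chunk_bits)]
    nonzero = sum(1 for chunk in chunks if chunk != 0)
    return max(1, nonzero)


print(immediate_moves_needed(0x12345678, 32))
print(immediate_moves_needed(0x123456789ABCDEF0, 64))
```

In this model an arbitrary 32-bit constant takes two instructions and an arbitrary 64-bit constant takes four, whereas a variable-length encoding can embed the constant in the instruction that uses it.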
So comparing 4-way x86 to 4-way ARM is ridiculous.
Linus