Article: AMD's Mobile Strategy
By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 17, 2011 3:30 pm
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 12/17/11 wrote:
>
>Did you even read that paper? That's not even remotely what it says. First, they
>were using 64-bit mode for some of the benchmarks, which disables a number of optimizations.
Only on the chip they tested with, though. It would be
fairly interesting to see what Sandybridge does, since it
can do uop fusion with 64-bit compares.
They did say that without the fusion, it would be around
1.2 uops per insn iirc, so it's not like uops are really
1:1 with x86 instructions even now, despite much more
powerful uops.
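As a rough illustration of the ratio being discussed (a sketch, not the paper's method): uops per instruction is simply retired uops divided by retired instructions. The function name and the counter values below are hypothetical, picked only to reproduce the "around 1.2" figure quoted above.

```python
def uops_per_instruction(uops_retired: int, instructions_retired: int) -> float:
    """Average number of uops each retired instruction expands into."""
    if instructions_retired <= 0:
        raise ValueError("instructions_retired must be positive")
    return uops_retired / instructions_retired


if __name__ == "__main__":
    # Hypothetical counter readings, NOT measurements from the paper.
    uops = 1_200_000
    insns = 1_000_000
    print(f"uops/insn = {uops_per_instruction(uops, insns):.2f}")
```

A ratio above 1.0 means instructions still expand into more than one uop on average; fusion pushes the ratio down.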
However, and I think this is an important issue, the
interesting part is how the path length in instructions was
quite comparable between POWER and x86. That is something
that is independent of the fusion issue, although it does
depend on how you count instructions.
What I expect them to have done is to just count the
number of retired instructions as the path length, which
is quite reasonable even if it can give odd results for
the string instructions, for example.
And that would count complex addressing modes etc as part
of the instruction (and not count fusion, afaik), and quite
frankly, my gut feel would have been to expect x86 to have
a lower path length than POWER.
And it was on some benchmarks, but not on others.
The path length really *is* interesting, because in many
ways that should correlate with how "powerful" each
instruction is (assuming the benchmark isn't timed, but
based on a certain amount of fixed work - which is true for
spec). Of course, things like ABI etc also do
affect it, along with compiler approaches and libraries,
so it's not entirely black-and-white, but it's certainly
a good first-order guess.
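To make the path-length comparison concrete, here is a minimal sketch that assumes path length means retired instructions for a fixed amount of work. The function name and the instruction counts are hypothetical, not figures from the paper.

```python
def relative_path_length(insns_a: int, insns_b: int) -> float:
    """Ratio of retired instructions for the same fixed workload.

    A value below 1.0 means architecture A needed fewer instructions,
    i.e. its instructions did more work each on average.
    """
    if insns_a <= 0 or insns_b <= 0:
        raise ValueError("instruction counts must be positive")
    return insns_a / insns_b


if __name__ == "__main__":
    # Hypothetical retired-instruction counts for one fixed-work benchmark.
    x86_insns = 950_000_000
    power_insns = 1_000_000_000
    ratio = relative_path_length(x86_insns, power_insns)
    print(f"x86 path length relative to POWER: {ratio:.2f}")
```

The ratio only means something when both runs do the same work, and it still reflects ABI, compiler and library differences as well as the instruction set itself.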
>Second, if you look at GCC, nearly 30% of the instructions are 'complex' (i.e.
>using uop fusion or macro-op fusion).
Note that the paper did not talk about the kind of
instruction complexity we have talked about (ie doing
multiple things in one instruction, like address generation,
memops and operations). And while the uop counts may show
some of that, the fact that the uops themselves have gotten
so much more powerful over the years kind of hides the
issue.
Linus