Article: AMD's Mobile Strategy
By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 16, 2011 5:30 pm
Room: Moderated Discussions
Exophase (exophase@gmail.com) on 12/16/11 wrote:
>
>store: 2 uops (this is seriously the friggin silver bullet against your "uops =
>RISC" argument, please tell me the RISC uarch that takes two cycles for stores)
Why do you confuse "cycles" and "instructions"? They have
nothing to do with each other.
Of your list, this is indeed the only one that is "unusual",
in that a store is split into separate "address" and "data"
uops. That said, the others are not all that out of line for
a RISC setup, and even the "two uops for store" is mitigated
to some degree by the fact that the x86 addressing modes for
the address op often *are* equivalent to another RISC
instruction.
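To make the addressing-mode point concrete, here is a toy counting model. It is a sketch under simplifying assumptions, not a description of any real decoder: the x86 side assumes P6-style cracking of a store into one store-address and one store-data uop, and the RISC side assumes a hypothetical MIPS-style ISA with only base+displacement addressing (RISCs with indexed or scaled modes would need fewer instructions). `Address`, `x86_store_uops` and `base_disp_risc_store_insns` are made-up names for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Address:
    """Hypothetical description of a store's addressing mode."""
    has_index: bool = False  # base + index (+ disp) form
    scale: int = 1           # index scale factor: 1, 2, 4 or 8
    disp: int = 0            # displacement, assumed to fit the RISC immediate field


def x86_store_uops(addr: Address) -> int:
    """Assumed P6-style cracking: one store-address uop plus one store-data uop.

    The addressing mode, however complex, is folded into the
    store-address uop, so the count does not depend on `addr`.
    """
    return 2


def base_disp_risc_store_insns(addr: Address) -> int:
    """Instructions for the same store on a hypothetical RISC that only
    has base+displacement addressing (MIPS-style)."""
    count = 1  # the store itself absorbs the base register and displacement
    if addr.has_index:
        if addr.scale != 1:
            count += 1  # shift the index left by log2(scale)
        count += 1      # add the (scaled) index to the base
    return count


simple = Address(disp=8)                                 # like: mov [rbx+8], eax
complex_mode = Address(has_index=True, scale=4, disp=8)  # like: mov [rbx+rcx*4+8], eax

for name, addr in (("simple", simple), ("complex", complex_mode)):
    print(name, x86_store_uops(addr), base_disp_risc_store_insns(addr))
```

Under these assumptions the simple store costs 2 uops against 1 RISC instruction, while the scaled-index store costs 2 uops against 3 RISC instructions (shift, add, store), which is the sense in which the address uop can stand in for extra RISC work.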
>At the very least show the courtesy to cite SOMETHING when
>bringing up figures like your "1.2 to 1.7" one.
Like you've cited stuff?
Anyway, here is the source:
http://tams-www.informatik.uni-hamburg.de/lehre/2001ss/proseminar/mikroprozessoren/papers/pentium-pro-performance.pdf
which is horribly formatted, but shows that 1.2 - 1.7 number
(average: 1.35). That's the PPro. For comparison, here is
a rather interesting POWER comparison with a more modern
Intel CPU (Woodcrest), which shows a number very close to 1.0:
http://lca.ece.utexas.edu/pubs/spec09_ciji.pdf
which is also interesting because the x86 pathlengths there
are actually very comparable to POWER's. It is possible (in
fact likely) that at least part of that is simply due to
differences in compilers, of course.
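A small arithmetic sketch of how such ratios combine, reading the 1.2 - 1.7 figure as uops per x86 instruction (an interpretation; the post itself does not spell out the unit). The function names and all counts below are hypothetical, chosen only to show the arithmetic, not taken from either paper. The first function shows why an instruction-weighted average need not sit at the midpoint of the per-benchmark range; the second shows that, with equal pathlengths, the uop ratio alone decides whether x86 uops outnumber RISC instructions.

```python
from typing import Sequence, Tuple


def aggregate_uops_per_insn(runs: Sequence[Tuple[int, int]]) -> float:
    """Instruction-weighted ratio: total uops / total x86 instructions.

    `runs` holds (x86 instructions retired, uops retired) per benchmark.
    Long-running benchmarks dominate, so the aggregate need not sit at
    the midpoint of the per-benchmark range.
    """
    total_insns = sum(insns for insns, _ in runs)
    total_uops = sum(uops for _, uops in runs)
    return total_uops / total_insns


def uops_per_risc_insn(x86_insns: int, uops_per_insn: float, risc_insns: int) -> float:
    """How many x86 uops execute per RISC instruction for the same program."""
    return x86_insns * uops_per_insn / risc_insns


# Made-up counts: per-benchmark ratios of 1.2 and 1.7.
runs = [(300, 360), (100, 170)]
print(aggregate_uops_per_insn(runs))  # 530 / 400 = 1.325, not the 1.45 midpoint

# Hypothetical equal pathlengths: only the uop ratio differs.
print(uops_per_risc_insn(1000, 1.35, 1000))
print(uops_per_risc_insn(1000, 1.0, 1000))
```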
That other paper is also more readable, because its
whitespace hasn't been destroyed by some horrible PDF import
thing (or whatever happened to the PPro paper - maybe
somebody can find a better version of that).
>Or if you later abstain on it have the common courtesy
>to admit you were too hasty.
So can you please shut up now? I've posted the citation
with the numbers I quoted. Do me the courtesy of just
admitting I wasn't making stuff up, but had actual papers
to back me up, and that the numbers I quoted were accurate.
And no, I do not have ARM vs x86 instruction counts that
would let me actually try to see how well ARM does on Spec.
Sadly, I've never seen a single ARM Spec run that looked at
all trustworthy.
The POWER numbers are interesting for pathlength, in that
they are much closer to x86 than I would have expected.
bzip2 in particular stands out. It's not what older papers
usually show.
Anyway, the lack of anything worthwhile on ARM makes any
actual ARM comparisons hard. But maybe I just haven't
found the right papers - all the academic ones tend to be
x86 vs the "big RISC"s.
I've brought that up before - the ARM people always talk
about totally useless crap like Whetstone/Dhrystone, or the
totally random one that Wilco tends to quote where nobody
even knows what the benchmark is. The lack of actual data
on real benchmarks with real memory components is sad.
Btw, I look for Spec benchmarks not because I think they are
particularly good, but because at least they aren't total
crap. From personal experience, Spec is not at all
indicative of the kinds of loads you see on the desktop,
but at least it's a hell of a lot closer than Dhrystone.
Linus