Article: AMD's Mobile Strategy
By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 17, 2011 11:31 am
Room: Moderated Discussions
Wilco (Wilco.Dijkstra@ntlworld.com) on 12/17/11 wrote:
>
>So can we now agree 1 ARM decoder = 1 x86 decoder?
Get the numbers, and I will be a lot more convinced. As
mentioned, I've never seen any numbers for ARM that are
at all worthwhile. So I don't know how it will actually
compare to Power. ARM64 probably compares favorably on a
pathlength side, but it's hard to say.
In particular, optimizing for performance can often lengthen
the instruction path size, if it improves CPI. Look at the
bzip2 numbers that stand out on the path length issue: it's
the one where x86 had a noticeably longer path length. But
it's also the one where x86 had better CPI (in fact, path
length and CPI had a very clear inverse relationship: the
small star above the bars means "x86 had lower CPI", and it
shows up in 100% of the cases where x86 had a longer path length).
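To make the trade-off concrete, here is a minimal sketch of the usual accounting: total cycles are path length (instructions executed) times CPI. The function name and all the numbers are made up for illustration; none of them come from the paper being discussed.

```python
def total_cycles(path_length, cpi):
    """Cycles spent = instructions executed * average cycles per instruction."""
    return path_length * cpi


# Hypothetical numbers, NOT taken from the paper: a build that executes
# 15% more instructions but gets a 20% lower CPI.
short_path = total_cycles(path_length=1_000_000, cpi=1.00)
long_path = total_cycles(path_length=1_150_000, cpi=0.80)

# A value above 1.0 means the longer-path build finishes in fewer cycles.
speedup = short_path / long_path

if __name__ == "__main__":
    print(f"short path: {short_path:.0f} cycles")
    print(f"long path:  {long_path:.0f} cycles")
    print(f"speedup of the longer path: {speedup:.3f}x")
```

With these assumed inputs the longer path wins, which is why path length on its own doesn't say much about performance.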
Sure, it could be architectural ("x86 simply needs more
instructions for those benchmarks" - due to spills or
whatever) but it could also be things like "the compiler
generated 'wasteful' code because it generates lots of
software speculation". Software speculation results in more
instructions, but avoids branch mispredicts and can improve
performance.
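A rough cost model of that effect, as a sketch only: the "branchy" version executes fewer instructions but pays a penalty on every mispredict, while the "speculative" version executes extra instructions and never mispredicts. The function names, the mispredict rate, the penalty and the instruction counts are all assumptions picked for illustration, not measurements of any real CPU or compiler.

```python
def branchy_cycles(n_iter, instrs_per_iter, base_cpi, mispredict_rate, penalty):
    """Cycles for a loop with a hard-to-predict branch in each iteration."""
    return n_iter * (instrs_per_iter * base_cpi + mispredict_rate * penalty)


def speculative_cycles(n_iter, instrs_per_iter, extra_instrs, base_cpi):
    """Cycles when both sides are computed and selected, with no branch."""
    return n_iter * (instrs_per_iter + extra_instrs) * base_cpi


# Hypothetical parameters (assumptions, not measured data).
N = 1000          # loop iterations
INSTRS = 10       # instructions per iteration in the branchy version
EXTRA = 3         # extra instructions per iteration for software speculation
BASE_CPI = 0.5    # CPI when nothing mispredicts
RATE = 0.2        # fraction of iterations that mispredict
PENALTY = 15      # cycles lost per mispredict

branchy = branchy_cycles(N, INSTRS, BASE_CPI, RATE, PENALTY)
speculative = speculative_cycles(N, INSTRS, EXTRA, BASE_CPI)

branchy_path = N * INSTRS
speculative_path = N * (INSTRS + EXTRA)

# Effective CPI = total cycles / instructions executed.
branchy_cpi = branchy / branchy_path
speculative_cpi = speculative / speculative_path

if __name__ == "__main__":
    print(f"branchy:     path={branchy_path} cpi={branchy_cpi:.2f} cycles={branchy:.0f}")
    print(f"speculative: path={speculative_path} cpi={speculative_cpi:.2f} cycles={speculative:.0f}")
```

Under these assumptions the speculative version has the longer path length, the lower CPI and the lower cycle count all at once, which is the same pattern as the bzip2 case. Change the mispredict rate to something small and the result flips, so the model only shows that the effect is possible, not that it is what happened.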
IOW, we just don't know. But I agree that that paper implies
x86 doesn't have an instruction advantage. But in the end,
I'd really like to see the numbers for an equivalent run
(ie Spec for best performance). Because that's what really
matters - apples-to-apples comparisons.
The classic x86 CISC paper ("RISC vs CISC: a tale of two
chips" - I think it's the same paper cited by the two
papers here) that compares against Alpha has numbers of
13-33% lower path length for x86. Of course, alpha is
probably the worst case.
Linus