Article: AMD's Mobile Strategy
By: anon (anon.delete@this.anon.com), December 21, 2011 9:12 am
Room: Moderated Discussions
Linus Torvalds (torvalds@linux-foundation.org) on 12/17/11 wrote:
---------------------------
>Wilco (Wilco.Dijkstra@ntlworld.com) on 12/17/11 wrote:
>>
>>So can we now agree 1 ARM decoder = 1 x86 decoder?
>
>Get the numbers, and I will be a lot more convinced. As
>mentioned, I've never seen any numbers for ARM that are
>at all worthwhile. So I don't know how it will actually
>compare to Power.
Hey, at least we know that ARM is more POWERful as an ISA than Power, so we do know how it will compare to x86 when targeting high-performance code generation: very favorably.
>ARM64 probably compares favorably on a pathlength side,
>but it's hard to say.
>
>In particular, optimizing for performance can often lengthen
>the instruction path size, if it improves CPI. Look at the
>bzip2 numbers that stand out on the path length issue: it's
>the one where x86 had a noticeably longer path length. But
>it's also the one where x86 had better CPI (in fact, the
>path length had a very clear inverse relationship: the small
>star above the bars means "x86 had lower CPI", and it's
>correlated 100% when x86 had a longer path length).
>
>Sure, it could be architectural ("x86 simply needs more
>instructions for those benchmarks" - due to spills or
>whatever) but it could also be things like "the compiler
>generated 'wasteful' code because it generates lots of
>software speculation". Software speculation results in more
>instructions, but avoids branch mispredicts and can improve
>performance.
>
>IOW, we just don't know. But I agree that that paper implies
>x86 doesn't have an instruction advantage. But in the end,
>I'd really like to see the numbers for an equivalent run
>(ie Spec for best performance). Because that's what really
>matters - apples-to-apples comparisons.
>
>The classic x86 CISC paper ("RISC vs CISC: a tale of two
>chips" - I think it's the same paper cited by the two
>papers here) that compares against Alpha has numbers of
>13-33% lower path length for x86. Of course, Alpha is
>probably the worst case.
>
>Linus
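To make the path length vs. CPI point in the quoted post concrete, here is a minimal sketch of the usual relation: run time = instructions executed * CPI / clock rate. The function name and every number below are made up for illustration; none of them come from the papers under discussion.

```python
def execution_time(path_length, cpi, clock_hz):
    """Seconds = instructions executed * cycles per instruction / cycles per second."""
    return path_length * cpi / clock_hz


# Hypothetical numbers, for illustration only (not from any cited paper).
# "baseline": shorter instruction path, higher CPI.
baseline = execution_time(path_length=1.00e9, cpi=1.00, clock_hz=2.0e9)

# "speculative": 15% more instructions (e.g. software speculation),
# but fewer stalls and mispredicts, so a lower CPI.
speculative = execution_time(path_length=1.15e9, cpi=0.80, clock_hz=2.0e9)

# The longer path wins here because the CPI gain outweighs the extra instructions.
speedup = baseline / speculative
print(f"baseline={baseline:.3f}s speculative={speculative:.3f}s speedup={speedup:.2f}x")
```

With these assumed inputs the longer instruction path still finishes sooner, which is why a path length comparison alone doesn't settle the question.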