Article: AMD's Mobile Strategy
By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 16, 2011 10:14 am
Room: Moderated Discussions
Wilco (Wilco.Dijkstra@ntlworld.com) on 12/16/11 wrote:
>
>No, on average they do less.
Bullshit.
You are probably comparing some server-optimized P4 binary
(back in the days when compilers were explicitly trying to
avoid the CISC nature of x86 because the P4 sucked) against
ARM code.
ARM compilers have always felt that density is a big deal,
so they don't waste instructions the way x86 compilers do.
You can't compare things that way. When ARM starts to try
to compete on a performance level, they'll do sequences that
are better for OoO too.
And you need to use modern x86 compilers and settings that
actually use the addressing modes and the load-op insns,
instead of trying to use x86 like a RISC machine, the way
people did ten years ago.
>And x86 instructions are just not a good fit for the code
>that most people write.
That's just not true, and you're just making things up.
We know how you dislike x86. It's irrational.
>Indeed, ARM instructions are so much more powerful that you need 64-way x86 decode to keep up with 4-way ARM...
"The crazy is strong in this one".
Yes, you can pick random "load/store multiple" ARM
instructions and make it look good from an instruction decode
standpoint, or pick something where some random instruction
does exactly what you want (the crazy "find first bit" vs
"count leading zeroes" discussion elsewhere).
And I can pick one: "rep movsb". It's memcpy() in a single
instruction decode. It's historically really slow, but it's
getting to the point where it's the fastest way to do
memory copies.
(It's called "Enhanced fast string instructions" for what
it's worth. These days you should use "rep movsq", but that
will change).
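To make the "memcpy() in a single instruction decode" point concrete, here is a minimal sketch of a byte copy built on "rep movsb". It assumes GCC or clang extended inline asm on x86; the names copy_rep_movsb and copy_rep_movsb_selfcheck are made up for illustration, and on any other target the sketch just falls back to memcpy(). It only shows the shape of the instruction (count in the count register, pointers in the source and destination index registers) and says nothing about how fast it is on a given core.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy n bytes from src to dst with one "rep movsb".
 * Assumes the direction flag is clear, which the usual x86 ABIs guarantee
 * on function entry. */
static void *copy_rep_movsb(void *dst, const void *src, size_t n)
{
    void *ret = dst;
#if (defined(__x86_64__) || defined(__i386__)) && defined(__GNUC__)
    /* "D" = destination index, "S" = source index, "c" = count register.
     * All three are read and updated by the instruction, hence "+". */
    __asm__ volatile ("rep movsb"
                      : "+D"(dst), "+S"(src), "+c"(n)
                      :
                      : "memory");
#else
    /* Not x86 (or not a GNU-style compiler): plain memcpy instead. */
    memcpy(dst, src, n);
#endif
    return ret;
}

/* Small usage example: copy a string and check the result. */
static void copy_rep_movsb_selfcheck(void)
{
    const char src[] = "rep movsb";
    char dst[sizeof src] = {0};

    void *ret = copy_rep_movsb(dst, src, sizeof src);
    assert(ret == dst);
    assert(memcmp(dst, src, sizeof src) == 0);
}
```

Like memcpy(), this sketch assumes the two buffers do not overlap.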
I seriously suspect that you started hating x86 back when
people used segments and it had 16 bits and stuff. And you
can't get over it.
Linus