Article: AMD's Mobile Strategy
By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 22, 2011 10:56 am
Room: Moderated Discussions
Wilco (Wilco.Dijkstra@ntlworld.com) on 12/22/11 wrote:
>
>No, that's a smart way to use immediates. You can splat thousands of copies all
>over your executable or keep just one copy in a constant pool. Which is better,
>in terms of cache, tlb, i-fetch, decode overheads? What about relocation overheads?
For pretty much all of those it's better to put the
constants in the I$ close to the user, rather than putting
another constant in the I$ (the "constant offset")
that is a sizeable part of the constant you actually want.
And when you want a 32-bit constant for pc-relative
addressing (but not when you want said constant to store
it to memory, say, or any other use of bigger constants),
you are now magically ok with it being right there in the
instruction stream as another instruction, rather than
being loaded from your "superior" constant pool.
So it seems that you are ok with that when ARM does it,
right? Putting the constant in the instruction stream as
part of the instruction is stupid when x86 does it, but
when ARM does it by creating a special instruction pair
to build up the 32-bit pc-relative offset, it's then super
smart.
When x86 puts the constants into the code stream, it's just
stupid and wasteful, especially when it does it in a
generic manner.
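
To make the "instruction pair" point concrete, here is a minimal sketch of how a 32-bit constant can be carried entirely in the instruction stream as two 16-bit immediates. It assumes a MOVW/MOVT-style pair as in ARMv7; the post does not name the instructions, and the function names below are hypothetical.

```python
# Toy model: a 32-bit constant carried as two 16-bit immediates in the
# instruction stream (assumption: a MOVW/MOVT-style pair; names are made up).

def split_halves(value):
    """Split a 32-bit value into the (low16, high16) immediates."""
    value &= 0xFFFFFFFF
    return value & 0xFFFF, value >> 16

def build_from_pair(low16, high16):
    """Rebuild the register value the way the two instructions would."""
    reg = low16 & 0xFFFF                               # first insn: set low half, clear top
    reg = (reg & 0xFFFF) | ((high16 & 0xFFFF) << 16)   # second insn: insert high half
    return reg

lo, hi = split_halves(0xDEADBEEF)
assert build_from_pair(lo, hi) == 0xDEADBEEF
```

Either way the constant bits live in the I$ next to the code that uses them, which is the same thing an x86 immediate does in a single instruction.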
Also, even duplication is generally better than "random
accesses in a big table" for caches. The #1 thing for
caches is not "size" - it's "locality". Sure, size matters,
but size matters less than locality.
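
The locality point can be illustrated with a toy cache-line count. This is only a sketch under assumed numbers (64-byte lines, 4-byte instructions and constants, a pool placed far from the code); it is a model, not a measurement of any real CPU, and all names are hypothetical.

```python
LINE = 64  # assumed cache line size in bytes

def lines_touched(addresses):
    """Number of distinct cache lines covered by the given byte addresses."""
    return len({addr // LINE for addr in addresses})

def inline_footprint(n_uses, insn_bytes=4, imm_bytes=4):
    """Constants embedded in the instructions: bigger code, but sequential."""
    size = n_uses * (insn_bytes + imm_bytes)
    return lines_touched(range(size))

def pool_footprint(n_uses, pool_slots, insn_bytes=4, const_bytes=4,
                   pool_base=1 << 20):
    """Constants loaded from a shared pool: smaller code, scattered data."""
    code = range(n_uses * insn_bytes)
    stride = pool_slots // n_uses  # uses spread evenly across the pool
    pool = [pool_base + i * stride * const_bytes for i in range(n_uses)]
    return lines_touched(code) + lines_touched(pool)

print(inline_footprint(16))        # duplicated immediates, all adjacent
print(pool_footprint(16, 4096))    # one copy each, spread over a big table
print(pool_footprint(16, 16))      # small, dense pool for comparison
```

In this model 16 inline constants touch 2 lines, the same 16 constants picked out of a 4096-entry pool touch 17, and the pool only breaks even when it is small and dense. Total bytes are lower with the pool; lines touched are not.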
But hey, we all know that you don't care about anything
like that. All you care about is "OMG, ARM is great, and
x86 sucks", and all arguments as to why are irrelevant.
Linus