Article: AMD's Mobile Strategy
By: Exophase (exophase.delete@this.gmail.com), December 22, 2011 7:17 pm
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 12/22/11 wrote:
---------------------------
>Also if we are discussing ARM and SSE behavior, wouldn't we want to think about Neon (and perhaps AVX)?
>
>David
That'd be a pretty sprawling and, I'm sure, heated new discussion.
If we want to keep it to just load/store behavior: NEON in ARMv7-A can load or store 1-4 64-bit registers in one instruction. Loads can perform de-interleaving, and you can load single elements (lanes in NEON parlance) instead of entire vectors. A single-element load can also broadcast that element to all lanes of the destination register. Stores can perform interleaving and can write individual lanes, but there's no store counterpart to broadcasting. Not that you'd want it very often.
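To make the de-interleaving, broadcast, and lane behavior concrete, here's a rough scalar model in plain C. It is not NEON code and not anyone's official description: the `model_*` function names are made up for illustration, and it only covers 8-bit elements in a single 64-bit (8-lane) register. The instructions it loosely mimics are VLD3.8/VST3.8, the all-lanes form of VLD1.8, and the single-lane forms of VLD1.8/VST1.8.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define LANES 8 /* 8-bit elements in one 64-bit D register */

/* De-interleaving load of three registers (roughly VLD3.8):
 * memory holds r0 g0 b0 r1 g1 b1 ..., each register receives one channel. */
void model_vld3_u8(const uint8_t *mem, uint8_t r[LANES],
                   uint8_t g[LANES], uint8_t b[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        r[i] = mem[3 * i + 0];
        g[i] = mem[3 * i + 1];
        b[i] = mem[3 * i + 2];
    }
}

/* Interleaving store of three registers (roughly VST3.8), the inverse. */
void model_vst3_u8(uint8_t *mem, const uint8_t r[LANES],
                   const uint8_t g[LANES], const uint8_t b[LANES])
{
    for (size_t i = 0; i < LANES; i++) {
        mem[3 * i + 0] = r[i];
        mem[3 * i + 1] = g[i];
        mem[3 * i + 2] = b[i];
    }
}

/* Load one element and broadcast it to every lane. */
void model_vld1_dup_u8(const uint8_t *mem, uint8_t d[LANES])
{
    for (size_t i = 0; i < LANES; i++)
        d[i] = mem[0];
}

/* Load one element into a single lane; the other lanes are untouched. */
void model_vld1_lane_u8(const uint8_t *mem, uint8_t d[LANES], unsigned lane)
{
    assert(lane < LANES);
    d[lane] = mem[0];
}

/* Store a single lane to memory. There is no broadcasting store. */
void model_vst1_lane_u8(uint8_t *mem, const uint8_t d[LANES], unsigned lane)
{
    assert(lane < LANES);
    mem[0] = d[lane];
}
```

Feeding `model_vld3_u8` 24 bytes of packed RGB gives you eight reds, eight greens and eight blues in separate registers, and `model_vst3_u8` puts them back in the original order.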
The 1-4 registers can't be chosen freely, but sometimes there are spacing options that let you use every other register. This is to accommodate interleaving elements from distinct 128-bit vectors (instead of 64-bit vectors), since each 128-bit register is a pair of adjacent 64-bit registers.
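Here's a sketch of why the every-other-register spacing matters, again as a plain C model rather than real NEON code. The `regfile_model` type and the `model_*` names are hypothetical. It assumes the ARMv7-A aliasing where Qn is the pair D(2n) (low half) and D(2n+1) (high half), and it mimics the usual two-instruction idiom of a three-register de-interleaving load into {d0, d2, d4} followed by one into {d1, d3, d5}.

```c
#include <assert.h>
#include <stdint.h>

#define D_REGS 32
#define LANES 8 /* 8-bit elements per 64-bit D register */

/* Hypothetical model of the D register file. */
typedef struct {
    uint8_t d[D_REGS][LANES];
} regfile_model;

/* De-interleaving load into D[first], D[first+stride], D[first+2*stride].
 * stride 1 means consecutive registers, stride 2 means every other one.
 * Returns the advanced address, modelling post-increment writeback. */
const uint8_t *model_vld3_u8_spaced(regfile_model *rf, unsigned first,
                                    unsigned stride, const uint8_t *mem)
{
    assert(stride == 1 || stride == 2);
    assert(first + 2 * stride < D_REGS);
    for (unsigned i = 0; i < LANES; i++)
        for (unsigned c = 0; c < 3; c++)
            rf->d[first + c * stride][i] = mem[3 * i + c];
    return mem + 3 * LANES;
}

/* De-interleave 16 packed RGB pixels into Q0 (R), Q1 (G), Q2 (B). */
void model_load_rgb16(regfile_model *rf, const uint8_t *mem)
{
    mem = model_vld3_u8_spaced(rf, 0, 2, mem); /* {d0, d2, d4}: low halves  */
    model_vld3_u8_spaced(rf, 1, 2, mem);       /* {d1, d3, d5}: high halves */
}

/* Read lane 0..15 of Qn through the D register aliasing. */
uint8_t model_q_lane(const regfile_model *rf, unsigned q, unsigned lane)
{
    assert(q < D_REGS / 2 && lane < 2 * LANES);
    return rf->d[2 * q + lane / LANES][lane % LANES];
}
```

With consecutive registers (stride 1) the first load would fill d0, d1, d2, which splits the red channel across q0's low half while green lands in q0's high half. The stride of 2 is what keeps each channel inside one 128-bit register.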
What I don't really know is how ARM64 changes this. I just know that it has to in some way because 128-bit registers no longer alias to two 64-bit registers.
I don't think AVX should be part of this discussion at this time because there's no suggestion that anyone will be putting it in a low-power x86 chip. Atom and Bobcat are both stuck at SSSE3. Nano 3000 did manage to add SSE4.1 support.
I can't speak for AMD, but Intel seems to want to make AVX a luxury feature for some reason. So long as they're keeping it out of Pentiums and Celerons I don't see them adding it to Atoms. On Atom it would take substantial new die area, whereas on the Pentiums and Celerons it's just being fused off. But I really don't understand Intel here. No one is choosing Core-iX because it can run AVX. But some programmers WILL choose not to bother with AVX at all when so many people are unable to run it.