Article: AMD's Mobile Strategy
By: David Kanter (dkanter.delete@this.realworldtech.com), December 22, 2011 8:23 pm
Room: Moderated Discussions
Exophase (exophase@gmail.com) on 12/22/11 wrote:
---------------------------
>David Kanter (dkanter@realworldtech.com) on 12/22/11 wrote:
>---------------------------
>>Also if we are discussing ARM and SSE behavior, wouldn't we want to think about Neon (and perhaps AVX)?
>>
>>David
>
>That'd be a pretty sprawling and I'm sure heated new discussion..
I think it's tough to discuss only a subset of the vector extensions. It's true that AVX might be irrelevant for phones, today. But I doubt it will stay that way forever and you might see direct comparisons in tablets.
>If we want to keep it to just load/store behavior: NEON in ARMv7a can load or store
>1-4 64-bit registers. Loads can perform de-interleaving and you can load single
>elements (lanes in NEON parlance) instead of entire vectors.
Do you mask out the lanes?
>The latter can also
>broadcast to all lanes of the destination register. Stores can perform interleaving
>and you can store individual lanes. But there's no broadcasting. Not that you'd want it very often.
>
>The 1-4 registers can't be chosen freely, but sometimes there's spacing options
>letting you do every other register. This is to accommodate interleaving elements
>from distinct 128-bit vectors (instead of 64-bit vectors).
You're still using a single address, right? It looks like they probably handle cache line crossing loads, which is nice.
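To make the load/store behavior described above concrete, here is a minimal sketch in plain C that models it for byte elements. It is not NEON code: the `d_reg` type and the four function names are hypothetical, invented for illustration. It assumes a 3-way interleave of 8-bit elements starting from a single base address, and it assumes a single-lane load leaves the other lanes of the destination untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of one 64-bit register holding eight byte lanes. */
typedef struct { uint8_t lane[8]; } d_reg;

/* De-interleaving load: reads 24 consecutive bytes from one address and
   splits them so that out[r] receives every third byte starting at r. */
void load3_deinterleave(const uint8_t *src, d_reg out[3]) {
    for (size_t i = 0; i < 8; i++)
        for (size_t r = 0; r < 3; r++)
            out[r].lane[i] = src[3 * i + r];
}

/* Interleaving store: the inverse of the load above. */
void store3_interleave(uint8_t *dst, const d_reg in[3]) {
    for (size_t i = 0; i < 8; i++)
        for (size_t r = 0; r < 3; r++)
            dst[3 * i + r] = in[r].lane[i];
}

/* Single-lane load: writes one element; the other lanes are assumed
   to keep their previous contents (no masking or zeroing). */
void load_lane(const uint8_t *src, d_reg *reg, size_t lane) {
    assert(lane < 8);
    reg->lane[lane] = *src;
}

/* Broadcast load: one element copied into every lane. */
void load_dup(const uint8_t *src, d_reg *reg) {
    for (size_t i = 0; i < 8; i++)
        reg->lane[i] = *src;
}
```

With packed RGB bytes as input, `load3_deinterleave` leaves all the R values in one register, G in the next and B in the third. On real hardware the closest equivalents I know of are the `vld3_u8`, `vst3_u8`, `vld1_lane_u8` and `vld1_dup_u8` intrinsics from `arm_neon.h`.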
>What I don't really know is how ARM64 changes this. I just know that it has to
>in some way because 128-bit registers no longer alias to two 64-bit registers.
I seem to recall from talking with one of the architects at ARM that they really got rid of all load/stores that take more than 2 cycles. I think the main motivation was simplifying exceptions, pipeline control and consistency.
That's probably OK because ISTR they now have 128b registers to deal with double precision.
>I don't think AVX should be part of this discussion at this time because there's
>no suggestion that anyone will be putting it in a low power x86 chip. Atom and Bobcat
>are both stuck at SSSE3. Nano 3000 did manage to add SSE4.1 support.
It's all a matter of degrees. AVX will eventually hit low power chips, it's a question of when. Also, you will see VFP/Neon chips that are hitting higher power levels. So I think the comparison is quite informative.
I'd liken it to Bobcat vs. Atom. They are really different designs, aimed at different markets. They overlap in some areas, thus there is room for comparisons. But we all know that Atom doesn't hit 20W and Bobcat doesn't go under 4.5W. So they each have 'unique' areas.
>I can't speak for AMD, but Intel seems to want to make AVX a luxury feature for
>some reason. So long as they're keeping it out of Pentiums and Celerons I don't
>see them adding it to Atoms.
Brands don't matter much, price points do. There are SNBs available for around $100. I'd expect that to drop with IVB.
Besides, with Jen-Hsun Huang talking about how Kal-El is faster than Core2...it's not crazy to compare it against Sandy Bridge : P
>There it'd take substantially new die space instead
>of just being fused off. But I really don't understand Intel here. No one is choosing
>Core-iX because they can run AVX. But some programmers WILL choose not to bother
>with AVX at all with so many people being unable to run it.
It's unrealistic to expect ALL of Intel's products to have AVX 1 year after the first SNB. There are still folks who want less expensive parts.
David