Article: AMD's Mobile Strategy
By: David Kanter (dkanter.delete@this.realworldtech.com), December 22, 2011 5:29 pm
Room: Moderated Discussions
Wilco (Wilco.Dijkstra@ntlworld.com) on 12/22/11 wrote:
>>An SSE MOV loads from four consecutive addresses of floats at a time but it's still one load.
>>From the cache's point of view, the TLB's point of view, etc., consecutive addresses = one op.
>
>That's true, but don't those SSE loads read 1 register and execute in a single cycle?
Not necessarily. Both Intel and AMD have 'cracked' 128-bit instructions into two uops (e.g. P4, K8, Bobcat, early Pentium M).
>I don't think you can split them into 4 individual loads, can you? An ldm loads multiple registers over several cycles (typically 2 registers per cycle since ARM11), so it is not exactly like an SSE load.
They are subtly different.
ARM uses two architectural registers. Most x86 designs rename registers, and if the renamed registers are 64 bits wide, you will end up using two renamed registers (e.g. Bobcat). So from a 'where the bits are' perspective, they are the same.
The exception handling and memory behavior may differ as well.
And of course, in newer x86 designs the registers are in fact 128 bits (or 256 bits) wide, so you will only use one register.
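To make the 'where the bits are' point concrete, here is a toy model, not a description of any real pipeline: memory is a Python bytes object, and a "register" is just the bytes it holds. The helper names load128 and load64x2 are hypothetical. One function models a single 128-bit load into one wide register; the other models the same 16 bytes landing in two 64-bit pieces, as with two architectural registers or a 128-bit load cracked into two 64-bit halves. It assumes Python 3.9+ and little-endian float packing.

```python
import struct

# Toy model (hypothetical, not real hardware): memory is a bytes object,
# and a "register" is simply the bytes it holds.

def load128(mem: bytes, addr: int) -> bytes:
    """One 128-bit load: a single access covering 16 consecutive bytes
    (e.g. four packed floats) held in one wide register."""
    return mem[addr:addr + 16]

def load64x2(mem: bytes, addr: int) -> tuple[bytes, bytes]:
    """The same 16 bytes split into two 64-bit pieces at addr and addr + 8,
    modelling two narrow registers or a load cracked into two halves."""
    return mem[addr:addr + 8], mem[addr + 8:addr + 16]

# Four consecutive little-endian single-precision floats.
mem = struct.pack("<4f", 1.0, 2.0, 3.0, 4.0)

wide = load128(mem, 0)
lo, hi = load64x2(mem, 0)

# Both forms hold the same 128 bits; only the number of registers differs.
print(wide == lo + hi)
print(struct.unpack("<4f", lo + hi))
```

The model deliberately ignores timing, exceptions, and memory semantics, which is exactly where the two approaches can differ; it only shows that the resulting bits are identical.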
Also if we are discussing ARM and SSE behavior, wouldn't we want to think about Neon (and perhaps AVX)?
David