Article: AMD's Mobile Strategy
By: Seni (seniike.delete@this.hotmail.com), December 21, 2011 10:06 am
Room: Moderated Discussions
Exophase (exophase@gmail.com) on 12/21/11 wrote:
>Generally only imm32 (or imm8) is available. The only x86-64 instructions available
>with 64-bit displacements are absolute loads or stores to al/ax/eax/rax. Generally
I'll have to double-check that. If true, it's a big letdown.
>>The x86 version combines not only the AGU op and Load, but also up to 1 ALU op,
>>and the loading and adding in of a full-length immediate.
>
>I actually think that store immediate is one of the more useful instructions that x86 has over ARM.
Strange. It might be common but I doubt it has much impact, as its performance would be barely different from the 2-instruction equivalent.
>>So for example, the x86-64 instruction
>>ADD RAX, [RBX + RSI + imm64]
>Well yeah, if this x86 instruction existed.
I really should check these things.
The 32-bit version exists though, and it would have a 4-instruction ARM equivalent.
If you need a separate MOV RAX, imm64, then the x86 version takes 2 instructions to do the work of 6, which is still pretty compact.
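To make the comparison concrete, here is a rough Python model of what the 32-bit-displacement form does in one instruction, next to the same work split into separate logical steps. It is only a sketch: the function names, the dict-based memory, and the step labels are my own illustration, not real encodings, and how many actual ARM instructions each step costs depends on the constant (materializing a displacement may take more than one).

```python
MASK64 = (1 << 64) - 1  # registers are modeled as 64-bit unsigned values


def sign_extend32(value):
    """Sign-extend a 32-bit displacement, as x86-64 does for disp32."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value & 0x80000000 else value


def add_mem_folded(rax, rbx, rsi, disp32, memory):
    """Model of ADD RAX, [RBX + RSI + disp32]: address calc, load, and add in one go."""
    address = (rbx + rsi + sign_extend32(disp32)) & MASK64
    return (rax + memory[address]) & MASK64


def add_mem_split(rax, rbx, rsi, disp32, memory):
    """The same work as separate logical steps (hypothetical load/store-style split)."""
    steps = []
    tmp = sign_extend32(disp32) & MASK64
    steps.append("materialize displacement")
    tmp = (tmp + rsi) & MASK64
    steps.append("add index")
    loaded = memory[(rbx + tmp) & MASK64]
    steps.append("load [base + tmp]")
    rax = (rax + loaded) & MASK64
    steps.append("add to accumulator")
    return rax, steps
```

Both functions give the same result for the same inputs; the only difference is how many separately issued steps it takes to get there.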
>LDM/STM was the only big multi-op instruction. ARM64 removes it but instead has
>load/store pair which is a decent compromise for saving instructions for register
>save/restore. This is also consistent with ARM's last few uarch decisions, where
>ldm/stm had 2x the peak bandwidth to L1 compared to ldr/str.. they probably want
>to still provide for this sort of direct utilization.
Ok, I took a look at LDM/STM and LDP/STP.
They seem more like a vector op than a series of separate memory accesses. You're loading or storing a large contiguous block from a single address. So, on second thought, I can't really consider them big multi-op instructions at all, since the number of memory operations going on is one.
Routing a single wide load into multiple registers is still a cool trick though.
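A toy Python model of what I mean by that, assuming a load pair behaves like one 16-byte access whose halves get routed to two destination registers. The `load_pair` name, the little-endian layout, and the byte-string memory are assumptions for illustration; real hardware may well split the access differently.

```python
def load_pair(memory: bytes, address: int):
    """Model a load pair as ONE contiguous 16-byte read feeding two registers."""
    block = memory[address:address + 16]  # the single wide memory access
    if len(block) != 16:
        raise ValueError("access runs past the end of memory")
    first = int.from_bytes(block[:8], "little")   # low half -> first register
    second = int.from_bytes(block[8:], "little")  # high half -> second register
    return first, second
```

There is one address and one access; the only extra work is steering the two halves to different registers.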