Article: AMD's Mobile Strategy
By: Exophase (exophase.delete@this.gmail.com), December 20, 2011 7:54 pm
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 12/20/11 wrote:
---------------------------
>You missed one other trick that x86 has, which is the stack engine. Based on the
>UT paper it has a small, but noticeable impact. IIRC, the paper mentioned about
>4% on average, but it was higher for the more complex integer benchmarks.
I'm actually not sure how they're composed with respect to uop fusion. push reg is broken into two uops and could be viewed as a load + op, except that the two uops are independent; pop reg is only one uop. So these are pretty specialized, but ARM has instructions that do the same thing (historically in a single execution cycle).
I'm not sure what ARM implementations would offer as an answer to "stack engines", but it seems ARM64 implementations could do the same thing, and more easily, since SP is no longer a GPR.
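To make the stack-engine idea concrete, here is a toy decoder model. It is a simplified sketch, not Intel's actual design. The names `Decoder` and `decode_sequence` are hypothetical, a store is counted as one (fused) uop, and the tracked offset is assumed never to overflow. Without the engine, every push or pop needs its own ALU uop to adjust ESP. With it, the decoder keeps a running ESP offset and only inserts a sync uop when some other instruction reads ESP directly.

```python
from dataclasses import dataclass, field


@dataclass
class Decoder:
    """Toy x86 front end; stack_engine toggles decode-time ESP tracking."""
    stack_engine: bool
    delta: int = 0  # ESP offset tracked by the stack engine, not yet applied
    uops: list[str] = field(default_factory=list)

    def _sync(self) -> None:
        # Fold the tracked offset into the architectural ESP with one ALU uop.
        if self.delta != 0:
            self.uops.append(f"sync: add esp, {self.delta}")
            self.delta = 0

    def decode(self, op: str, operand: str, reads_esp: bool = False) -> None:
        if op == "push":
            if self.stack_engine:
                # No ALU uop: the address is ESP plus the tracked offset.
                self.delta -= 4
                self.uops.append(f"store [esp{self.delta:+d}], {operand}")
            else:
                self.uops.append("alu: sub esp, 4")
                self.uops.append(f"store [esp], {operand}")
        elif op == "pop":
            if self.stack_engine:
                self.uops.append(f"load {operand}, [esp{self.delta:+d}]")
                self.delta += 4
            else:
                self.uops.append(f"load {operand}, [esp]")
                self.uops.append("alu: add esp, 4")
        else:
            # Any other instruction that reads ESP needs the real value first.
            if self.stack_engine and reads_esp:
                self._sync()
            self.uops.append(f"{op} {operand}")


def decode_sequence(program, stack_engine: bool) -> list[str]:
    dec = Decoder(stack_engine)
    for op, operand, *flags in program:
        dec.decode(op, operand, *flags)
    return dec.uops


PROGRAM = [
    ("push", "eax"),
    ("push", "ebx"),
    ("mov", "ebp, esp", True),  # explicit ESP read forces a sync uop
    ("pop", "ebx"),
    ("pop", "eax"),
]

if __name__ == "__main__":
    for engine in (False, True):
        uops = decode_sequence(PROGRAM, engine)
        print(f"stack_engine={engine}: {len(uops)} uops")
        for u in uops:
            print("   ", u)
```

For this five-instruction sequence the model gives 9 uops without the engine and 6 with it; the single sync uop is the cost of `mov ebp, esp` reading ESP while the tracked offset is non-zero.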
>How is LEA handled in most designs? Is it single uop?
It tends to depend on how complex it is. Agner Fog's tables say that for simple addressing it breaks down from one fused uop into three uops, but I'd expect that to be an error, especially since for Nehalem it's just two. Complex LEA is two uops, but somehow with a latency of three cycles (and a throughput of just one per cycle). Its performance really isn't that great compared to the AGU, maybe because of the forwarding paths involved.
Complex LEA probably includes reg + reg + imm.
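For reference, a sketch of what LEA computes, using a hypothetical `Lea` class. It assumes, as guessed above, that "complex" means all three of base, index and displacement are present; real cores may use other or additional criteria, so `is_complex` is only an illustration.

```python
from dataclasses import dataclass
from typing import Optional

MASK64 = (1 << 64) - 1


@dataclass(frozen=True)
class Lea:
    base: Optional[int] = None   # value of the base register, if any
    index: Optional[int] = None  # value of the index register, if any
    scale: int = 1               # x86 allows 1, 2, 4 or 8
    disp: int = 0                # signed displacement (the "imm")

    def components(self) -> int:
        # Count how many address terms are actually present.
        return sum([self.base is not None, self.index is not None, self.disp != 0])

    def is_complex(self) -> bool:
        # Assumption from the post: reg + reg + imm is the "complex" form.
        return self.components() == 3

    def result(self) -> int:
        # LEA only computes the effective address; it never touches memory.
        if self.scale not in (1, 2, 4, 8):
            raise ValueError("x86 scale must be 1, 2, 4 or 8")
        addr = self.disp
        if self.base is not None:
            addr += self.base
        if self.index is not None:
            addr += self.index * self.scale
        return addr & MASK64  # wrap like a 64-bit register
```

For example, `lea rax, [rbx + rcx*4 + 16]` with rbx = 0x1000 and rcx = 3 corresponds to `Lea(base=0x1000, index=3, scale=4, disp=16)`, whose `result()` is 0x101C; dropping the displacement gives the two-component form.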
>It's also not clear how many of those advantages will exist for ARMv8/ARM64 (e.g. shifting).
>
A fairly comprehensive ARM64 spec is available on ARM's site, and I posted about the major differences between it and x86 earlier in this thread. Folded shifts are still present. Predication is more limited and load/store multiple has been reduced to load/store pair, but there are a number of new features too.
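A small illustration of those two ARM64 points. `add_lsl` models a folded shift such as `add x0, x1, x2, lsl #3`, where the shift costs no extra instruction. `push_registers_a64` is a hypothetical helper showing that, without load/store multiple, saving N registers takes roughly N/2 STP instructions instead of one A32 PUSH. It assumes pre-indexed writeback and keeps SP 16-byte aligned; the ordering is not meant to match what a compiler would emit.

```python
MASK64 = (1 << 64) - 1


def add_lsl(xn: int, xm: int, shift: int) -> int:
    """Model of A64 'add xd, xn, xm, lsl #shift': the shift is folded into the add."""
    if not 0 <= shift <= 63:
        raise ValueError("shift amount must be 0-63 for 64-bit registers")
    return (xn + (xm << shift)) & MASK64


def push_registers_a64(regs: list[str]) -> list[str]:
    """Hypothetical helper: A64 stores replacing an A32 'push {regs}'.

    One STP per register pair, each pre-decrementing SP by 16. A leftover
    odd register gets a plain STR that still moves SP by 16 for alignment.
    """
    out = []
    for i in range(0, len(regs) - 1, 2):
        out.append(f"stp {regs[i]}, {regs[i + 1]}, [sp, #-16]!")
    if len(regs) % 2:
        out.append(f"str {regs[-1]}, [sp, #-16]!")
    return out


if __name__ == "__main__":
    print(add_lsl(1, 2, 3))
    for insn in push_registers_a64(["x19", "x20", "x21", "x29", "x30"]):
        print(insn)
```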
>I don't think it's reasonable to compare "Atom cores today" with "A15 cores that don't yet exist".
>
Fine, wait a few months. What I'm actually comparing are Atom cores that aren't out yet (Cedar Trail) and A15-based SoCs that aren't out yet, and the comparisons are based on the vendors' own claims.
>Yeah, I think it will also help to see a new uarch from Intel. That may give a
>better idea of what's feasible, because I think Intel has avoided uarch changes to focus on system level issues.
>
>David
I agree. I also hope the S|A rumors that the next uarch improves IPC by only 20-30% are either false or overly conservative.