By: Maynard Handley (name99.delete@this.name99.org), May 15, 2013 3:08 pm
Room: Moderated Discussions
Ashraf Eassa (aeassa.delete@this.gmail.com) on May 15, 2013 11:59 am wrote:
> Hi everybody,
>
> I've been lurking for years, but the time has come when I would really love to pick the brains of
> the experts we have here. From my understanding, Atom is a much narrower design than Krait, Cortex
> A15 and others, and yet, in many benchmarks the older Saltwell core holds its own against even Krait
> in both FPU/INT, and against A15 in Linux integer benchmarks (but it gets decimated in FPU).
>
> So, my question is, how do I think about "Silvermont" competitive position against a fairly
> beefy modern ARM design such as the Cortex A15? From a high level perspective, it looks
> like on a per-clock basis it should be no contest - A15 is wider and more aggressive.
> But Intel is claiming that Silvermont is as fast as A15 on a per-clock basis.
>
> A couple of questions then:
>
> 1. How can a narrower design pull this off?
> 2. Is this likely to be integer only as A15 includes FMAC instructions,
> which in the right cases double FPU performance?
>
> Thank you all so much!
>
> Regards,
> Ashraf Eassa
There is VASTLY more to the performance of a CPU than the width of the superscalar pipeline. Issues of importance include:
- the quality of the branch prediction (and the speed at which incorrect predictions are discovered and rectified)
- the flexibility of the pipeline (which includes things like in-order vs OoO, and how many instructions can be queued in the various buffers, e.g. size of the ROB, size of the load and store queues, number of rename registers)
- the memory pipeline, both on-chip (latency and bandwidth to L1 and L2, size of the TLBs) and off-chip (quality of the memory controller, width of the memory bus).
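To make the point about these factors concrete, here is a rough sketch using the usual textbook stall model (effective CPI = base CPI + branch stall cycles + memory stall cycles). Every number below is invented for illustration and describes neither Silvermont nor the A15; the function name and parameters are my own, and the model ignores the fact that an OoO core can overlap some of these stalls.

```python
def effective_cpi(base_cpi, branch_frac, mispredict_rate, mispredict_penalty,
                  mem_frac, miss_rate, miss_penalty):
    """Textbook stall model: cycles per instruction including stalls.

    base_cpi           -- CPI with perfect prediction and no cache misses
    branch_frac        -- fraction of instructions that are branches
    mispredict_rate    -- fraction of branches that are mispredicted
    mispredict_penalty -- cycles lost per misprediction
    mem_frac           -- fraction of instructions that access memory
    miss_rate          -- fraction of memory accesses that miss the cache
    miss_penalty       -- cycles lost per miss
    """
    branch_stalls = branch_frac * mispredict_rate * mispredict_penalty
    memory_stalls = mem_frac * miss_rate * miss_penalty
    return base_cpi + branch_stalls + memory_stalls


# HYPOTHETICAL 3-wide core: better peak rate, weaker predictor and memory system.
wide_cpi = effective_cpi(base_cpi=1 / 3,
                         branch_frac=0.2, mispredict_rate=0.05, mispredict_penalty=15,
                         mem_frac=0.3, miss_rate=0.03, miss_penalty=100)

# HYPOTHETICAL 2-wide core: lower peak rate, better predictor and memory system.
narrow_cpi = effective_cpi(base_cpi=1 / 2,
                           branch_frac=0.2, mispredict_rate=0.03, mispredict_penalty=10,
                           mem_frac=0.3, miss_rate=0.02, miss_penalty=70)

print(f"wide:   CPI {wide_cpi:.2f}, IPC {1 / wide_cpi:.2f}")
print(f"narrow: CPI {narrow_cpi:.2f}, IPC {1 / narrow_cpi:.2f}")
```

With these made-up inputs the narrower core comes out ahead per clock, because the stall terms swamp the difference in base CPI. Different inputs give the opposite answer, which is rather the point.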
Compare, for example, the PPC 750 or 7400 of back in the day with their Intel contemporaries. In theory they should have been about the same speed, with the PPC slightly ahead because its pipeline had better branch handling; in practice Intel was faster for almost all tasks because of its superior memory system at every level, from L1 all the way out to the DRAM.
One of the reasons Apple's Swift today feels superior to most of the competition, even with apparently lower specs, is again that Swift has a superior off-chip memory system. (Maybe also a superior on-chip memory system, but Apple has been secretive about that.)
There are MANY design points for a CPU, and none is obviously correct, especially in mobile, where there is an ongoing argument about the relative importance of peak speed vs energy consumption. It's foolish at this point to assume that either Intel or the A15 designers made mistakes; more likely both have produced remarkably good designs, and insisting that one is better than the other amounts to insisting that some particular metric is the ONLY metric that matters. In many ways the more interesting question is at the business level. Atom is probably competitive with ARM at the tech level, but to be relevant it also has to cost the same, and that's going to hurt Intel long term, no doubt about it.