By: TREZA (no.delete@this.ema.il), May 16, 2013 3:17 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on May 15, 2013 4:08 pm wrote:
> Ashraf Eassa (aeassa.delete@this.gmail.com) on May 15, 2013 11:59 am wrote:
> > Hi everybody,
> >
> > I've been lurking for years, but the time has come when I would really love to pick the brains of
> > the experts we have here. From my understanding, Atom is a much narrower design than Krait, Cortex
> > A15 and others, and yet, in many benchmarks the older Saltwell core holds its own against even Krait
> > in both FPU/INT, and against A15 in Linux integer benchmarks (but it gets decimated in FPU).
> >
> > So, my question is, how do I think about "Silvermont" competitive position against a fairly
> > beefy modern ARM design such as the Cortex A15? From a high level perspective, it looks
> > like on a per-clock basis it should be no contest - A15 is wider and more aggressive.
> > But Intel is claiming that Silvermont is as fast as A15 on a per-clock basis.
> >
> > A couple of questions then:
> >
> > 1. How can a narrower design pull this off?
> > 2. Is this likely to be integer only as A15 includes FMAC instructions,
> > which in the right cases double FPU performance?
> >
> > Thank you all so much!
> >
> > Regards,
> > Ashraf Eassa
>
> There is VASTLY more to the performance of a CPU than the width
> of the superscalar pipeline. Issues of importance include
> - the quality of the branch prediction (and the speed at which
> incorrect predictions are discovered and rectified)
> - the flexibility of the pipeline (which includes things like in-order vs OoO, or how many instructions can be
> queued in various buffers --- eg size of ROB, size of load and store queues, number of rename registers)
> - the memory pipeline (both on chip, so latency and bandwidth to L1 and L2, size of
> TLBs) and off-chip (so quality of the memory controller, width of the memory bus).
>
> Compare, for example, back in the day the PPC 750 or 7400 with their Intel equivalents of the
> time. In theory they should have been about the same speed, with the PPC slightly ahead because
> its pipeline had better branch handling; in practice Intel was faster for almost all tasks because
> of the superior memory system at every level, from L1 all the way out to the DRAM.
> One of the reasons Apple's Swift today feels superior to most of the competition, even with
> apparently lower specs, is again that Swift has a superior off-chip memory system. (Maybe
> also a superior on-chip memory system, but Apple has been secretive about that.)
>
> There are MANY design points for a CPU, and none is obviously correct; especially in mobile where there remains
> an on-going argument about the relative importance of peak speed vs energy consumption. It's foolish at this
> point to assume that either Intel or the A15 designers made mistakes; more likely they have both produced
> remarkably good designs, and insisting that one is better than the other would be foolish --- an insistence
> that some particular metric is the ONLY metric that matters. In many ways the more interesting question is
> at the business level. Atom is probably competitive with ARM at the tech level, but to be relevant it also
> has to cost the same --- and that's going to hurt Intel longterm, no doubt about it.
>
About the memory subsystem (and from personal experiments with my toy CPU): using standardised buses and expecting reuse can sometimes be a constraint when optimising performance. The reliance on AMBA and composable hardware IP blocks could be a disadvantage for original ARMs compared to fully customised x86 and closed designs like Apple's.