By: Maynard Handley (name99.delete@this.name99.org), November 18, 2020 9:37 am
Room: Moderated Discussions
Dummond D. Slow (mental.delete@this.protozoa.us) on November 18, 2020 7:02 am wrote:
> TJ (notanemail.delete@this.bla.com) on November 18, 2020 1:10 am wrote:
> > Dummond D. Slow (mental.delete@this.protozoa.us) on November 17, 2020 11:18 am wrote:
> > > Doug S (foo.delete@this.bar.bar) on November 17, 2020 10:25 am wrote:
> > > > Dummond D. Slow (mental.delete@this.protozoa.us) on November 16, 2020 6:22 pm wrote:
> > > > > If the AMD beats M1 at the same or even better power = bad result for Apple.
> > > >
> > > >
> > > > Not really, given that the M1 is their low end solution.
> > > >
> > >
> > > Renoir isn't AMD's highend either. They have been selling 16-64 units of the same uarch for a year.
> > >
> > > >
> > > > Over the next two years we'll see their midrange and high end stuff.
> > > >
> > >
> > > Sure, but see above.
> > >
> > > The point was, we have to look at the multithread perf/watt. If Apple loses in multi-thread the edge it
> > > has in single-thread, that means something for competitiveness in a higher-core-count
> > > chip. It would actually pour cold water on the x86 is doomed narratives.
> > >
> > >
> > > Side note: if AMD beats M1 say in 15W envelope or if M1 only
> > > wins narrowly, despite having much bigger efficiency
> > > in single thread (it is close to performance of Zen 3, but at 2-3× less power), what is the reason?
> > > It's simple: SMT. Had Apple implemented it, it would run away in Cinebench.
> > > Lesson to the people saying it's a pointless/stupid/doomed feature.
> > > Seems Renoir is able to bridge its "worse single-thread" and "worse
> > > manufacturing node" disadvantage pretty much thanks to SMT.
> > >
> > > Which also tells you where the biggest threat from Apple is. It pretty much caught
> > > up with state of the art x86's single core performance AND has process advantage.
> > > It could shoot ahead in performance in two areas if it chose to:
> > > 1) SMT as discussed. Not having SMT leaves massive multithread performance
> > > gains (and energy efficiency gains, more importantly) on the table.
> > > 2) AMD and to a bit less degree Intel squeeze the single core
> > > frequency of the core during single-thread boosting,
> > > with very high voltage and advanced power management so that
> > > they can pretty much run it as fast as the silicon
> > > allows and as high or even higher than manual overclocking can reach. This is why the power consumption in
> > > their single core turbo boosts is so high (it does dial way lower during all-core load clocks).
> > > Apple seems to only have simple turbo that drops clocks a bit
> > > on multicore load, but it is only a small difference.
> > > That implies Apple could extract a lot of frequency if it
> > > went as advanced on power management and aggressive
> > > on turbo as AMD does. I don't know how high it could go - the low power it exhibits suggests there is a lot
> > > of headroom, but perhaps the wide engine just couldn't handle much more due to timing even if it doesn't
> > > have high power output. But some potential Apple has not tapped yet is likely there.
> > >
> > > As I said here, weaknesses are where a huge comeback can originate, so Intel and
> > > AMD gotta hurry with architecture improvements and make sure Apple doesn't fly past
> > > them if it adopts these features. Not that I think they are standing still.
> >
> > At least older Intel architectures used to gain 10-30% with SMT depending
> > on load. But can a very high IPC design like A14/M1 really reach 30%?
> > I think there is much less time available where SMT can be active in A14/M1. I think the efficiency
> > cores give a better multi-thread boost than SMT with not too much worse power efficiency.
> > The efficiency cores require a bit more transistors though compared to adding SMT in the
> > big cores, but efficiency cores are still needed for power efficiency during low loads.
> >
> > My guess is that Apple never adds SMT due to complexity, security and diminishing return reasons.
> >
>
> Potential for gains from SMT should be best with very wide (with lots of execution resources
> in backend) architectures, because you exploit resources that are left unused due to cache misses
> and similar "bubbles". The wider you go, the harder it is to keep everything fed all the time,
> so the amount of exploitable bubbles should go up. And Apple has a very wide core.
>
> SMT is something like regenerative braking, it's benefiting a second time from what you already have.
These analyses of SMT continue to ignore why Apple does so well in IPC and energy!
SMT is a decision to save something that is cheap and plentiful (space for an *independent* core on the die) by spending something that is expensive and in extremely short supply (the SRAM that feeds the predictors and caches that give you all that IPC for a particular core).
Explain to me why that is a sensible tradeoff...