By: xyz (xyz.delete@this.xyz.xyz), November 18, 2020 1:05 pm
Room: Moderated Discussions
Andrei F (andrei.delete@this.anandtech.com) on November 18, 2020 5:21 am wrote:
> TJ (notanemail.delete@this.bla.com) on November 18, 2020 1:10 am wrote:
> > Dummond D. Slow (mental.delete@this.protozoa.us) on November 17, 2020 11:18 am wrote:
> > > Doug S (foo.delete@this.bar.bar) on November 17, 2020 10:25 am wrote:
> > > > Dummond D. Slow (mental.delete@this.protozoa.us) on November 16, 2020 6:22 pm wrote:
> > > > > If the AMD beats M1 at the same or even better power = bad result for Apple.
> > > >
> > > >
> > > > Not really, given that the M1 is their low end solution.
> > > >
> > >
> > > Renoir isn't AMD's high end either. They have been selling 16-64 units of the same uarch for a year.
> > >
> > > >
> > > > Over the next two years we'll see their midrange and high end stuff.
> > > >
> > >
> > > Sure, but see above.
> > >
> > > The point was, we have to look at the multithread perf/watt. If Apple loses in multi-thread the
> > > edge it has in single-thread, that means something for competitiveness in a higher-core-count
> > > chip. It would actually pour cold water on the "x86 is doomed" narratives.
> > >
> > >
> > > Side note: if AMD beats M1 say in 15W envelope or if M1 only
> > > wins narrowly, despite having much bigger efficiency
> > > in single thread (it is close to performance of Zen 3, but at 2-3× less power), what is the reason?
> > > It's simple: SMT. Had Apple implemented it, it would run away in Cinebench.
> > > Lesson to the people saying it's a pointless/stupid/doomed feature.
> > > Seems Renoir is able to bridge its "worse single-thread" and "worse
> > > manufacturing node" disadvantage pretty much thanks to SMT.
> > >
> > > Which also tells you where the biggest threat from Apple is. It pretty much caught
> > > up with state of the art x86's single core performance AND has process advantage.
> > > It could shoot ahead in performance in two areas if it chose to:
> > > 1) SMT as discussed. Not having SMT leaves massive multithread performance
> > > gains (and energy efficiency gains, more importantly) on the table.
> > > 2) AMD and to a slightly lesser degree Intel squeeze the single core
> > > frequency of the core during single-thread boosting,
> > > with very high voltage and advanced power management so that
> > > they can pretty much run it as fast as the silicon
> > > allows and as high or even higher than manual overclocking can reach. This is why the power consumption in
> > > their single core turbo boosts is so high (it does dial way lower during all-core load clocks).
> > > Apple seems to only have simple turbo that drops clocks a bit
> > > on multicore load, but it is only a small difference.
> > > That implies Apple could extract a lot of frequency if it
> > > went as advanced on power management and aggressive
> > > on turbo as AMD does. I don't know how high it could go - the low power it exhibits suggests there is a lot
> > > of headroom, but perhaps the wide engine just couldn't handle much more due to timing even if it doesn't
> > > have high power output. But some potential Apple has not tapped yet is likely there.
> > >
> > > As I said here, weaknesses are where a huge comeback can originate, so Intel and
> > > AMD gotta hurry with architecture improvements and make sure Apple doesn't fly past
> > > them if it adopts these features. Not that I think they are standing still.
> >
> > At least older Intel architectures used to gain 10-30% with SMT depending
> > on load. But can a very high IPC design like A14/M1 really reach 30%?
> > I think there is much less time available where SMT can be active in A14/M1. I think the efficiency
> > cores give a better multi-thread boost than SMT with not too much worse power efficiency.
> > The efficiency cores require a few more transistors, though, compared to adding SMT in the
> > big cores, but efficiency cores are still needed for power efficiency during low loads.
> >
> > My guess is that Apple never adds SMT due to complexity, security and diminishing return reasons.
> >
>
> The 30% figure is an outlier for a few workloads. Across the board it's not even close to that. Can we stop
> declaring SMT an absolute must just because Cinebench has long dependency chains and scales well off of it?
>
>
>
> Saying stuff like "It's simple: SMT. Had Apple implemented it, it would run away in
> Cinebench." is just idiotic in the face of the argument you want to make about SMT
> MT scaling and Apple's reasons and internal decisions about not implementing it.
High-IPC SPEC CPU is a poor test for SMT. You will see 30%+ SMT gains on Intel CPUs with, for example, I/O-heavy and commercial DB workloads that have many low-IPC threads with high cache miss rates and I/O.
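
To make the intuition concrete, here is a rough toy model, not a measurement of any real CPU. It assumes each hardware thread is independently stalled (on a cache miss or I/O) for a fixed fraction of cycles and that a core does useful work whenever at least one of its threads is not stalled. The function names and the stall fractions are made up for illustration.

```python
def core_utilization(stall_fraction: float, threads: int) -> float:
    """Fraction of cycles in which at least one hardware thread is not stalled.

    Assumes (hypothetically) that stalls of different threads are independent.
    """
    if not 0.0 <= stall_fraction < 1.0:
        raise ValueError("stall_fraction must be in [0, 1)")
    if threads < 1:
        raise ValueError("threads must be at least 1")
    return 1.0 - stall_fraction ** threads


def smt_gain(stall_fraction: float, threads: int = 2) -> float:
    """Relative throughput gain of `threads`-way SMT over one thread per core."""
    single = core_utilization(stall_fraction, 1)
    return core_utilization(stall_fraction, threads) / single - 1.0


if __name__ == "__main__":
    # Illustrative stall fractions only: low for a high-IPC compute loop,
    # higher for a miss-heavy database-style thread.
    for s in (0.05, 0.15, 0.30, 0.50):
        print(f"stall fraction {s:.2f}: 2-way SMT gain {smt_gain(s):.1%}")
```

Under these assumptions the 2-way gain works out algebraically to the stall fraction itself, since (1 - s^2)/(1 - s) - 1 = s. A thread that stalls rarely leaves little for a second thread to fill, while a thread that stalls 30% of the time leaves room for roughly a 30% gain. Real cores also share caches, queues, and execution ports between threads, which this sketch ignores.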