By: Dummond D. Slow (mental.delete@this.protozoa.us), November 18, 2020 9:55 am
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on November 18, 2020 8:37 am wrote:
> Dummond D. Slow (mental.delete@this.protozoa.us) on November 18, 2020 7:02 am wrote:
> > TJ (notanemail.delete@this.bla.com) on November 18, 2020 1:10 am wrote:
> > > Dummond D. Slow (mental.delete@this.protozoa.us) on November 17, 2020 11:18 am wrote:
> > > > Doug S (foo.delete@this.bar.bar) on November 17, 2020 10:25 am wrote:
> > > > > Dummond D. Slow (mental.delete@this.protozoa.us) on November 16, 2020 6:22 pm wrote:
> > > > > > If the AMD beats M1 at the same or even better power = bad result for Apple.
> > > > >
> > > > >
> > > > > Not really, given that the M1 is their low end solution.
> > > > >
> > > >
> > > > Renoir isn't AMD's highend either. They have been selling 16-64 units of the same uarch for a year.
> > > >
> > > > >
> > > > > Over the next two years we'll see their midrange and high end stuff.
> > > > >
> > > >
> > > > Sure, but see above.
> > > >
> > > > The point was, we have to look at the multithread perf/watt. If Apple loses in multi-thread the edge it
> > > > has in single-thread, that means something for competitiveness in a higher-corecount
> > > > chip. It would actually pour cold water on the "x86 is doomed" narratives.
> > > >
> > > >
> > > > Side note: if AMD beats M1 say in 15W envelope or if M1 only
> > > > wins narrowly, despite having much bigger efficiency
> > > > in single thread (it is close to performance of Zen 3, but at 2-3× less power), what is the reason?
> > > > It's simple: SMT. Had Apple implemented it, it would run away in Cinebench.
> > > > Lesson to the people saying it's a pointless/stupid/doomed feature.
> > > > Seems Renoir is able to bridge its "worse single-thread" and "worse
> > > > manufacturing node" disadvantage pretty much thanks to SMT.
> > > >
> > > > Which also tells you where the biggest threat from Apple is. It pretty much caught
> > > > up with state of the art x86's single core performance AND has process advantage.
> > > > It could shoot ahead in performance in two areas if it chose to:
> > > > 1) SMT as discussed. Not having SMT leaves massive multithread performance
> > > > gains (and energy efficiency gains, more importantly) on the table.
> > > > 2) AMD and to a bit less degree Intel squeeze the single core
> > > > frequency of the core during single-thread boosting,
> > > > with very high voltage and advanced power management so that
> > > > they can pretty much run it as fast as the silicon
> > > > allows and as high or even higher than manual overclocking can reach. This is why the power consumption in
> > > > their single core turbo boosts is so high (it does dial way lower during all-core load clocks).
> > > > Apple seems to only have simple turbo that drops clocks a bit
> > > > on multicore load, but it is only a small difference.
> > > > That implies Apple could extract a lot of frequency if it
> > > > went as advanced on power management and aggressive
> > > > on turbo as AMD does. I don't know how high it could go - the low power it exhibits suggests there is a lot
> > > > of headroom, but perhaps the wide engine just couldn't handle much more due to timing even if it doesn't
> > > > have high power output. But some potential Apple has not tapped yet is likely there.
> > > >
> > > > As I said here, weaknesses are where a huge comeback can originate, so Intel and
> > > > AMD gotta hurry with architecture improvements and make sure Apple doesn't fly past
> > > > them if it adopts these features. Not that I think they are standing still.
> > >
> > > At least older intel architectures used to gain 10-30% with SMT depending
> > > on load. But can a very high IPC design like A14/M1 really reach 30%?
> > > I think there is much less time available where SMT can be active in A14/M1. I think the efficiency
> > > cores give a better multi-thread boost than SMT with not too much worse power efficiency.
> > > The efficiency cores require a few more transistors though compared to adding SMT in the
> > > big cores, but efficiency cores are still needed for power efficiency during low loads.
> > >
> > > My guess is that Apple never adds SMT due to complexity, security and diminishing return reasons.
> > >
> >
> > Potential for gains from SMT should be best with very wide (with lots of execution resources
> > in backend) architectures, because you exploit resources that are left unused due to cache misses
> > and similar "bubbles". The wider you go, the harder it is to keep everything fed all the time,
> > so the amount of exploitable bubbles should go up. And Apple has a very wide core.
> >
> > SMT is something like regenerative braking, it's benefiting a second time from what you already have.
>
> These analyses of SMT continue to ignore why Apple does so well in IPC and energy!
>
> SMT is a decision to swap something that is cheap and plentiful (space for an *independent*
> core on the die) with something that is expensive and in extremely short supply (the SRAM
> that feeds the predictors and caches that give you all that IPC for a particular core).
>
> Explain to me why that is a sensible tradeoff...
>
Because it is energy efficient and area efficient? High IPC and a wide core are hard too, and yet I don't see you saying Apple is dumb and should just fill the die with 96 small cores.
You know, I'm not saying I can be totally sure I am right and that SMT is the best way in every scenario. It is possible that it would just not work for an Apple-style uarch (I doubt this though, and you have given no compelling reason to argue that, IMHO).
You guys might want to do some self-reflection and ponder thoroughly whether SMT really looks so bad and unusable, or whether you perhaps just don't like it because Apple doesn't have it.
Again, I'm not saying you do that or are aware of it... but sour-grapes thinking can very easily influence one's opinions even if one is not aware of it.
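To put rough numbers on the area-efficiency argument, here is a minimal back-of-the-envelope sketch in Python. The 10-30% throughput gain range is the one TJ quoted for older Intel cores; the area overhead figure is purely an assumption for illustration (nobody in this thread has measured it for an Apple-style core), and `perf_per_area` is just a hypothetical helper name.

```python
def perf_per_area(throughput_gain, area_overhead):
    """Throughput per unit of die area relative to the same core without SMT.

    1.0 means SMT neither helps nor hurts; above 1.0 means SMT is the more
    area-efficient choice under the given assumptions.
    """
    return (1.0 + throughput_gain) / (1.0 + area_overhead)


# ASSUMPTION: extra core area (including any enlarged predictor/cache SRAM)
# needed to add SMT. Illustrative placeholder, not a measured value.
ASSUMED_SMT_AREA_OVERHEAD = 0.10

# Gain range quoted in the thread for older Intel architectures.
for gain in (0.10, 0.20, 0.30):
    ratio = perf_per_area(gain, ASSUMED_SMT_AREA_OVERHEAD)
    print(f"SMT gain {gain:.0%}: perf/area = {ratio:.2f}x baseline")
```

In this toy model SMT only pays off when the throughput gain exceeds the area overhead, so the whole disagreement comes down to that one parameter: Maynard's point is effectively that the overhead is large once you count the SRAM needed to keep per-thread IPC up, mine is that it is small compared to the gain.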