By: Brendan (btrotter.delete@this.gmail.com), May 19, 2022 4:32 am
Room: Moderated Discussions
Hi,
Brett (ggtgp.delete@this.yahoo.com) on May 18, 2022 11:03 am wrote:
> Doug S (foo.delete@this.bar.bar) on May 18, 2022 10:26 am wrote:
> > Jukka Larja (roskakori2006.delete@this.gmail.com) on May 18, 2022 6:47 am wrote:
> > > me (me.delete@this.me.com) on May 18, 2022 5:50 am wrote:
> > > > > More likely, the opposite is going to happen in the next ten years: AVX512-less CPUs
> > > > > will rule the installed base and AVX512 will be, at best, HPC-only curiosity.
> > > > >
> > > > > 2021-2022 are peak years for AVx512 in terms of shipment and
> > > > > 2023 would be a peak year in terms of installed base.
> > > >
> > > > AVX-512 will be back on Intel's mainstream products. Eventually.
> > > > Especially since AMD is adding it in Zen 4.
> > >
> > > Just out of curiosity, has AMD said something official about
> > > this? I know there are rumours, but they go many ways.
> > >
> > > I'm not sure if Intel has said anything official, but I think there's a good reason to presume that
> > > next Core models (13th generation) to be released by the end of this year or early next year will
> > > still not have AVX-512 (due to efficiency cores not having it). By the 2024 who knows what makes
> > > sense and what not? Both Intel and AMD could decide that AVX-512 doesn't make sense in consumer
> > > products, or only makes sense as some heavily crippled, micro-coded compatibility thingy.
> >
> >
> > The timeline from designing in something like AVX512 to actually releasing CPUs that contain it isn't
> > short. AMD could have made a decision to include AVX512 if at the time they felt it looked like Intel
> > was going to introduce it across their whole line, or at least whole non-mobile line. Seeing them pull
> > back later could cause them to 1) disable AVX512 functionality that exists in a core when it is released
> > or 2) leave it enabled, but as with Intel include AVX512 in only some of their future cores.
> >
> > So even if Zen 4 includes AVX512, I wouldn't take it as given that all future
> > AMD cores will include it. Or that Intel would feel pressure to include AVX512
> > in all their cores even if AMD does go ahead and include it all of theirs.
> >
> > I think it is too early to decide 2023 is the peak year for the installed base of AVX512, but
> > it is way too early to say that AVX512 will exist AT ALL in any CPUs shipped five years from
> > now. Maybe Intel has decided such long vectors are only useful in HPC.
>
> High core count CPU’s are already memory starved, so adding AVX512 is pointless.
> With 5nm CPU’s you can scratch off the HPC market needing AVX512, as you can’t
> feed that many CPU’s much less the doubled bandwidth needs of AVX512 units.
>
> That leaves gamers, but with 8 cores you are already pretty starved of dram bandwidth
> and let’s say one core is using AVX512 and thus heating up. Now all 8 cores are down
> clocked in response and your net performance uplift of AVX512 is negative.
It'd be rare for the same software to be bottlenecked by both memory bandwidth and CPU frequency at the same time.
If the bottleneck is memory bandwidth; then AVX-512 (doing twice as much, half as often) is likely only negligibly better, simply because the 512-bit SIMD register width matches the 64-byte cache line size (e.g. no need to care about partial cache line stores).
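To make that concrete, here's a minimal C sketch (mine, not from the post; it assumes AVX-512F, a 64-byte-aligned buffer and a count that's a multiple of 16 floats): one aligned 512-bit store writes an entire 64-byte line, so a non-temporal store never has to deal with a partially written cache line.

#include <immintrin.h>
#include <stddef.h>

/* Illustrative only: fill a buffer using full-cache-line stores.
   Assumes AVX-512F, 64-byte-aligned dst, count divisible by 16. */
void fill_nt(float *dst, float value, size_t count)
{
    __m512 v = _mm512_set1_ps(value);
    for (size_t i = 0; i < count; i += 16) {
        /* One 512-bit non-temporal store covers one whole 64-byte
           cache line, so there is no partial-line write to handle. */
        _mm512_stream_ps(dst + i, v);
    }
    _mm_sfence(); /* order the streaming stores before later stores */
}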
If the bottleneck is CPU frequency; then doing twice as much work per cycle (from doubling SIMD width) means you can downclock to 50% and still not get a "work/time" performance decrease over AVX2. I doubt any Intel CPU downclocks that severely - it's more like "twice the work per cycle at 80% of the clock frequency = 1.6 times as much work done per second = still 60% faster despite downclocking".
Yes; there is some problem with infrequent bursts of AVX-512 (where later code that doesn't use AVX-512 gets penalized by a "downclocking hysteresis"); but that's mostly a "not enough work to justify AVX-512 in the first place" problem.
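One common way code sidesteps that in practice (a hedged sketch; the threshold and function names are made up for illustration, and runtime CPU detection plus the right compiler target flags are assumed) is to only take the 512-bit path when there's enough work to amortise any frequency transition:

#include <immintrin.h>
#include <stddef.h>

/* Illustrative threshold; a real value would come from measurement. */
#define WIDE_PATH_MIN_FLOATS (1u << 16)

static void add_arrays_avx2(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 8 <= n; i += 8) {
        __m256 va = _mm256_loadu_ps(a + i);
        __m256 vb = _mm256_loadu_ps(b + i);
        _mm256_storeu_ps(dst + i, _mm256_add_ps(va, vb));
    }
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}

static void add_arrays_avx512(float *dst, const float *a, const float *b, size_t n)
{
    size_t i = 0;
    for (; i + 16 <= n; i += 16) {
        __m512 va = _mm512_loadu_ps(a + i);
        __m512 vb = _mm512_loadu_ps(b + i);
        _mm512_storeu_ps(dst + i, _mm512_add_ps(va, vb));
    }
    for (; i < n; i++)
        dst[i] = a[i] + b[i];
}

/* Only use the wide path when the job is big enough to justify it. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    if (n >= WIDE_PATH_MIN_FLOATS)
        add_arrays_avx512(dst, a, b, n);
    else
        add_arrays_avx2(dst, a, b, n);
}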
> More consistent performance of smaller vector units is better than random
> cores hitting AVX512 code and heating up, and thus down clocking.
I'm fairly sure that "consistently slow" (or failing to be fast enough to be bottlenecked by heat dissipation) is WORSE than "inconsistently faster" (fast enough to be bottlenecked by heat dissipation).
Of course when it is fast enough to be bottlenecked by heat dissipation, downclocking to improve "joules per floating point operation" (and getting more work done for the same heat) is the right solution.
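As a rough illustration of why that works (a toy model with invented numbers, not anything from the post): dynamic power scales roughly with V^2 * f, and a lower clock usually allows a lower voltage, so energy per operation drops even though each operation takes longer.

#include <stdio.h>

/* Toy model: dynamic power ~ k * V^2 * f, throughput ~ lanes * f.
   All constants and operating points below are invented. */
int main(void)
{
    double k = 1.0;                   /* arbitrary activity/capacitance factor */
    double f_hi = 4.0, v_hi = 1.10;   /* narrow (8-lane) point, higher clock   */
    double f_lo = 3.2, v_lo = 0.95;   /* wide (16-lane) point, 80% clock       */

    double p_hi = k * v_hi * v_hi * f_hi;
    double p_lo = k * v_lo * v_lo * f_lo;
    double ops_hi = 8.0  * f_hi;      /* relative FLOP/s, narrow path */
    double ops_lo = 16.0 * f_lo;      /* relative FLOP/s, wide path   */

    printf("relative joules per op: %.3f vs %.3f\n", p_hi / ops_hi, p_lo / ops_lo);
    printf("relative ops per second: %.1f vs %.1f\n", ops_hi, ops_lo);
    return 0;
}

With those made-up numbers the wide, downclocked path does 1.6 times the work at well under half the energy per operation, which is the "more work for the same heat" effect.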
> Apple has more dram bandwidth and thus could go wider, but Apple has better
> solutions that are lower power in its AI/NPU units and graphics.
Apple don't sell computers, they sell sealed black boxes full of anti-competitive practices. For desktop/server they need to die regardless of any technical merit. Fortunately this is incredibly likely given the inherent inflexibility of sealed black boxes (e.g. not being able to install more RAM, or throw in a few 3rd party GPUs, or ...).
> AVX512 Is dead until ram moves on chip in 5-10 years, even then you still hit heat
> issues which pushes you to use that extra die space for NPU compute units.
>
> So AVX512 is just plain dead dead.
The only valid complaint about AVX-512 is "spotty availability", which is primarily caused by the Alder Lake "let's quickly duct-tape some cores together for marketing hype in an attempt to counter M1 Pro reviews" anomaly. Most (all?) other desktop/server CPUs from Intel have had AVX-512 for the last few (several?) years and all planned CPUs from Intel (and AMD?) will have it. Sure, chips (Atom) designed for "Internet of Trash, remote garage door opener" applications won't have AVX-512, but that's an entirely different market and it's appropriate for that market.
> > Maybe they are designing
> > AVX1024 and intend a clear split between 1024 bit vectors in HPC and 256 bit vectors in everything
> > else. Maybe they are designing a length agnostic successor similar to SVE2?
Maybe making SIMD width larger than cache line size is painful (quad-ported L2 cache? Um...); and maybe increasing cache line size is painful (too much software tuned for a 64-byte cache line size); and maybe length agnostic is insanity when you've already got the ability to freely mix multiple SIMD sizes in the same piece of code and when the problem is availability/adoption (yet another thing that most computers and most software don't support?); and maybe AVX-512 is the final step in the "64-bit -> 128-bit -> 256-bit -> 512-bit" sequence and will be around for the foreseeable future.
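For what it's worth, a hedged sketch of the "fixed widths already cover it" point (the function name and the masked-tail approach are my choice, assuming AVX-512F): a fixed 512-bit loop plus AVX-512's per-lane masking handles any array length, and with AVX-512VL the same instruction forms are available at 128/256-bit width too, so widths can be mixed freely within one function.

#include <immintrin.h>
#include <stddef.h>

/* Illustrative: scale an array of arbitrary length with a fixed
   512-bit main loop plus a masked 512-bit tail. */
void scale(float *data, float factor, size_t n)
{
    __m512 f = _mm512_set1_ps(factor);
    size_t i = 0;

    /* Main loop: fixed 16-float (512-bit) steps. */
    for (; i + 16 <= n; i += 16) {
        __m512 v = _mm512_loadu_ps(data + i);
        _mm512_storeu_ps(data + i, _mm512_mul_ps(v, f));
    }

    /* Tail: a per-lane mask covers the 1..15 leftover elements;
       masked-off lanes are neither read nor written. */
    if (i < n) {
        __mmask16 m = (__mmask16)((1u << (n - i)) - 1u);
        __m512 v = _mm512_maskz_loadu_ps(m, data + i);
        _mm512_mask_storeu_ps(data + i, m, _mm512_mul_ps(v, f));
    }
}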
- Brendan