By: Simon Farnsworth (simon.delete@this.farnz.org.uk), May 24, 2022 3:21 am
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 23, 2022 10:38 pm wrote:
> Björn Ragnar Björnsson (bjorn.ragnar.delete@this.gmail.com) on May 23, 2022 5:41 pm wrote:
> > Hopefully they've learned
> > their lesson, to wit they disabled AVX-512 in BIOS updates, didn't they?
> It sounds like you view that as a good thing. My understanding is that people were
> more upset by the removal (especially after initial benchmarks included AVX-512).
>
> Perhaps you prefer to keep the status quo. Here's a problem, though: scheduling an instruction
> on big OoO cores costs considerably more energy than most operations, even FP mul. Thus vectorization
> is pretty much required for energy efficiency (it amortizes that cost across e.g. 16 lanes).
> We're talking 5-10x energy reduction here. Surely that is worth pursuing.
>
> Unfortunately not that many developers understand yet that SIMD/vectors are widely useful, not just
> in ML/cryptography/HPC/image processing niches. Runtime dispatch for heterogeneous devices (in the
> sense of: a binary doesn't know what type of single-ISA machine it's going to run on: Haswell, Skylake
> etc) is also already a solved problem. AVX-512 is the first x86 SIMD instruction set that's reasonable
> to program (quite complete, useful new instructions for general-purpose applications). Dropping AVX-512
> even in one generation is not a helpful signal for increasing its adoption.
>
> I have not seen any evidence that AVX-512 was disabled because of software necessity or even convenience.
> Multiple people including Linus have said it would be feasible to affinitize-on-fault.
>
> So a question: what exactly are the "far-reaching consequences" you are concerned about?
> Is it the extra cost to the scheduler for checking a "don't move me" counter/flag? Does
> that offset 5x energy efficiency gains in perhaps 10% of software (as a modest target)?
>
> > Let's not forget that only a minuscule fraction of the CPUs ever produced or are
> > likely to be produced in the medium term could benefit from such shenanigans.
> I understand where you're coming from but disagree with this conclusion. Other ISAs also have examples of heavy
> features that might not be feasible/desirable in their equivalent of E-cores, see https://www.realworldtech.com/forum/?threadid=206023&curpostid=206308.
> In particular: AMX and SME for on-device ML. Should those be relegated to servers only? Or never used because
> software only wants to target the lowest common denominator?
AMX and SME get interesting, because they're clearly designed as tightly coupled co-processors, the way FPUs were in the early 1980s (Weitek and similar designs, not just x87 and the 68881/68882). Given that, it seems reasonable to assume that all cores in a system will have access to AMX or SME, just as they all have access to a GPU; it's just that performance will vary depending on whether you're on the efficiency cores or the performance cores, which is something kernels are already learning to deal with.