By: Jan Wassenberg (jan.wassenberg.delete@this.gmail.com), May 23, 2022 9:38 pm
Room: Moderated Discussions
Björn Ragnar Björnsson (bjorn.ragnar.delete@this.gmail.com) on May 23, 2022 5:41 pm wrote:
> Hopefully they've learned
> their lesson, to wit they disabled AVX-512 in BIOS updates, didn't they?
It sounds like you view that as a good thing. My understanding is that people were, if anything, upset by the removal (especially since initial benchmarks had included AVX-512).
Perhaps you prefer to keep the status quo. Here's a problem, though: scheduling an instruction on a big OoO core costs considerably more energy than executing most operations, even an FP multiply. Vectorization is therefore pretty much required for energy efficiency: it amortizes that per-instruction overhead across e.g. 16 lanes. We're talking about a 5-10x energy reduction here. Surely that is worth pursuing.
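To make the amortization concrete (illustrative numbers, not measurements): suppose fetch/decode/rename/schedule overhead costs roughly 10x the energy of the arithmetic op itself. Scalar code then pays 10 + 1 = 11 units per element, while a 16-lane vector instruction pays (10 + 16) / 16 ≈ 1.6 units per element, about a 7x reduction. The exact ratio depends on the core, but that is the mechanism behind the 5-10x figure.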
Unfortunately, not many developers yet understand that SIMD/vectors are widely useful, not just in the ML/cryptography/HPC/image-processing niches. Runtime dispatch for heterogeneous deployments (in the sense that a binary doesn't know which single-ISA machine it will run on: Haswell, Skylake, etc.) is also already a solved problem. AVX-512 is the first x86 SIMD instruction set that is reasonable to program: it is fairly complete, with new instructions that are useful for general-purpose applications. Dropping AVX-512 even in one generation is not a helpful signal for increasing its adoption.
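To illustrate what "solved" means: compile the hot function once per target, then select the best variant at runtime. Here is a minimal sketch following Highway's documented foreach_target pattern (github.com/google/highway); the file name and function names are placeholders, and remainder handling is omitted for brevity:

#undef HWY_TARGET_INCLUDE
#define HWY_TARGET_INCLUDE "example.cc"  // this file's own name
#include "hwy/foreach_target.h"          // re-includes this file once per target
#include "hwy/highway.h"

HWY_BEFORE_NAMESPACE();
namespace demo {
namespace HWY_NAMESPACE {
namespace hn = hwy::HWY_NAMESPACE;

// Compiled once for each enabled target (SSE4, AVX2, AVX-512, ...).
void MulTo(const float* HWY_RESTRICT a, const float* HWY_RESTRICT b,
           float* HWY_RESTRICT out, size_t n) {
  const hn::ScalableTag<float> d;  // widest vector available on this target
  // Assumes n is a multiple of the lane count; remainder loop omitted.
  for (size_t i = 0; i < n; i += hn::Lanes(d)) {
    hn::Store(hn::Mul(hn::Load(d, a + i), hn::Load(d, b + i)), d, out + i);
  }
}

}  // namespace HWY_NAMESPACE
}  // namespace demo
HWY_AFTER_NAMESPACE();

#if HWY_ONCE
namespace demo {
HWY_EXPORT(MulTo);  // table of per-target function pointers
void CallMulTo(const float* a, const float* b, float* out, size_t n) {
  // One indirect call picks the best variant for the CPU detected at runtime.
  HWY_DYNAMIC_DISPATCH(MulTo)(a, b, out, n);
}
}  // namespace demo
#endif

The same binary then runs on anything from SSE4-only machines to AVX-512 servers, paying only one indirect call per invocation.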
I have not seen any evidence that AVX-512 was disabled out of software necessity, or even convenience. Multiple people, including Linus, have said it would be feasible to affinitize-on-fault, i.e. migrate a thread to AVX-512-capable cores the first time it faults on such an instruction.
So a question: what exactly are the "far-reaching consequences" you are concerned about? Is it the extra cost to the scheduler of checking a "don't move me" counter/flag? Does that outweigh 5x energy-efficiency gains in perhaps 10% of software (a modest target)?
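For concreteness, a rough sketch of what affinitize-on-fault could look like (hypothetical names throughout; this is the idea, not actual kernel code):

// Hypothetical sketch; none of these names are real kernel APIs.
// An AVX-512 instruction on an E-core raises #UD; instead of delivering
// SIGILL, the handler pins the task to cores that support AVX-512.
void OnInvalidOpcode(Task* t, const uint8_t* insn) {
  if (DecodesAsAvx512(insn) && CurrentCpuIsECore()) {
    t->needs_avx512 = true;   // the "don't move me" flag the scheduler checks
    MigrateToPCores(t);       // re-run on a core that has AVX-512
    return;                   // the retried instruction now succeeds
  }
  DeliverSigill(t);           // genuinely unsupported: today's behavior
}

The scheduler's extra work then amounts to checking that one flag before migrating a task.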
> Let's not forget that only a miniscule fraction of the CPUs ever produced or are
> likely to be produced in the medium term could benefit from such shenanigans.
I understand where you're coming from, but disagree with this conclusion. Other ISAs also have heavy features that might not be feasible or desirable in their equivalent of E-cores; see https://www.realworldtech.com/forum/?threadid=206023&curpostid=206308. In particular: AMX and SME for on-device ML. Should those be relegated to servers only? Or never used, because software only wants to target the lowest common denominator?