By: Brendan (btrotter.delete@this.gmail.com), May 24, 2022 1:44 pm
Room: Moderated Discussions
Hi,
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on May 23, 2022 4:54 pm wrote:
> Brendan (btrotter.delete@this.gmail.com) on May 23, 2022 1:13 pm wrote:
> >
> > Mental masturbation is things like circular logic - e.g.
> > "I don't want to support anything except the common
> > case, because the common case is useless, because I didn't want to support anything except the common case".
>
> It's not about me supporting it.
>
> The kernel side is fairly trivial. It ranges from "no changes at all" (ie users just do their own
> CPU affinity to deal with it) to "minimal changes" (some ELF flag to say "start with this affinity")
> to fairly straightforward bigger support (eg "fault-on-use and auto-affine the thread").
>
> In fact, when I first heard of Intel's heterogeneous model
> in Alder Lake, I was like "we can support that easily".
>
> Because on the kernel side, it really is mostly a non-issue. Any kernel use of AVX512
> is already very limited (I think we have a couple of optimized crypto library functions),
> and the kernel already obviously supports CPU affinities. It's stupid special-case
> code, but it's not necessarily complicated stupid special-case code.
>
> (Of course, anything to do with the x86 extended FP state is actually fairly complicated to
> begin with, because of how it's all oddly lumped together in "xstate" and has about a billion
> different variations, so adding more special cases to that code is never a good thing).
>
> So no. My argument is not at all "I don't want to support it",
> and you haven't heard that argument here in this thread.
>
> My argument is "it's stupid and doesn't work in user space, and any silicon that implements that
> heterogeneous model is just wasted space by hardware designers who couldn't do it right".
>
> Because in practice that heterogeneous model means that 99% of users will never use that AVX512 hardware,
> since 99% of users are all in libraries, and I hope I have explained why they would not use it.
You have made it clear that supporting it in dynamically linked libraries is currently more important than supporting it in programs and statically linked libraries.
But... here's where we're having a problem:
You have a habit of assuming that the past (before new technology or new capabilities are introduced) accurately predicts the future (after new technology or new capabilities are introduced, and after the inevitable adoption period).
In 2012, you didn't say "0% of software uses SSE2, therefore no software will use SSE2 in future, so supporting SSE2 is silly and doesn't work" (or maybe you did but I doubt it). You understood that introducing something new changes the future; and for 80x86 PCs often it takes about 10 years between the introduction of something new (64-bit 80x86, SSE2, UEFI, Wayland, SystemD, ...) and the end of the adoption period.
More specifically; for AVX-512 I think we agree that Intel bungled adoption badly (first making it "HPC only" to ensure its failure because almost nobody has a reason to care, then splitting it into far too many sub-features, then the "rushed" Alder Lake mess just when AVX-512 was starting to gain adoption). Because of this, it's as if we're currently only 20% of the way into the adoption period, and the statistics you can get today (for how much software of what type uses AVX-512) are relatively worthless (a poor indicator of what "max. adoption" will look like in 10 years' time).
More specifically; for heterogeneous CPUs there are 3 cases:
a) "same ISA, different performance characteristics"; where allowing software to select code to suit the CPU type (with different optimization) is merely a small performance improvement and not strictly required. If support for this was added to the Linux kernel today it'd probably take 5 years to get an accurate prediction of how much of what kind of software uses it, and 10 years until you approach "max. adoption". Any statistics you find today are completely irrelevant.
b) "slightly different ISA (e.g. with or without AVX512)"; where allowing software to select code to suit the CPU type (with different optimization) is still just a performance improvement (over just using the common subset) and not strictly required, but likely to be a larger performance difference. If support for this was added to the Linux kernel today it'd probably take at least 5 years for hardware vendors to create a system that uses it, then another 5 years to get an accurate prediction of how much of what kind of software uses it; and it'd be 15 or more years until you approach "max. adoption". Any statistics you find today are completely irrelevant.
c) "different ISA (e.g. seamless support for a mixture of 80x86 and ARM cores in the same system)"; where allowing software to select code to suit the CPU type is a strict requirement. If support for this was added to the Linux kernel today it'd probably be the same (see note) - at least 5 years for hardware vendors to create a system that uses it, and 15 or more years until you approach "max. adoption". Any statistics you find today are completely irrelevant.
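To make "select code to suit the CPU type" a little more concrete for case (b), here's a rough Python sketch. It is only a toy model under stated assumptions, not how any kernel or library actually does it: the core numbers, the feature sets in `CORE_FEATURES`, and the variant names are all made up for illustration. The idea is that code may only use a variant if every core the thread is allowed to run on supports it, so an unrestricted thread falls back to the common subset, while a thread restricted (by affinity) to the AVX-512 cores can use the AVX-512 variant.

```python
from typing import Dict, FrozenSet, List, Set, Tuple

# Hypothetical per-core feature sets for a heterogeneous system.
# A real system would have to ask the OS (or run CPUID on each core).
CORE_FEATURES: Dict[int, FrozenSet[str]] = {
    0: frozenset({"sse2", "avx2", "avx512f"}),  # "big" core
    1: frozenset({"sse2", "avx2", "avx512f"}),  # "big" core
    2: frozenset({"sse2", "avx2"}),             # "small" core
    3: frozenset({"sse2", "avx2"}),             # "small" core
}

# Code variants, ordered from most to least demanding (illustrative names).
VARIANTS: List[Tuple[str, FrozenSet[str]]] = [
    ("avx512_variant", frozenset({"avx512f"})),
    ("avx2_variant", frozenset({"avx2"})),
    ("baseline_variant", frozenset()),
]


def cores_with(feature: str) -> Set[int]:
    """Return the cores that support a feature (a candidate affinity set)."""
    return {core for core, feats in CORE_FEATURES.items() if feature in feats}


def common_features(allowed: Set[int]) -> FrozenSet[str]:
    """Return the features supported by every core in the allowed set."""
    if not allowed:
        raise ValueError("allowed core set must not be empty")
    return frozenset.intersection(*(CORE_FEATURES[c] for c in allowed))


def pick_variant(allowed: Set[int]) -> str:
    """Pick the most demanding variant that is safe on all allowed cores."""
    feats = common_features(allowed)
    for name, required in VARIANTS:
        if required <= feats:
            return name
    raise RuntimeError("unreachable: the baseline variant has no requirements")


if __name__ == "__main__":
    print(pick_variant(set(CORE_FEATURES)))      # thread may run anywhere
    print(pick_variant(cores_with("avx512f")))   # thread restricted to big cores
```

In this toy model the "performance improvement over the common subset" only shows up for threads that accept being restricted to the cores that have the extra feature; everything else keeps working with the common subset.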
If you combine both of these (relatively worthless statistics for AVX-512, and completely irrelevant/non-existent statistics for heterogeneous CPUs) you don't end up with anything that can be used for assessing if a proposed change will/won't be useful after the adoption period.
Essentially; when you say something like "Because in practice that heterogeneous model (that isn't supported today) means that 99% of users will never (in 15+ years time, after it's made its way through kernel support to compiler/tools support to normal applications and then reaches "max. adoption") use that AVX512 hardware, since 99% of users (today and not in 15+ years time) are all in libraries" the only thing it does is make me think you're stupid.
> And that "99% of users wouldn't use it at all" is for a feature that already doesn't have very many users
> to begin with, because it's already fairly specialized. Compiler people think auto-vectorization is common
> and a big deal. Outside of very special cases it's neither. So a questionably useful feature thus becomes completely
> useless because you realistically can't use it in the one situation where it's most useful.
>
> I'd much rather have Intel give people more cache, more cores, or higher frequencies
> than give me a terminally broken heterogeneous AVX512 system.
For this you won't have to worry - Intel can't do "broken heterogeneous AVX512" because Windows is no better at "same ISA, different performance characteristics" (which I consider a necessary first step towards "slightly different ISA") than Linux. It'll be something else (e.g. "broken heterogeneous AVX-1024" or "broken heterogeneous SVE3") that you'll need to worry about.
- Brendan