By: Brendan (btrotter.delete@this.gmail.com), May 22, 2022 11:18 am
Room: Moderated Discussions
Hi,
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on May 21, 2022 4:58 pm wrote:
> Brendan (btrotter.delete@this.gmail.com) on May 21, 2022 12:58 pm wrote:
> >
> > No. New software (designed to use the new CPUID leaves) would be aware of that problem
> > and would avoid it - e.g. maybe using something like "sched_setaffinity()" to lock the
> > thread to a specific CPU type before using CPUID (and maybe using "sched_setaffinity()"
> > again later to restore the original CPU affinity and allow migration again).
>
> That's the "it works in an embedded environment where you control everything" model.
Erm, no?
It's the run-time dispatch (choosing which version of a function to use based on CPUID results) that some compilers (e.g. ICC) have been doing for ages; just slightly modified to cope with dissimilar cores by making it more fine-grained (not once at program startup, but at any point where it's needed) and by preventing the scheduler from migrating the thread to a different CPU type at the wrong time.
It's the "developer controls nothing, generic app designed for any/all 80x86 CPUs adapts to whatever it happens to find itself running on" model (the opposite of the "embedded environment where you control everything" model).
> And yes, it's salvageable in that model.
>
> But it means that you just admit that AVX512 will never be used in a library (they can't just
> randomly set affinities - maybe the main program needs to be affine for other reasons?).
Is this because a single CPU affinity is trying to do the job of both "user configured affinity" and "software controlled affinity", so that changing one messes up the other? If you had two separate affinities (where the scheduler combines both with a bitwise AND before making decisions) this wouldn't happen: a library function could save the old "software affinity", modify it, do whatever it likes, then restore the original "software affinity" before returning to its caller, all without touching the "user configured affinity".
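As a rough user-space sketch of that idea (the struct task fields and task_can_run_on() are invented for illustration; they're not real Linux task_struct members or a real scheduler hook):

#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>
#include <stdio.h>

/* Hypothetical per-task state; these are NOT real Linux task_struct fields. */
struct task {
    cpu_set_t user_affinity;  /* set by the user via sched_setaffinity() */
    cpu_set_t soft_affinity;  /* saved/modified/restored by library code */
};

/* The scheduler would only consider CPUs that appear in BOTH masks. */
static bool task_can_run_on(struct task *t, int cpu)
{
    cpu_set_t both;
    CPU_AND(&both, &t->user_affinity, &t->soft_affinity);
    return CPU_ISSET(cpu, &both);
}

int main(void)
{
    struct task t;
    CPU_ZERO(&t.user_affinity); CPU_SET(0, &t.user_affinity); CPU_SET(1, &t.user_affinity);
    CPU_ZERO(&t.soft_affinity); CPU_SET(1, &t.soft_affinity); CPU_SET(2, &t.soft_affinity);
    /* only CPU 1 is in both masks, so only CPU 1 is eligible */
    printf("cpu0=%d cpu1=%d cpu2=%d\n",
           task_can_run_on(&t, 0), task_can_run_on(&t, 1), task_can_run_on(&t, 2));
    return 0;
}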
Of course the kernel could simply provide an "enable/disable migration to a different CPU type" system call that doesn't use CPU affinity at all (possibly built on a "number of times disabled" counter, so migration is only re-enabled when the counter reaches zero and nesting just works).
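The counter side of that is trivial; a sketch with made-up names (loosely the same shape as the kernel's internal migrate_disable()/migrate_enable(), but per CPU type and exposed to user space):

/* Sketch of the nesting counter only; the cpu_type_* names are hypothetical,
 * not an existing system call or library API. */
#include <assert.h>
#include <stdbool.h>

static int cpu_type_pin_count;   /* per-thread in a real implementation */

static void cpu_type_migrate_disable(void) { cpu_type_pin_count++; }

static void cpu_type_migrate_enable(void)
{
    assert(cpu_type_pin_count > 0);
    cpu_type_pin_count--;        /* migration across CPU types resumes at zero */
}

static bool cpu_type_migration_allowed(void)
{
    return cpu_type_pin_count == 0;
}

int main(void)
{
    cpu_type_migrate_disable();                 /* outer caller pins            */
    cpu_type_migrate_disable();                 /* nested library call pins too */
    cpu_type_migrate_enable();                  /* inner enable: count is 1     */
    bool still_pinned = !cpu_type_migration_allowed();   /* true: nesting works */
    cpu_type_migrate_enable();                  /* outer enable: count is 0     */
    return (still_pinned && cpu_type_migration_allowed()) ? 0 : 1;
}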
> Or you just outright admit that cpuid is basically not useful for AVX512 at all, and you
> instead need some other system knowledge, and the use of AVX512 ends up being pretty much
> an external configuration parameter (ie "use it when the user explicitly tells us to,
> because using it in general isn't good, because we want to use the E-cores").
>
> Either way, it just relegates AVX512 into a "niche feature".
>
> Which admittedly I think it is anyway, but for me that just means "it's dead on arrival".
I think you're starting to conflate two different things here: the availability/adoption problems of AVX-512 (no different from the introduction of any other ISA extension, something that's probably happened at least 100 times over the last 50 years); and the "dissimilar CPUs" problem, where "AVX-512" is merely a convenient placeholder for any and all differences between cores (including instruction timings), for any ISA extension past, present or future.
> I seriously believe that in that kind of world, AVX512 is nothing but a marketing thing, and no normal
> person will ever use it in any real situation. It ends up being used purely for specialty software
> - HPC, things like that. Not things that any normal person should ever care about, and not things that
> should ever have a single square millimeter of silicon wasted on in desktop or laptop CPUs.
>
> So you can't have it both ways. Either it's a specialty thing, and only relevant to a
> tiny tiny percentage of the market (and mainly used for marketing numbers), or it's something
> that any random app or library should use and might one day be worthwhile.
>
> That second case requires a working and reliable CPUID bit that doesn't cause the
> code to either go ridiculously slowly (emulation) or get relegated to just a subset
> of the cores in the system (trap-and-migrate or explicit affinities).
>
> I personally despise AVX512. I think it's a horrible model, and broken by design. So I'm very openly
> biased against it. I don't think it's useful for general purpose computing, and I think Intel would have
> been much better off just working on their FP accelerators instead (ie "just do it on the GPU").
>
> The main CPU is for general-purpose stuff, and AVX512 fails that smell test.
I think "general purpose" is merely a synonym for "many special purposes superimposed".
I think AVX-512 is mostly an inevitable continuation of the "64-bit MMX -> 128-bit SSE -> 256-bit AVX -> 512-bit AVX-512" sequence, where SIMD width doubles every 5 years or so.
I also think one of Intel's recurring flaws is ruining useful things by relegating them to an "HPC niche" (and sometimes a "server only niche"), which stifles adoption.
> And if you do a general-purpose vector thing, it needs to scale down to small
> enough CPU's that we wouldn't need to have this endless discussion about it.
>
> Two decades ago I argued on this forum against Itanium because I didn't think it had a
> chance in hell to scale down, and was literally designed to fail in the mass market.
>
> Today, I argue against AVX512 on the exact same principles. If you can't be relevant in the mass market,
> you might as well throw in the towel now. I'm sure ARM will be happy to help all those laptops (or all
> those throughput computing things) that just don't want the wasted space and effort of AVX512.
>
> Of course, maybe Intel can make a small and "good enough" AVX512 unit so that this
> becomes a non-issue for that reason - support it on every CPU, just in a weaker
> format (but not so weak that using AVX512 is slower than the alternative).
>
> I just personally suspect that AVX512 was never designed for that, and
> just doesn't scale down very well. But I'm not a hardware engineer.
I don't think AVX-512 was designed to scale down to today's small CPUs; I think it was designed to stay the same size while people's concept of a "small CPU" grows into what we'd consider "large" today. Moore's law isn't dead yet. I'm guessing that by 2025 the E cores will support AVX-512.
- Brendan