By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), May 21, 2022 4:58 pm
Room: Moderated Discussions
Brendan (btrotter.delete@this.gmail.com) on May 21, 2022 12:58 pm wrote:
>
> No. New software (designed to use the new CPUID leaves) would be aware of that problem
> and would avoid it - e.g. maybe using something like "sched_setaffinity()" to lock the
> thread to a specific CPU type before using CPUID (and maybe using "sched_setaffinity()"
> again later to restore the original CPU affinity and allow migration again).
That's the "it works in an embedded environment where you control everything" model.
And yes, it's salvageable in that model.
But it means that you just admit that AVX512 will never be used in a library (they can't just randomly set affinities - maybe the main program needs to be affine for other reasons?).
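To make the quoted proposal concrete, here is a rough sketch of what "pin, run CPUID, restore" might look like on Linux. This is only an illustration under assumptions: the helper names `current_cpu_has_avx512f()` and `probe_avx512f_on_cpu()` are made up for this example, it checks the existing AVX512F bit (CPUID leaf 7, subleaf 0, EBX bit 16) rather than any new CPUID leaves, and it skips the OS-support (XGETBV) check that real detection code would also need.

```c
#define _GNU_SOURCE
#include <sched.h>
#include <stdbool.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* Hypothetical helper: does the CPU this thread is running on right now
 * report AVX512F? Only the CPUID bit is checked; OS support for the
 * ZMM state (XGETBV) is deliberately left out of this sketch. */
bool current_cpu_has_avx512f(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;

    /* Leaf 7, subleaf 0: EBX bit 16 is AVX512F. */
    if (!__get_cpuid_count(7, 0, &eax, &ebx, &ecx, &edx))
        return false;
    return (ebx >> 16) & 1u;
#else
    return false; /* not an x86 build */
#endif
}

/* Hypothetical helper: pin the calling thread to `cpu`, query CPUID
 * there, then restore the previous affinity mask.
 * Returns 1 or 0 for the CPUID answer, -1 on any error. */
int probe_avx512f_on_cpu(int cpu)
{
    cpu_set_t saved, pinned;
    int result;

    if (cpu < 0 || cpu >= CPU_SETSIZE)
        return -1;

    /* Remember whatever affinity the caller (or main program) had set. */
    if (sched_getaffinity(0, sizeof(saved), &saved) != 0)
        return -1;

    CPU_ZERO(&pinned);
    CPU_SET(cpu, &pinned);
    if (sched_setaffinity(0, sizeof(pinned), &pinned) != 0)
        return -1;

    result = current_cpu_has_avx512f() ? 1 : 0;

    /* Allow migration again by restoring the original mask. */
    if (sched_setaffinity(0, sizeof(saved), &saved) != 0)
        return -1;

    return result;
}
```

Note what the sketch has to do: it rewrites the thread's affinity mask behind the caller's back, and the answer only holds for the CPU that was probed. Once the original mask is restored, the thread can migrate to a different core type before it ever executes an AVX512 instruction, so the code that actually uses AVX512 would need to stay pinned too.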
Or you just outright admit that cpuid is basically not useful for AVX512 at all, and you instead need some other system knowledge, and the use of AVX512 ends up being pretty much an external configuration parameter (ie "use it when the user explicitly tells us to, because using it in general isn't good, because we want to use the E-cores").
Either way, it just relegates AVX512 into a "niche feature".
Which admittedly I think it is anyway, but for me that just means "it's dead on arrival".
I seriously believe that in that kind of world, AVX512 is nothing but a marketing thing, and no normal person will ever use it in any real situation. It ends up being used purely for specialty software - HPC, things like that. Not things that any normal person should ever care about, and not things that should ever have a single square millimeter of silicon wasted on them in desktop or laptop CPUs.
So you can't have it both ways. Either it's a specialty thing, and only relevant to a tiny tiny percentage of the market (and mainly used for marketing numbers), or it's something that any random app or library should use and might one day be worthwhile.
That second case requires a working and reliable CPUID bit that doesn't cause the code to either go ridiculously slowly (emulation) or get relegated to just a subset of the cores in the system (trap-and-migrate or explicit affinities).
I personally despise AVX512. I think it's a horrible model, and broken by design. So I'm very openly biased against it. I don't think it's useful for general purpose computing, and I think Intel would have been much better off just working on their FP accelerators instead (ie "just do it on the GPU").
The main CPU is for general-purpose stuff, and AVX512 fails that smell test.
And if you do a general-purpose vector thing, it needs to scale down to small enough CPUs that we wouldn't need to have this endless discussion about it.
Two decades ago I argued on this forum against Itanium because I didn't think it had a chance in hell to scale down, and was literally designed to fail in the mass market.
Today, I argue against AVX512 on the exact same principles. If you can't be relevant in the mass market, you might as well throw in the towel now. I'm sure ARM will be happy to help all those laptops (or all those throughput computing things) that just don't want the wasted space and effort of AVX512.
Of course, maybe Intel can make a small and "good enough" AVX512 unit so that this becomes a non-issue for that reason - support it on every CPU, just in a weaker format (but not so weak that using AVX512 is slower than the alternative).
I just personally suspect that AVX512 was never designed for that, and just doesn't scale down very well. But I'm not a hardware engineer.
Linus