By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), May 22, 2022 8:49 pm
Room: Moderated Discussions
Brendan (btrotter.delete@this.gmail.com) on May 22, 2022 8:41 pm wrote:
>
> Libraries? I was mostly talking about normal processes ("generic app"). For (shared) libraries you're already
> in a world of suckage because a compiler can't optimize anything between caller and callee (even with link-time
> optimization),
I really get the feeling that you have no idea what you're talking about.
People use AVX2 for libraries all the time, and there is no fundamental reason why AVX512 would be any different. Your "optimize between caller and callee" is a complete red herring and just word salad without any meaning.
You can use vector extensions entirely inside of libraries, and in fact that is traditionally the common - and almost only - use of them outside of HPC. Vector extensions are used for hashing etc, and when somebody calls various cryptographic functions they often end up using the vector extensions without ever knowing or caring.
This is not some theoretical thing, Brendan. This is reality. This is how 99% of all AVX2 use is done. Almost nobody uses AVX2 directly through compiler intrinsics; it's all done by calling various library functions that have optimized versions that use AVX2.
And no, AVX512 is not different in any real way.
Or rather, it is different if it hits the heterogeneous issues we've discussed, and it's different in a bad way. As in "uselessly bad", not "minor little problem".
And exactly like Andrey tried to explain to you, the actual library function that gets used tends to be picked either at library load time or at first use, and then it is fixed for the lifetime of the whole process (and fixed across threads). That isn't the only way to do it, no, but it's by far the common one, and it's one of the major uses of the cpuid instruction in modern programming.
Other models are installation-time optimizations, or run-time JIT generation, but none of those really change the end result in any serious way.
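To make the "picked at first use, then fixed for the lifetime of the process" pattern concrete, here is a minimal sketch in C. It is an illustration only, not how any particular library does it: the names `lib_hash`, `hash_scalar`, `hash_avx2` and `hash_resolve` are made up, the "AVX2" variant is just a stand-in that calls the scalar code, and the feature check assumes the GCC/Clang builtins `__builtin_cpu_init()` and `__builtin_cpu_supports()` on x86.

```c
#include <stddef.h>
#include <stdint.h>

/* Portable fallback: 64-bit FNV-1a over a byte buffer. */
static uint64_t hash_scalar(const void *buf, size_t len)
{
    const unsigned char *p = buf;
    uint64_t h = 0xcbf29ce484222325ULL;   /* FNV offset basis */

    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 0x100000001b3ULL;            /* FNV prime */
    }
    return h;
}

/*
 * Hypothetical stand-in for a hand-tuned AVX2 version. A real library
 * would have vector code here; this sketch just reuses the scalar path
 * so that both variants return the same result.
 */
static uint64_t hash_avx2(const void *buf, size_t len)
{
    return hash_scalar(buf, len);
}

typedef uint64_t (*hash_fn)(const void *, size_t);

static uint64_t hash_resolve(const void *buf, size_t len);

/* Starts out pointing at the resolver, so the first call does the probe. */
static hash_fn hash_impl = hash_resolve;

/*
 * Runs once, on whatever core happens to execute the first call.
 * The choice is then cached and used by every later call on every
 * thread. (A real library would make this store atomic, or let the
 * dynamic loader do the selection; this sketch ignores that.)
 */
static uint64_t hash_resolve(const void *buf, size_t len)
{
    hash_fn chosen = hash_scalar;

#if defined(__GNUC__) && (defined(__x86_64__) || defined(__i386__))
    __builtin_cpu_init();                 /* cpuid-based feature probe */
    if (__builtin_cpu_supports("avx2"))
        chosen = hash_avx2;
#endif
    hash_impl = chosen;
    return chosen(buf, len);
}

/* The only thing the application sees: an interface, not an implementation. */
uint64_t lib_hash(const void *buf, size_t len)
{
    return hash_impl(buf, len);
}
```

The caller just does `lib_hash(data, len)` and never learns which variant ran. Note that the probe result describes the core that ran the first call, and nothing ever re-checks it, which is exactly why cores with different feature sets inside one process are a problem for this scheme.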
So the application - or the programmer - doesn't know, and doesn't care, how the library actually implements whatever crypto function (or memset, or whatever random library function that decided that "hey, avx512 is a good idea for this"). In fact, the program may have been compiled long before new libraries came out that started using new CPU features, so the whole "programmer doesn't even know" is really really fundamental.
The whole point of libraries is that they expose interfaces - not implementations.
And trust me, just because you do "memset()" does not mean that you want to always run on a P-core. Neither does running some optimized hashing function. And no, that "AVX-512 for crypto" is not some odd made-up example, it is something that Intel talks about in their white-papers.
Yet that is literally what you seem to think the solution is, because you don't understand how the world works.
And the thing is, if AVX512 isn't usable for random real-world things like cryptography etc, then AVX512 is simply not worth it AT ALL.
It really is that simple, and you really are that wrong about libraries.
Linus