By: Linus Torvalds (torvalds.delete@this.linux-foundation.org),
Room: Moderated Discussions
Geoff Langdale (geoff.langdale.delete@this.gmail.com) on July 11, 2020 7:49 pm wrote:
>
> But we have some SIMD-based table lookup
> stuff that's way faster than the integer equivalent both because you're doing a lot of stuff at once, but you're
> also doing stuff where there's no integer equivalent (there's no PSHUFB for a GPR register).
Yeah, and we might even use some of it. We have places where we do "vectorization" by hand and use integer registers to hold as many bytes as possible, and look for '/' or the terminating NUL byte (obviously I'm talking about filename copies) and create a hash of the result at the same time, one (integer) word at a time.
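A minimal sketch of the word-at-a-time idea described above, for illustration only. It assumes a little-endian 64-bit machine, GCC/Clang's `__builtin_ctzll`, and an input buffer that is zero-padded so whole 8-byte words can be read past the terminator (real code has to handle that over-read some other way). The names `component_hash`, `zero_bytes` and `mix`, and the mixing constant, are hypothetical. This is not the kernel's actual code or hash function. Each loaded word is tested for both a NUL byte and a '/' byte with the classic "has a zero byte" bit trick, and the hash is updated one word at a time.

```c
#include <stdint.h>
#include <stddef.h>
#include <string.h>

#define ONES    0x0101010101010101ULL
#define HIGHS   0x8080808080808080ULL
#define SLASHES (ONES * (uint64_t)'/')   /* '/' replicated into every byte */

/* Sets the high bit of each zero byte in v. Borrows can add spurious bits
 * above a real zero byte, but the lowest flagged byte is always exact. */
static inline uint64_t zero_bytes(uint64_t v)
{
    return (v - ONES) & ~v & HIGHS;
}

/* Hypothetical mixing step, not the kernel's hash. */
static inline uint64_t mix(uint64_t h, uint64_t w)
{
    h ^= w;
    h *= 0x9E3779B97F4A7C15ULL;
    return h ^ (h >> 29);
}

/*
 * Scan one path component starting at s, stopping at '/' or NUL.
 * Assumes little-endian byte order and a zero-padded buffer that can be
 * read in whole 8-byte words. Returns the component length and stores
 * a hash of its bytes in *hash_out.
 */
size_t component_hash(const char *s, uint64_t *hash_out)
{
    uint64_t h = 0;
    size_t len = 0;

    for (;;) {
        uint64_t w;
        memcpy(&w, s + len, sizeof w);   /* alignment-safe 8-byte load */

        /* One mask for both terminators: NUL bytes and '/' bytes. */
        uint64_t stop = zero_bytes(w) | zero_bytes(w ^ SLASHES);
        if (stop) {
            /* Index of the first terminator byte within this word. */
            unsigned k = (unsigned)__builtin_ctzll(stop) / 8;
            if (k) {
                /* Keep only the k bytes before the terminator. */
                uint64_t keep = (1ULL << (8 * k)) - 1;
                h = mix(h, w & keep);
            }
            *hash_out = h;
            return len + k;
        }
        h = mix(h, w);                   /* whole word belongs to the name */
        len += sizeof w;
    }
}
```

Because the tail word is masked before mixing, the same component hashes identically whether it ends in '/' or NUL, e.g. "lib" in "usr/lib" and in "lib/x".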
We could possibly even have an AVX512 version.
If it was available, and if it didn't tank performance due to frequency issues.
But it isn't, and it does.
Fragmentation kills your market. The fact is, AVX512 isn't worth it, because it's not reliably enough there. And I don't think it's reasonably ever going to be, because it was never designed to work on low end.
With new, not-even-released-yet CPUs not supporting it being a case in point.
And that makes AVX512 actively bad. It was literally designed not to be used in any generic code, and is basically only useful for "hey, we have this kernel of code that is so hot that we'll just create five different versions of it".
What part of that is hard to understand? It sure seems to be something Intel cannot get its head around, since Intel keeps making that mistake over and over again.
Linus