By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), November 27, 2022 9:50 am
Room: Moderated Discussions
rwessel (rwessel.delete@this.yahoo.com) on November 26, 2022 4:02 pm wrote:
>
> People have been telling me that compilers were going to vectorize/parallelize my integer
> code automatically Real-Soon-Now, for four decades. Actually they've been saying it
> for five decades, but I mostly wasn't paying attention in the first one.
>
> So this time for sure?
>
> It's hard to avoid cynicism after half a century of failure to deliver.
I think the problem actually goes deeper than that.
Yes, vectorization turns out to be really really hard in real life. It's nontrivial even for the truly simple loops that do just one thing over and over, and then you hit any kind of real code and it gets really really complicated unless the data stream was very much designed for it.
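To make that concrete, here's a throwaway sketch (made-up code, not from anywhere in particular) of the difference between the textbook loop a vectorizer can handle and the shape loops usually take in practice:

    /* The textbook case: independent iterations, unit stride, no exits. */
    void scale(float *restrict dst, const float *restrict src, int n)
    {
        for (int i = 0; i < n; i++)
            dst[i] = src[i] * 2.0f;
    }

    /* Closer to real life: small structs, a data-dependent early exit,
     * and usually a short trip count anyway. */
    struct conn { int state; int fd; };

    int first_ready(const struct conn *conns, int n)
    {
        for (int i = 0; i < n; i++)
            if (conns[i].state == 1)
                return i;
        return -1;
    }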
So that "we've had half a century of failure" is kind of true.
But at the same time, I think the much deeper issue is that yes, compilers have gotten a lot better, but not only did vectorization turn out to be much harder in practice than the trivial examples suggested (and even those trivial examples were anything but trivial to get working correctly in the compiler), but most of the time you don't even really have high repeat counts in the first place.
Seriously, go look at random code on github - I dare you. Very little of it is clearly delineated loops over arrays with high repeat counts. It's not very helpful to be able to vectorize some "search for a value in an array", when people end up using hash tables or other more complex data structures instead of arrays for searching.
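As a made-up illustration of that: the first function below is the "search for a value in an array" that a vectorizer can actually chew on; the second is roughly what the same lookup looks like once the data lives in a chained hash table, where it's all pointer chasing:

    #include <stddef.h>

    /* Flat array, unit stride: the kind of search a vectorizer can handle. */
    int find_in_array(const int *a, int n, int key)
    {
        for (int i = 0; i < n; i++)
            if (a[i] == key)
                return i;
        return -1;
    }

    /* What searching usually looks like instead: walk a hash bucket chain.
     * Pure pointer chasing - nothing here for a vector unit to do. */
    struct entry { int key; int value; struct entry *next; };

    struct entry *find_in_table(struct entry **buckets, size_t nbuckets, int key)
    {
        for (struct entry *e = buckets[(unsigned)key % nbuckets]; e; e = e->next)
            if (e->key == key)
                return e;
        return NULL;
    }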
And when you do see arrays, how many of those are just arrays of pointers (and yes, even non-C-style languages end up using pointers internally)? Or if not pointers, you have arrays of smallish data structures rather than "arrays of one thing".
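Again as a rough, invented example, those two shapes look something like this - an array of pointers you have to chase, and an array of small structs where you only touch one field per element:

    struct node { int key; long value; };

    /* "Array", but an array of pointers: each iteration is an indirect load
     * from some unrelated address, not a contiguous stream of one thing. */
    long sum_values(struct node *const *nodes, int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
            total += nodes[i]->value;
        return total;
    }

    /* Array of smallish structs: touching one field per element means
     * strided loads, again not "an array of one thing". */
    long sum_values_aos(const struct node *nodes, int n)
    {
        long total = 0;
        for (int i = 0; i < n; i++)
            total += nodes[i].value;
        return total;
    }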
It is hard to find simple enough code to vectorize at all.
Does it exist? Of course. Some people really do floating point math on arrays. It's probably not as common as the people who just want some (non-vectorized) cross product that you can also use a vector unit for, but obviously yes, real people still use computers for big calculations.
But mass market? You're not doing weather forecasting on that CPU you bought. So most arrays you see are likely just byte stream ones (Intel used to love talking about JSON parsing examples or something), or they are image or video data. The latter is generally better done on the GPU.
End result: go look for AVX-512 benchmarks. You'll find them. And then ask yourself: how many of these are relevant to what I bought my PC/mac/phone for?
And I claim that that is the real problem with AVX-512 (and pretty much any vectorization). I personally cannot find a single benchmark that does anything I would ever do - not even remotely close.
So if you aren't into some chess engine, if you aren't into parsing (but not using) JSON, if you aren't into software raytracing (as opposed to raytracing in games, which is clearly starting to take off thanks to GPU support), what else is there?
Yes, yes, yes, you can use it in random places. It's not only some JSON parsing library, you'll find it in other libraries too. I suspect the most use it gets in many situations is - drum roll - implementing memmove/strchr/strcmp style things. And in absolutely zero of those cases will it have been done by the compiler auto-vectorizing it.
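For a sense of what that hand-written code looks like, here's a heavily simplified sketch of a SIMD byte search using SSE2 intrinsics (the real memchr/strchr in a libc is far more careful about alignment and page boundaries - this is illustration only, and it's written with explicit intrinsics exactly because no compiler generated it from plain C):

    #include <emmintrin.h>   /* SSE2 intrinsics */
    #include <stddef.h>

    /* Simplified: assumes len is a multiple of 16 and ignores alignment. */
    const char *find_byte(const char *p, size_t len, char c)
    {
        __m128i needle = _mm_set1_epi8(c);
        for (size_t i = 0; i < len; i += 16) {
            __m128i chunk = _mm_loadu_si128((const __m128i *)(p + i));
            int mask = _mm_movemask_epi8(_mm_cmpeq_epi8(chunk, needle));
            if (mask)
                return p + i + __builtin_ctz(mask);
        }
        return NULL;
    }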
It's easy to wave your hands and say "just use vector units". You see people doing it here. In reality, not only has auto-vectorization not ever done so great in the first place, there aren't many places to do it at all.
I will now be bombarded by all the people with their own specialty engine who disagree violently, because obviously the main goal for a CPU is to run chess engines or whatever. And looking at some CPU benchmark sites, that might even look like reality, and I'm clearly the misguided person here.
Linus