By: Doug S (foo.delete@this.bar.bar), May 15, 2022 10:50 am
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on May 14, 2022 12:27 pm wrote:
> Simon Farnsworth (simon.delete@this.farnz.org.uk) on May 14, 2022 5:20 am wrote:
> > SVE2's big advantage over NEON is not wider vectors, but all the compiler-convenience features
> > it has that allow a compiler to be more aggressive about auto-vectorization. For people who
> > are hand-tuning codes for peak performance, SVE2 at 128 bit and NEON are about the same, but
> > SVE2 pulls ahead handily (due to the FFR register and associated instructions) when you're
> > writing "serial" code and relying on the compiler doing something sensible to it.
> >
> > You won't get the same performance this way as you would tuning
> > your code for 128 bit vectors, but it's still a win.
> >
>
> After looking at code generated by the LLVM autovectorizer last year, I am more than somewhat
> doubtful. To say that a year ago they were bad would be an undeserving compliment.
From what I understand from someone who writes this type of code (he's x86-focused, so AVX rather than SVE or NEON), he has to format his code just so for it to be properly auto-vectorized. He learned by trial and error, checking the assembly output to figure out what the compiler expects and writing his code to match. Whenever the compiler is updated, he has to recheck that his carefully crafted code sequences still produce the desired result.
That sounds better than writing directly in assembly, but not by much. And I doubt most programmers go to such lengths. Most probably write code that could be auto-vectorized but isn't, without realizing how much performance they are leaving on the table.
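A typical example of that kind of "just so" restructuring (a generic illustration, not his actual code) is a floating-point sum. Strict IEEE semantics forbid reordering the additions, so compilers typically leave the naive loop scalar unless reassociation is allowed (e.g. GCC/Clang `-ffast-math`). Splitting the sum into independent accumulators by hand removes that serial dependency, which often lets the compiler use SIMD, though whether it actually does depends on the compiler, its version, and the flags. The function names are hypothetical:

```c
#include <stddef.h>

/* Naive form: one serial dependency chain through s.
   Without reassociation permission, compilers typically keep this scalar. */
float sum_naive(const float *x, size_t n) {
    float s = 0.0f;
    for (size_t i = 0; i < n; i++)
        s += x[i];
    return s;
}

/* Hand-restructured form: four independent accumulators, which the
   compiler may map onto vector lanes. Note that the rounding can differ
   slightly from sum_naive, because the additions happen in a different order. */
float sum_4acc(const float *x, size_t n) {
    float s0 = 0.0f, s1 = 0.0f, s2 = 0.0f, s3 = 0.0f;
    size_t i = 0;
    for (; i + 4 <= n; i += 4) {
        s0 += x[i];
        s1 += x[i + 1];
        s2 += x[i + 2];
        s3 += x[i + 3];
    }
    for (; i < n; i++)   /* leftover tail elements */
        s0 += x[i];
    return (s0 + s1) + (s2 + s3);
}
```

Rather than reading the assembly after every compiler update, the vectorizer's own reports can do some of the rechecking: GCC's `-fopt-info-vec-optimized` / `-fopt-info-vec-missed`, or Clang's `-Rpass=loop-vectorize` / `-Rpass-missed=loop-vectorize`.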