By: rwessel (rwessel.delete@this.yahoo.com), May 17, 2022 1:42 am
Room: Moderated Discussions
--- (---.delete@this.redheron.com) on May 16, 2022 10:22 pm wrote:
> Doug S (foo.delete@this.bar.bar) on May 15, 2022 10:50 am wrote:
> > Michael S (already5chosen.delete@this.yahoo.com) on May 14, 2022 12:27 pm wrote:
> > > Simon Farnsworth (simon.delete@this.farnz.org.uk) on May 14, 2022 5:20 am wrote:
> > > > SVE2's big advantage over NEON is not wider vectors, but all the compiler-convenience features
> > > > it has that allow a compiler to be more aggressive about auto-vectorization. For people who
> > > > are hand-tuning codes for peak performance, SVE2 at 128 bit and NEON are about the same, but
> > > > SVE2 pulls ahead handily (due to the FFR register and associated instructions) when you're
> > > > writing "serial" code and relying on the compiler doing something sensible to it.
> > > >
> > > > You won't get the same performance this way as you would tuning
> > > > your code for 128 bit vectors, but it's still a win.
> > > >
> > >
> > > After looking at code generated by the LLVM autovectorizer last year, I am more than somewhat
> > > doubtful. To say that a year ago they were bad would be an undeserved compliment.
> >
> >
> > From what I understand from someone who writes this type
> > of code (he's x86-focused, so AVX rather than SVE or NEON), he
> > has to format his code just so to allow it to be properly
> > autovectorized. He learned by trial and error, checking the
> > assembly output to figure out what the compiler expects and
> > writing his code to match. When the compiler is updated,
> > he has to recheck to verify his carefully crafted code sequences still produce the desired effect.
> >
> > Sounds like it is better than writing directly in assembly, but not by much. And I doubt
> > most programmers go to such lengths. Most probably write code that could be auto-vectorized
> > but is not, and they don't even know there is a lot of performance left on the table.
>
> This may be true, but the argument:
>
> - existing vector ISAs are a poor match for compilers
> THEREFORE
> - a new vector ISA, explicitly designed with all this accumulated experience in mind, and
> by people who are well aware of why the compilers have difficulty, will be just as bad
>
> seems rather strange...
Alternatively: after half a century of supposed ISA and compiler improvements, which have largely failed to enable broad vectorization (or compiler extraction of other kinds of ILP), why would anyone think this time will be any different? This was, in case you don't remember, the explicit promise made for AVX-512 as well, to pick a recent example. Heck, AVX-512 was supposed to enable broad vectorization of integer code, too.
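
On Doug's point about formatting code "just so": here's a made-up illustration (mine, not his acquaintance's code) of the difference between a loop today's vectorizers will usually take at -O3 and one they usually won't. The function names are invented for the example.

#include <stddef.h>

/* The autovectorizer-friendly shape: restrict-qualified pointers so the
   compiler can prove no aliasing, a simple countable trip count, no early
   exits, no calls in the loop body. */
void scale_add(float *restrict dst, const float *restrict a,
               const float *restrict b, float k, size_t n)
{
    for (size_t i = 0; i < n; i++)
        dst[i] = a[i] + k * b[i];
}

/* A shape that typically defeats the vectorizer: dst may alias a or b,
   and there's a data-dependent early exit the compiler must preserve. */
int scale_add_until(float *dst, const float *a, const float *b,
                    float k, size_t n, float limit)
{
    for (size_t i = 0; i < n; i++) {
        float v = a[i] + k * b[i];
        if (v > limit)
            return (int)i;
        dst[i] = v;
    }
    return -1;
}

Checking which of these actually got vectorized is exactly the trial-and-error loop Doug describes; gcc's -fopt-info-vec-missed and clang's -Rpass-missed=loop-vectorize at least tell you why the compiler gave up, short of reading the assembly. Loops of the second kind, with data-dependent exits, are the ones SVE's predication and first-faulting loads are supposed to open up to compilers.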
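
And for Simon's point further up about the FFR: here's a minimal sketch, my own and nobody's production code, of the kind of "serial" loop it targets. It's a strlen-style scan using the ACLE SVE intrinsics from <arm_sve.h> (compile with something like -march=armv8-a+sve); a plain full-width NEON load here could fault past the terminator, which is why a compiler can't safely vectorize the scalar version without first-faulting loads.

#include <arm_sve.h>
#include <stddef.h>
#include <stdint.h>

/* Scan for the terminating NUL a full vector at a time. */
size_t sve_strlen(const char *s)
{
    const uint8_t *p = (const uint8_t *)s;
    size_t len = 0;
    const svbool_t all = svptrue_b8();

    for (;;) {
        svsetffr();                                  /* reset the first-fault register */
        svuint8_t data  = svldff1_u8(all, p + len);  /* first-faulting load: lanes past a
                                                        faulting address simply aren't read */
        svbool_t  valid = svrdffr();                 /* which lanes actually loaded */
        svbool_t  nul   = svcmpeq_n_u8(valid, data, 0);

        if (svptest_any(valid, nul))                 /* found the terminator */
            return len + svcntp_b8(valid, svbrkb_z(valid, nul));

        /* No NUL among the loaded lanes: step past them and go around again.
           If the load was cut short by a fault, the retry starts at the
           faulting byte and faults for real, just as a scalar strlen would. */
        len += svcntp_b8(all, valid);
    }
}

Whether compilers will ever reliably generate this shape from plain C, rather than leaving it to hand-written intrinsics, is of course exactly the point in dispute.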