By: Simon Farnsworth (simon.delete@this.farnz.org.uk), May 16, 2022 4:13 am
Room: Moderated Discussions
Jörn Engel (joern.delete@this.purestorage.com) on May 15, 2022 11:34 pm wrote:
> Simon Farnsworth (simon.delete@this.farnz.org.uk) on May 14, 2022 5:20 am wrote:
> >
> > SVE2's big advantage over NEON is not wider vectors, but all the compiler-convenience features
> > it has that allow a compiler to be more aggressive about auto-vectorization. For people who
> > are hand-tuning codes for peak performance, SVE2 at 128 bit and NEON are about the same, but
> > SVE2 pulls ahead handily (due to the FFR register and associated instructions) when you're
> > writing "serial" code and relying on the compiler doing something sensible to it.
>
> Why do you think that tail handling is less of a problem for manually written vector code?
>
> Asking for a friend.
I think it's equally hard either way. My claim is that humans are currently a lot better than compilers at writing vector code, not that the problem gets easier when the code is written by hand.
In particular, compilers are basically forced to vectorize code as "prologue to get to vector alignment", "wide body", "epilogue to handle tail shorter than a vector", because the auto-vectorizer can't assume anything about the alignment or length of the input data. Humans can restructure the code so that the prologue and epilogue aren't needed, or are handled by the caller when needed.
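To make that concrete, here's a rough sketch in plain C (no intrinsics, so the inner loop over LANES just stands in for one vector operation). The function names, the LANES value and the "caller pads with zeros" contract are all made up for illustration; they're assumptions of mine, not anything a particular compiler or library does.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 4 /* hypothetical vector width in floats, e.g. one 128-bit register */

/* The shape an auto-vectorizer has to assume: unknown alignment, unknown length. */
float sum_general(const float *p, size_t n)
{
    float acc[LANES] = {0};
    float total = 0.0f;
    size_t i = 0;

    /* Prologue: scalar steps until p + i reaches vector alignment. */
    while (i < n && ((uintptr_t)(p + i) % (LANES * sizeof(float))) != 0)
        total += p[i++];

    /* Wide body: one "vector" of LANES elements per iteration. */
    for (; i + LANES <= n; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[l] += p[i + l];

    /* Epilogue: the tail that is shorter than a vector. */
    for (; i < n; i++)
        total += p[i];

    /* Horizontal reduction of the per-lane accumulators. */
    for (size_t l = 0; l < LANES; l++)
        total += acc[l];
    return total;
}

/* The shape a human can choose: the caller promises (assumed contract) that
 * the buffer is zero-padded to a multiple of LANES, so only the wide body
 * remains. Zero padding is harmless for a sum. */
float sum_padded(const float *p, size_t n_padded)
{
    float acc[LANES] = {0};
    float total = 0.0f;

    for (size_t i = 0; i < n_padded; i += LANES)
        for (size_t l = 0; l < LANES; l++)
            acc[l] += p[i + l];

    for (size_t l = 0; l < LANES; l++)
        total += acc[l];
    return total;
}
```

The second version only works because the human changed the contract with the caller, which is exactly the kind of assumption an auto-vectorizer isn't allowed to make on its own.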
Basically, I think that most people who manually vectorize code are cleverer than the auto-vectorization pass in compilers :-)