By: --- (---.delete@this.redheron.com), May 17, 2022 3:01 pm
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 17, 2022 12:26 pm wrote:
> > Is there some way to automatically convert SVE2 code to slower
> > NEON code so that you don’t have to write two versions?
>
> > If code was autovectorized, it would be easy to make two different executables, one for SVE2 and one
> > for NEON. If SVE2 code was written by hand, there would be no
> > way to run the SVE2 code on an older device without SVE2.
> > No one wants to write two different versions by hand, one for SVE2 and one for NEON.
>
> Agreed. github.com/google/highway helps with this - it provides "portable intrinsics" (wrapper functions
> that call NEON or SVE[2] intrinsics) that you call, and supports compiling your code once per platform
> and then dispatching to the best available one either at compile-time or runtime.
>
> This works on x86 but arm_neon/arm_sve.h currently require compiler flags to be set before including
> them. Thus the best we can currently do for Arm (until the compiler is updated to lift this limitation)
> is to compile the same source file multiple times with different compiler flags.
>
> For an example of this in action, see vqsort (vectorized quicksort): https://arxiv.org/abs/2205.05982
>
> Happy to discuss via GitHub issues or email.
You can see how this plays out in practice (at least for some people), on real projects doing real (stunning!) work, here:
https://blog.yiningkarlli.com/2021/09/neon-vs-sse.html
Some people may also find this interesting:
https://blog.yiningkarlli.com/2021/10/takua-on-m1-max.html
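For anyone who wants to see the shape of the "compile once per target, then dispatch at runtime" idea described above, here is a minimal sketch in plain C++. It does not use Highway's actual API. The names neon_build, sve2_build, HasSve2, ChooseSum and SumDispatch are all hypothetical, and the two namespaces are only stand-ins for the same source file compiled twice with different flags (here both contain the same scalar loop so the sketch builds anywhere). The feature check assumes Linux on AArch64 and falls back to the baseline elsewhere.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__aarch64__) && defined(__linux__)
#include <sys/auxv.h>   // getauxval
#include <asm/hwcap.h>  // HWCAP2_SVE2 (if the kernel headers provide it)
#endif

// Signature shared by every per-target build of the kernel.
using SumFn = float (*)(const float*, std::size_t);

// Stand-in for the object file built with baseline (NEON) flags.
namespace neon_build {
float Sum(const float* data, std::size_t n) {
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}
}  // namespace neon_build

// Stand-in for the same source compiled again with SVE2 flags.
// In a real build this body would use SVE2 intrinsics or wrappers.
namespace sve2_build {
float Sum(const float* data, std::size_t n) {
  float total = 0.0f;
  for (std::size_t i = 0; i < n; ++i) total += data[i];
  return total;
}
}  // namespace sve2_build

// Hypothetical helper: asks the OS whether SVE2 is usable.
// Assumption: Linux/AArch64; any other platform reports false.
bool HasSve2() {
#if defined(__aarch64__) && defined(__linux__) && defined(AT_HWCAP2) && \
    defined(HWCAP2_SVE2)
  return (getauxval(AT_HWCAP2) & HWCAP2_SVE2) != 0;
#else
  return false;
#endif
}

// Picks the best available build for the CPU we are running on.
SumFn ChooseSum() {
  return HasSve2() ? &sve2_build::Sum : &neon_build::Sum;
}

// Public entry point: the choice is made once, on first call.
float SumDispatch(const float* data, std::size_t n) {
  assert(data != nullptr || n == 0);
  static const SumFn fn = ChooseSum();
  return fn(data, n);
}
```

Callers only ever see SumDispatch(ptr, count), so the same binary runs on devices with and without SVE2. The cost is one indirect call per invocation, which is why this kind of dispatch is usually done per kernel rather than per element.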