By: Jan Wassenberg (jan.wassenberg.delete@this.gmail.com), May 20, 2022 10:54 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on May 20, 2022 5:51 am wrote:
> -.- (blarg.delete@this.mailinator.com) on May 20, 2022 3:55 am wrote:
> > Why not something like:
> >
> > #ifdef __AVX512F__
> > # define _mm(f) _mm512_##f
> > # define __mfloat __m512
> > # include "your-code-file.c"
Oh interesting, that would work in C as well. We do something similar (re-including the user code) but rely on C++ function overloading. More information in case you're interested: https://github.com/google/highway/blob/master/g3doc/impl_details.md
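To make the overloading variant concrete, here is a minimal sketch. It is not Highway's actual code: `Vec4`, `Vec8`, `Load`, `Add`, `Store`, `USER_CODE` and the `target128`/`target256` namespaces are all hypothetical names, the "vectors" are plain arrays so it compiles anywhere, and a macro stands in for re-including the user's source file once per target.

```cpp
#include <cstddef>

using std::size_t;

// Hypothetical stand-ins for 128-bit and 256-bit float vectors. Real code
// would wrap __m128 / __m256 (or NEON/SVE types) instead of plain arrays.
struct Vec4 { static constexpr size_t kLanes = 4; float lane[4]; };
struct Vec8 { static constexpr size_t kLanes = 8; float lane[8]; };

// One overload per vector type. User code only ever writes Load/Add/Store
// and the compiler picks the right one from the argument type.
inline void Load(const float* p, Vec4& v) {
  for (size_t i = 0; i < 4; ++i) v.lane[i] = p[i];
}
inline void Load(const float* p, Vec8& v) {
  for (size_t i = 0; i < 8; ++i) v.lane[i] = p[i];
}
inline Vec4 Add(Vec4 a, Vec4 b) {
  for (size_t i = 0; i < 4; ++i) a.lane[i] += b.lane[i];
  return a;
}
inline Vec8 Add(Vec8 a, Vec8 b) {
  for (size_t i = 0; i < 8; ++i) a.lane[i] += b.lane[i];
  return a;
}
inline void Store(Vec4 v, float* p) {
  for (size_t i = 0; i < 4; ++i) p[i] = v.lane[i];
}
inline void Store(Vec8 v, float* p) {
  for (size_t i = 0; i < 8; ++i) p[i] = v.lane[i];
}

// Stand-in for re-including the user's file once per target: the macro body
// plays the role of the included source, compiled into a per-target namespace.
#define USER_CODE(NS, VEC)                                        \
  namespace NS {                                                  \
  inline void AddArrays(const float* a, const float* b,           \
                        float* out, size_t n) {                   \
    size_t i = 0;                                                 \
    for (; i + VEC::kLanes <= n; i += VEC::kLanes) {              \
      VEC va, vb;                                                 \
      Load(a + i, va);                                            \
      Load(b + i, vb);                                            \
      Store(Add(va, vb), out + i);                                \
    }                                                             \
    for (; i < n; ++i) out[i] = a[i] + b[i]; /* scalar tail */    \
  }                                                               \
  }

USER_CODE(target128, Vec4)
USER_CODE(target256, Vec8)
```

Calling `target128::AddArrays(a, b, out, n)` and `target256::AddArrays(a, b, out, n)` runs the same source text with different vector widths, without any name mangling via token pasting.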
> Retains 100% ISA functionality unlike other SIMD abstraction layers.
This is a bit optimistic :) For example, anything involving masks on AVX-512 is different.
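A small sketch of what I mean, assuming a hypothetical helper `CountGreater` (not from any library). With AVX/AVX2 the comparison `_mm256_cmp_ps` returns a full vector, whereas AVX-512 has `_mm512_cmp_ps_mask` returning a `__mmask16`, so swapping the `_mm256_` prefix for `_mm512_` does not even produce a valid intrinsic name, let alone the same types. The intrinsic paths are only compiled when the corresponding compiler flags are set; otherwise the scalar loop does all the work.

```cpp
#include <cstddef>
#if defined(__AVX512F__) || defined(__AVX2__)
#include <immintrin.h>
#endif

// Portable bit count (hypothetical helper, avoids compiler-specific builtins).
inline unsigned PopCount(unsigned x) {
  unsigned c = 0;
  while (x != 0) {
    x &= x - 1;  // clears the lowest set bit
    ++c;
  }
  return c;
}

// Counts the elements of p[0..n) that are greater than `threshold`.
inline std::size_t CountGreater(const float* p, std::size_t n,
                                float threshold) {
  std::size_t count = 0;
  std::size_t i = 0;
#if defined(__AVX512F__)
  const __m512 vt = _mm512_set1_ps(threshold);
  for (; i + 16 <= n; i += 16) {
    const __m512 v = _mm512_loadu_ps(p + i);
    // AVX-512: the comparison yields a mask value, one bit per lane.
    const __mmask16 m = _mm512_cmp_ps_mask(v, vt, _CMP_GT_OQ);
    count += PopCount(static_cast<unsigned>(m));
  }
#elif defined(__AVX2__)
  const __m256 vt = _mm256_set1_ps(threshold);
  for (; i + 8 <= n; i += 8) {
    const __m256 v = _mm256_loadu_ps(p + i);
    // AVX/AVX2: the comparison yields a vector of all-ones/all-zeros lanes;
    // movemask then gathers the sign bit of each lane into an int.
    const __m256 m = _mm256_cmp_ps(v, vt, _CMP_GT_OQ);
    count += PopCount(static_cast<unsigned>(_mm256_movemask_ps(m)));
  }
#endif
  // Scalar tail (and the only path when neither ISA is enabled).
  for (; i < n; ++i) count += (p[i] > threshold) ? 1u : 0u;
  return count;
}
```

All three paths return the same count; only the type that carries the comparison result differs, which is exactly the part a prefix macro cannot paper over.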
> Now, after reading the rest of Jan's posts, I am starting to believe that in his case it is indeed that simple,
> but only because he and his co-workers turned potentially compute-bounded problem into LS bounded, losing in
> the process factor of 2 of potential performance (2 at best, if inner-loop's data set still fits in L1D, otherwise
> the factor is bigger than 2) for sake of portability and of simplification their own work.
Yes, it's only that simple because we have invested in the infrastructure to make it so :)
Agreed, engineering time is usually a major constraint, and portability is a requirement. I'm not sure why you see a >= 2x slowdown, though:
1) compared with not having SIMD (on platforms where we couldn't justify hand-written arch-specific code), any kind of SIMD is a big win.
2) Porting existing x86 intrinsics to Highway has been at worst perf-neutral, and often better (when we can transparently use wider vectors, such as in the equivalent of strchr).