By: Jan Wassenberg (jan.wassenberg.delete@this.gmail.com), May 22, 2022 12:06 pm
Room: Moderated Discussions
Charlie Burnes (charlie.burnes.delete@this.no-spam.com) on May 22, 2022 6:30 am wrote:
> It’s not possible because the SIMD width is mapped to a rectangle of values and a very expensive computation
> has to be done to compute some constants that correspond to each value in that rectangle.
I see, thanks for sharing the background.
> I don’t need to worry about SVE or SVE2 unless Apple includes them in the
> M2 and removes NEON. I doubt Apple will remove NEON because that would break
> existing applications.
Makes sense. It's also possible to always use only 128 bits in SVE.
> > two/four vectors. You could copy/adapt the Highway implementations of Vec256 and Vec128, or simply
> > build on top of them and define the operations you want using two/four calls to the Highway ops.
>
> That sounds like a great idea. Thank you! I saw the HWY_EMU128 target mentioned in the Highway
> Implementation Details document but it sounds like you are referring to something different here.
> I didn’t see Vec128 or Vec256 mentioned anywhere in the Highway docs I have read so far. I did
> a Google search for Vec256 site:github.com/google/highway and I got no matching documents.
You're welcome :) HWY_EMU128 is a scalar fallback, which you could also use if you want to try autovectorization (not recommended). Vec128 and Vec256 are the internal-only classes defined in hwy/ops/x86_128-inl.h (and x86_256-inl.h).
What I'd recommend is:
template <typename T>
struct Vec256x2 {
  Vec256<T> v0;  // This assumes we're AVX2-specific.
  Vec256<T> v1;
};
// For NEON, you'd want Vec128<T> v0, v1, v2, v3.
Then you can re-define all the ops you require:
template <typename T>
Vec256x2<T> Add(Vec256x2<T> a, Vec256x2<T> b) {
  a.v0 = Add(a.v0, b.v0);
  a.v1 = Add(a.v1, b.v1);
  return a;
}
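To show how the same pattern might extend to load/store, here is a rough, self-contained sketch that compiles without Highway. FakeVec256 is a made-up stand-in for the internal Vec256 class, and LoadX2/StoreX2 are hypothetical names chosen for illustration; none of them are Highway API. In real code the inner calls would be the corresponding Highway ops.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for Highway's internal Vec256<T>: 32 bytes of lanes.
template <typename T>
struct FakeVec256 {
  static constexpr std::size_t kLanes = 32 / sizeof(T);
  std::array<T, kLanes> lanes;
};

// Lane-wise add on the stand-in type (plays the role of the underlying op).
template <typename T>
FakeVec256<T> Add(FakeVec256<T> a, const FakeVec256<T>& b) {
  for (std::size_t i = 0; i < FakeVec256<T>::kLanes; ++i) a.lanes[i] += b.lanes[i];
  return a;
}

template <typename T>
FakeVec256<T> LoadU(const T* from) {
  FakeVec256<T> v;
  for (std::size_t i = 0; i < FakeVec256<T>::kLanes; ++i) v.lanes[i] = from[i];
  return v;
}

template <typename T>
void StoreU(const FakeVec256<T>& v, T* to) {
  for (std::size_t i = 0; i < FakeVec256<T>::kLanes; ++i) to[i] = v.lanes[i];
}

// The wrapper: two half-width vectors treated as one double-width vector.
template <typename T>
struct Vec256x2 {
  static constexpr std::size_t kLanes = 2 * FakeVec256<T>::kLanes;
  FakeVec256<T> v0;
  FakeVec256<T> v1;
};

// Each wrapper op forwards to the underlying op once per half.
template <typename T>
Vec256x2<T> Add(Vec256x2<T> a, const Vec256x2<T>& b) {
  a.v0 = Add(a.v0, b.v0);
  a.v1 = Add(a.v1, b.v1);
  return a;
}

// Hypothetical helper: the second half starts one vector's worth of lanes in.
template <typename T>
Vec256x2<T> LoadX2(const T* from) {
  Vec256x2<T> v;
  v.v0 = LoadU(from);
  v.v1 = LoadU(from + FakeVec256<T>::kLanes);
  return v;
}

// Hypothetical helper: mirrors LoadX2.
template <typename T>
void StoreX2(const Vec256x2<T>& v, T* to) {
  StoreU(v.v0, to);
  StoreU(v.v1, to + FakeVec256<T>::kLanes);
}

// Usage: add two 16-float arrays through the wrapper and check every lane.
void Example() {
  float a[16], b[16], out[16];
  for (int i = 0; i < 16; ++i) {
    a[i] = static_cast<float>(i);
    b[i] = 100.0f;
  }
  StoreX2(Add(LoadX2(a), LoadX2(b)), out);
  for (int i = 0; i < 16; ++i) assert(out[i] == static_cast<float>(i) + 100.0f);
}
```

The point is only that every wrapper op is a thin forwarder, so the expensive per-rectangle constants can stay tied to one fixed logical width while the number of underlying vectors changes per platform.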