By: Charlie Burnes (charlie.burnes.delete@this.no-spam.com), May 22, 2022 7:52 am
Room: Moderated Discussions
> For example, on SVE Lanes() instead maps to svcnt. Now the "governs" comes into play: Lanes() is a function of the hardware
> lane count, N as a user-specified upper bound, or the shift count which on SVE can be a fraction such as 1/4 of the hardware
> lane count. For example, CappedTag has N=2 which is smaller than the hardware lane count but the ops behave as if you actually
> had a 64-bit vector. But CappedTag will have vec_bytes/4 lanes, which SVE guarantees is at most 64. So here the N can influence,
> but is not identical to what Lanes returns. Does that make sense?
Thank you for trying to explain that. I’m still confused because I’m not familiar with SVE. I don’t need to be concerned with SVE or SVE2 unless Apple includes them in the M2 and removes NEON.
I like the approach you suggested in another post. I need to learn enough about Highway to do that. I also like your idea of letting the compiler unroll loops so it can use the extra registers AVX-512 provides (32 vector registers versus 16 with AVX2).