By: Jan Wassenberg (jan.wassenberg.delete@this.gmail.com), May 21, 2022 11:16 pm
Room: Moderated Discussions
Charlie Burnes (charlie.burnes.delete@this.no-spam.com) on May 21, 2022 10:11 pm wrote:
> The main thing I’m struggling with is I can’t write my code in a way that is SIMD width agnostic.
I'm curious: why is that not possible?
> I would like to write it using 512-bit vectors and have some way to automatically make it run on machines
> with 256-bit and 128-bit vectors. That does not seem to be possible with Highway, as far as I can determine.
> There is a way to force Highway to use 128-bit vectors
Right. Given that you plan to use NEON, would SVE ever come into play? SVE (like RVV) could have 128-bit, 512-bit or some other vector width, discoverable only at runtime, so I do not believe it is feasible to write software in terms of 512-bit vectors and have it transparently map to the actual vectors. Where does the 512 number come from?
But perhaps you only care about older HW with fixed-size vectors. In that case it would certainly be feasible to define Vec256x2 and Vec128x4 classes that implement 512 bits using two or four vectors, respectively. You could copy and adapt the Highway implementations of Vec256 and Vec128, or simply build on top of them and define the operations you want as two or four calls to the Highway ops.
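A minimal sketch of the Vec128x4 idea, assuming a hypothetical scalar `Vec128f` stand-in rather than Highway's actual vector types (so it compiles without Highway). `Load128x4` and `Store128x4` are likewise made-up names. Each 512-bit operation is just four calls to the corresponding 128-bit operation.

```cpp
#include <array>
#include <cstddef>

// Hypothetical stand-in for one 128-bit vector of four floats. In real code
// this would be a Highway 128-bit vector; here it is plain scalar code.
struct Vec128f {
  std::array<float, 4> lane;
};

inline Vec128f Add(Vec128f a, Vec128f b) {
  Vec128f r;
  for (size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] + b.lane[i];
  return r;
}

inline Vec128f Mul(Vec128f a, Vec128f b) {
  Vec128f r;
  for (size_t i = 0; i < 4; ++i) r.lane[i] = a.lane[i] * b.lane[i];
  return r;
}

// Hypothetical 512-bit type built from four 128-bit vectors.
struct Vec128x4 {
  std::array<Vec128f, 4> part;
};

// Each 512-bit op forwards to the 128-bit op once per part.
inline Vec128x4 Add(const Vec128x4& a, const Vec128x4& b) {
  Vec128x4 r;
  for (size_t k = 0; k < 4; ++k) r.part[k] = Add(a.part[k], b.part[k]);
  return r;
}

inline Vec128x4 Mul(const Vec128x4& a, const Vec128x4& b) {
  Vec128x4 r;
  for (size_t k = 0; k < 4; ++k) r.part[k] = Mul(a.part[k], b.part[k]);
  return r;
}

// Unaligned load/store of 16 floats (64 bytes = 512 bits).
inline Vec128x4 Load128x4(const float* p) {
  Vec128x4 r;
  for (size_t k = 0; k < 4; ++k)
    for (size_t i = 0; i < 4; ++i) r.part[k].lane[i] = p[4 * k + i];
  return r;
}

inline void Store128x4(const Vec128x4& v, float* p) {
  for (size_t k = 0; k < 4; ++k)
    for (size_t i = 0; i < 4; ++i) p[4 * k + i] = v.part[k].lane[i];
}
```

A Vec256x2 would follow the same pattern with two parts. Note that this only covers fixed-width targets; it does not help with SVE or RVV, whose width is unknown at compile time.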