By: Charlie Burnes (charlie.burnes.delete@this.no-spam.com), May 22, 2022 7:20 am
Room: Moderated Discussions
Thank you for your comments.
> It may happen that some algorithm cannot be SIMD width agnostic when also wanting maximum performance,
> because the associated data structures must have different layouts, depending on the SIMD width.
That is the situation I have, except that I do not need the absolute maximum performance for every SIMD width. I would like the best I can reasonably achieve for AVX-512, and I would like it not to be unusably slow for other SIMD widths.
> A single generic program, having the width as a parameter, should suffice. While this is more complicated than being able
> to ignore the width, I do not find it as the greatest difficulty, as when targeting multiple ISA variants with a single program
I think the difficulty of parameterizing the width depends on the problem. In some cases, it is not difficult. In other cases, it is too difficult to be practical.
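
To make the easy case concrete, here is a minimal sketch of what I mean by a width parameter. It is only an illustration: the kernel (y += a * x), the names axpy_blocked and axpy, and the lane counts are all made up for this post and have nothing to do with my actual code. The inner loop has a fixed trip count W, which compilers can often vectorize, though that is not guaranteed.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical kernel: y[i] += a * x[i], processed in blocks of W floats.
// W stands in for the number of float lanes in one SIMD register
// (16 for 512-bit, 8 for 256-bit, 4 for 128-bit).
template <std::size_t W>
void axpy_blocked(float a, const float* x, float* y, std::size_t n) {
    static_assert(W > 0, "W must be positive");
    assert(n == 0 || (x != nullptr && y != nullptr));
    std::size_t i = 0;
    // Full blocks: the inner loop has a compile-time trip count of W.
    for (; i + W <= n; i += W) {
        for (std::size_t lane = 0; lane < W; ++lane) {
            y[i + lane] += a * x[i + lane];
        }
    }
    // Scalar tail for the remaining n % W elements.
    for (; i < n; ++i) {
        y[i] += a * x[i];
    }
}

// Hypothetical dispatcher: picks an instantiation from a lane count that
// the caller has determined some other way (for example, CPU detection).
void axpy(std::size_t lanes, float a, const float* x, float* y, std::size_t n) {
    switch (lanes) {
        case 16: axpy_blocked<16>(a, x, y, n); break;
        case 8:  axpy_blocked<8>(a, x, y, n);  break;
        case 4:  axpy_blocked<4>(a, x, y, n);  break;
        default: axpy_blocked<1>(a, x, y, n);  break;  // scalar fallback
    }
}
```

Here nothing but the loop bound depends on W, which is why it is easy. The hard cases are the ones where the data layout itself has to change with W, and a template parameter on one function does not help much with that.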
> Writing a program for 512-bit vectors and having it converted automatically to other register widths does not
> seem feasible in the general case, because the compiler cannot always know where in the code and in the
> data structures assumptions about the register width have been used.
I don’t need a solution for the general case. I just need a solution that works for my particular case. In another post, Jan Wassenberg suggested an approach using his Highway software that I think will work.