By: -.- (blarg.delete@this.mailinator.com), May 18, 2022 3:41 am
Room: Moderated Discussions
From what I can tell, SVE's greatest advantages for auto-vectorization over AVX-512 would be:
- FFR (the first-fault register, for speculative loads)
- WHILELT being a single instruction
On AVX-512, the lack of FFR might be worked around with a scalar prologue and aligned loads in the loop, but this imposes a cost which the compiler might not know is worthwhile (without profiling). Still, FFR is an unmatched advantage, though I don't know how beneficial it'd be for auto-vectorization in general (my guess is mostly for null-terminated strings).
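To make the prologue-plus-aligned-loads idea concrete, here is a minimal sketch in portable C rather than real intrinsics. The names `strlen_aligned_blocks` and `zero_byte_mask` are made up for illustration, and the inner byte loop only stands in for one aligned 64-byte vector load plus a compare. It assumes the whole aligned block holding the terminator is readable. On real hardware that holds because an aligned block never straddles a page, but it is not something standard C promises, so the buffer has to be padded when running this as-is.

```c
#include <stddef.h>
#include <stdint.h>

#define BLOCK 64  /* bytes per block: one 512-bit vector */

/* Stand-in for an aligned 64-byte vector load plus compare-with-zero:
   returns a bitmask with bit k set when block[k] == 0. */
uint64_t zero_byte_mask(const unsigned char *block)
{
    uint64_t mask = 0;
    for (unsigned k = 0; k < BLOCK; k++)
        mask |= (uint64_t)(block[k] == 0) << k;
    return mask;
}

/* Precondition (an assumption of this sketch): every byte of the aligned
   64-byte block that contains the terminator is readable. */
size_t strlen_aligned_blocks(const char *s)
{
    const unsigned char *start = (const unsigned char *)s;
    const unsigned char *p = start;

    /* Prologue: scalar steps until p sits on a 64-byte boundary.
       This is the extra cost paid in place of FFR. */
    while ((uintptr_t)p % BLOCK != 0) {
        if (*p == 0)
            return (size_t)(p - start);
        p++;
    }

    /* Main loop: whole aligned blocks only, so a load may run past the
       terminator but never past the end of the block it started in. */
    for (;;) {
        uint64_t mask = zero_byte_mask(p);
        if (mask != 0) {
            unsigned first = 0;
            while (((mask >> first) & 1u) == 0)
                first++;  /* index of the lowest set bit */
            return (size_t)(p - start) + first;
        }
        p += BLOCK;
    }
}
```

For a short string the prologue can be most of the work, which is the kind of cost a compiler can't judge without knowing typical lengths.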
WHILELT can be emulated on AVX-512, but the emulation could be comparatively costly. However, if it's the difference between "costly" vectorization and no vectorization, it's almost certainly worth it, particularly since AVX-512 guarantees you 16x 32b lanes. So I don't see this as a huge benefit for auto-vectorization.
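As a rough illustration of what that emulation involves, here is a sketch in plain C that models a 16-lane mask as a `uint16_t` instead of using real AVX-512 intrinsics. `whilelt_mask16` and `add_arrays` are hypothetical names, and the inner per-lane loop stands in for a masked load, add and store. Lane k is active when `i + k < n`, which is the predicate WHILELT builds in one instruction.

```c
#include <stddef.h>
#include <stdint.h>

#define LANES 16  /* 32-bit lanes in one 512-bit vector */

/* Lane k of the result is active iff i + k < n. The emulation takes
   a subtract, a clamp and a shift where SVE needs one instruction. */
uint16_t whilelt_mask16(size_t i, size_t n)
{
    if (i >= n)
        return 0;
    size_t remaining = n - i;
    if (remaining >= LANES)
        return 0xFFFFu;
    return (uint16_t)((1u << remaining) - 1u);
}

/* Predicated loop with no scalar tail: dst[j] = a[j] + b[j] for j < n.
   The loop over k stands in for one masked vector operation. */
void add_arrays(float *dst, const float *a, const float *b, size_t n)
{
    for (size_t i = 0; i < n; i += LANES) {
        uint16_t mask = whilelt_mask16(i, n);
        for (unsigned k = 0; k < LANES; k++)
            if ((mask >> k) & 1u)
                dst[i + k] = a[i + k] + b[i + k];
    }
}
```

The mask computation runs once per iteration, so the overhead is a handful of scalar ops against 16 lanes of work.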
I don't see SVE's arbitrarily sized vectors helping at all here - they're useful in other cases, but for auto-vectorizers, they're likely just additional complexity.
It feels to me like SVE helps the cause a little, but not by much. It adds a few small things, but most cases of auto-vectorization failure I've seen require substantial compiler improvements, and these additions won't do much to help there.
(and there'll still be a tonne of developers not knowing to add
-ffast-math
to their compiler flags, so it may all be moot)
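My reading of why that flag matters (the reasoning isn't spelled out above, so treat it as an assumption): a float reduction can only be vectorized by reordering the additions, and float addition isn't associative, so without permission to reassociate the compiler has to keep the scalar order. A small sketch with hypothetical function names, where the second function mimics the shape a 4-lane vectorized sum would take:

```c
#include <stddef.h>

/* Strict left-to-right sum: the order the source code spells out. */
float sum_in_order(const float *x, size_t n)
{
    float acc = 0.0f;
    for (size_t i = 0; i < n; i++)
        acc += x[i];
    return acc;
}

/* Sum with 4 independent accumulators, roughly what a vectorized
   reduction does; n is assumed to be a multiple of 4. */
float sum_four_lanes(const float *x, size_t n)
{
    float lane[4] = {0.0f, 0.0f, 0.0f, 0.0f};
    for (size_t i = 0; i < n; i += 4)
        for (size_t k = 0; k < 4; k++)
            lane[k] += x[i + k];
    return (lane[0] + lane[1]) + (lane[2] + lane[3]);
}
```

With the input {1e20f, 1.0f, -1e20f, 1.0f} the two orders give different answers, because 1.0f is absorbed whenever it is added to a value of magnitude 1e20f. The compiler isn't allowed to turn the first function into the second unless told the difference is acceptable.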