By: Michael S (already5chosen.delete@this.yahoo.com), May 19, 2022 1:14 am
Room: Moderated Discussions
Jan Wassenberg (jan.wassenberg.delete@this.gmail.com) on May 18, 2022 10:29 pm wrote:
> Doug S (foo.delete@this.bar.bar) on May 18, 2022 10:26 am wrote:
> > I don't think it makes sense for ANYONE to invest in AVX512 support for software because of some future
> > assessment it will be table stakes - or even EXIST in the future. The only reason to make that investment
> > is if it pays off TODAY, based on the installed base of AVX512 support in your target market TODAY.
>
> From where I sit, the only investment required for AVX-512 is a few
> extra seconds of compile time plus maybe ~100KiB larger binaries.
Depends on what you want to achieve.
If you currently utilize, say, 30% of the AVX2+FMA computational ability and want to get to, say, 25% utilization of the AVX-512 ability, then your method is likely (for some value of 'likely') to work.
But what if your goals are more ambitious?
E.g. you invested significant effort in manual optimization of the AVX path. You organized your arrays in a 256-bit-oriented "hybrid" (== AoSoA) data layout and used _mm256_xxx() intrinsics in the two innermost levels of loops. And you achieved good results, say, 65-70% of the peak FLOPS of your AVX2 CPU. Now you probably want to achieve a similar or slightly lower sustained-to-peak FLOPS ratio with AVX-512. Flipping a compiler switch is of little help in such a case. More likely, of no help at all.
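To make the layout point concrete, here is a minimal sketch in portable C, not anybody's real kernel. The names Block, LANES, dot_aosoa and x_offset are made up for illustration, and the inner scalar loop over LANES stands in for what would be one _mm256 load plus FMA per array in a hand-written version. The assumption is a two-field AoSoA block with 8 floats per field, i.e. one 256-bit register each.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical AoSoA block: 8 x values, then 8 y values.
   8 floats == one 256-bit register, so the width is baked into the data. */
#define LANES 8

typedef struct {
    float x[LANES];
    float y[LANES];
} Block;

/* Sum of x*y over all blocks. The loop over l is the part a manual
   AVX2 kernel would replace with 256-bit loads and an FMA. */
static float dot_aosoa(const Block *b, size_t nblocks)
{
    assert(b != NULL || nblocks == 0);
    float acc[LANES] = {0};            /* one partial sum per lane */
    for (size_t i = 0; i < nblocks; ++i)
        for (int l = 0; l < LANES; ++l)
            acc[l] += b[i].x[l] * b[i].y[l];

    float sum = 0.0f;                  /* horizontal reduction at the end */
    for (int l = 0; l < LANES; ++l)
        sum += acc[l];
    return sum;
}

/* Position (in floats) of element i's x value inside the flat buffer,
   for a two-field block that is `lanes` floats wide. */
static size_t x_offset(size_t i, size_t lanes)
{
    return (i / lanes) * 2 * lanes + (i % lanes);
}
```

With 8 lanes, element 8 has its x value at float offset 16; with 16 lanes the same element sits at offset 8. So going to 512-bit blocks means repacking the arrays and rewriting the inner loops, which no compiler flag does for you.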