By: Anon1 (Anon.delete@this.gmail.com), May 14, 2022 12:53 am
Room: Moderated Discussions
Doug S (foo.delete@this.bar.bar) on May 13, 2022 9:28 pm wrote:
>
> What about NEON, which solves the exact same problem? SVE2 allows wider vectors, but unless
> you actually ship with significantly wider vectors what's the difference between e.g. 2x256b
> SVE2 and 4x128b NEON? Yeah SVE2 is where the development is taking place now, but most
> of the new instructions are 'AI' related stuff Apple supports via the NPU.
>
> Now if they plan to offer 512 bit wide SVE2 on M2 Max for the higher end stuff while keeping it at
> a more reasonable 128 or 256 bit width for phones, tablets and lower end Macs maybe it makes sense.
As Maynard already mentioned, SVE2 simplifies auto-vectorization and is also more flexible overall (masking etc.).
I am not sure Apple needs to increase SIMD width at all. Having four independent FP/SIMD units gives them more flexibility across varied workloads, and I suppose they would want to retain that. For them, going to 6x 128-bit FP units probably beats going to 4x 256-bit units. Not to mention that SVE makes it easier to schedule operations across multiple units with its looping constructs etc.