By: Andrei F (andrei.delete@this.anandtech.com), December 6, 2020 3:24 am
Room: Moderated Discussions
K.K (anon.delete@this.anon.coom) on December 5, 2020 2:31 am wrote:
> I still need to test it on my iPhone. I suspect there might
> be a flaw in my methodology as I have difficulty believing that Apple doesn't have faster FP16. Maybe I should
> try something else, like computing a FP32 and FP16 chain simultaneously?
>
> To summarise:
>
> 1. Peak compute performance on M1 is only reached if multiple dependency chains are
> interleaved. I do not know whether this means that each ALU is capable of doing two
> FP32 operations per clock or whether there is some sort of pipeline effect.
>
> 2. M1 seems to run FP32 and FP16 operations at the same speed, at least
> in my simple example. I am very sceptical about this result.
>
> Do you have any suggestions how I can improve this test and what else can I do?
>
>
https://twitter.com/gavkar/status/1326582307025637376
Gokhan works for Apple, so this is an official disclosure.
It sounds like they added additional FP32-only units on the M1 GPU, alongside the FP32/double-rate-FP16 capable units of prior generations.
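
On point 1 of the quoted summary, a rough way to tell "two FP32 ops per clock per ALU" apart from a plain pipeline effect is to look at how throughput scales with the number of interleaved chains. The sketch below is only a toy model, not a measurement: `ops_per_cycle`, `latency_cycles` and `issue_width` are hypothetical names, and the latency of 2 cycles is an assumed value chosen for illustration, not a known M1 figure.

```python
def ops_per_cycle(chains: int, latency_cycles: int, issue_width: int = 1) -> float:
    """Toy model of sustained throughput for one pipelined ALU.

    Each dependency chain can issue one op every `latency_cycles` cycles,
    because the next op in the chain needs the previous result. Independent
    chains fill the idle slots until the ALU's issue width is saturated.
    All parameters are illustrative assumptions, not measured values.
    """
    if chains < 1 or latency_cycles < 1 or issue_width < 1:
        raise ValueError("chains, latency_cycles and issue_width must be >= 1")
    return min(float(issue_width), chains / latency_cycles)


if __name__ == "__main__":
    # Assumed 2-cycle result latency, single-issue ALU.
    for k in (1, 2, 3, 4):
        print(k, "chain(s):", ops_per_cycle(k, latency_cycles=2))
```

In this model a single chain on a 2-cycle pipeline reaches half of peak and two chains reach peak, which would look the same from the outside as a dual-issue ALU that needs two independent operations per clock. Sweeping the chain count further (3, 4, 8) in the real shader and checking where the curve flattens is one way to narrow down which case applies.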