By: Jeff S. (fakity.delete@this.fake.com), December 6, 2020 11:12 pm
Room: Moderated Discussions
Andrei F (andrei.delete@this.anandtech.com) on December 6, 2020 2:24 am wrote:
> https://twitter.com/gavkar/status/1326582307025637376
>
> Gokhan works for Apple and so this is official disclosure.
>
> It sounds like they added additional FP32-only units on the M1 GPU, alongside
> the FP32/double-rate-FP16 capable units of prior generations.
The tweet reads: "FP32 ALU rate is half of FP16 rate on A14 (and earlier chips). That has not changed on A14. F32 ALU rate relative to F16 increased on M1."

This language strikes me as very odd. It certainly sounds to me more like packed fp16 was just dropped. If double-issue fp32 were added (supplying 6 operands to 2 FMA units), why not still support packed fp16 for 4 fp16 FMAs per clock per VRF port set?
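For anyone who hasn't chased this before: "packed fp16" means one 32-bit register lane holds two fp16 values, so a 32-bit-wide FMA unit can retire two fp16 FMAs per clock with no extra register-file ports. A minimal C sketch of the arithmetic (half2 and h2_fma are my own illustrative names, and _Float16 is a compiler extension that needs a recent clang or gcc; this models the packing, not Apple's actual hardware):

    #include <stdio.h>

    /* Two fp16 values packed into one 32-bit register slot. */
    typedef struct { _Float16 lo, hi; } half2;

    /* One packed-fp16 FMA: both lane FMAs retire together, consuming
       three 32-bit operands (a, b, c), i.e. the same operand bandwidth
       a single fp32 FMA needs. */
    static half2 h2_fma(half2 a, half2 b, half2 c) {
        half2 r;
        r.lo = a.lo * b.lo + c.lo;
        r.hi = a.hi * b.hi + c.hi;
        return r;
    }

    int main(void) {
        half2 a = {2.0f, 3.0f}, b = {4.0f, 5.0f}, c = {1.0f, 1.0f};
        half2 r = h2_fma(a, b, c);
        printf("%g %g\n", (double)r.lo, (double)r.hi); /* 9 16 */
        return 0;
    }

Feed two such FMA units from the same six 32-bit VRF operand ports and you get 2 packed FMAs per clock, which is the 4 fp16 FMAs per clock per port set I mentioned above.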
My expectation would be that dense low-precision arithmetic gets relegated to the "neural engine" (the hardwired matmul accelerator), while packed fp16 in the GPU, which is useful for more general-purpose vector programming, gets sacrificed.