By: K.K. (anon.delete@this.anon.com), December 5, 2020 3:57 am
Room: Moderated Discussions
K.K (anon.delete@this.anon.coom) on December 5, 2020 2:31 am wrote:
> On the
> M1 however, the same kernel only yields 1.2 - 1.3 TFLOPS, same as clpeak and half of what
> Apple is claiming. Note that M1 has 1024 GPU ALUs, so this is consistent with 1 FLOP per
> ALU per clock assuming the latter is around 1.2 GHz (which again is realistic).
>
Quick clarification here: I count each FMA instruction as 2 FLOP, so 1 FLOP per ALU per cycle would basically mean that a dependent FMA instruction is executed only once every two cycles.
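To make the arithmetic explicit, here is a rough sketch of the calculation. The 1024 ALUs and the ~1.2 GHz clock are the figures from my quoted post (the clock is an estimate, not a measured value), and the function name `peak_tflops` is just something I made up for illustration.

```python
FLOP_PER_FMA = 2  # counting convention: one fused multiply-add = 2 FLOP


def peak_tflops(alus, clock_ghz, fma_per_alu_per_cycle):
    """Theoretical throughput in TFLOPS for a given FMA issue rate."""
    flop_per_second = alus * clock_ghz * 1e9 * fma_per_alu_per_cycle * FLOP_PER_FMA
    return flop_per_second / 1e12


ALUS = 1024       # M1 GPU ALU count, as stated in the post
CLOCK_GHZ = 1.2   # assumed clock, as estimated in the post

# One dependent FMA every two cycles -> 1 FLOP per ALU per cycle
half_rate = peak_tflops(ALUS, CLOCK_GHZ, 0.5)

# One FMA every cycle -> 2 FLOP per ALU per cycle
full_rate = peak_tflops(ALUS, CLOCK_GHZ, 1.0)

print(f"FMA every 2 cycles: {half_rate:.2f} TFLOPS")
print(f"FMA every cycle:    {full_rate:.2f} TFLOPS")
```

Under these assumptions the half-rate case comes out at roughly 1.23 TFLOPS, which matches the 1.2 - 1.3 TFLOPS I measured, and the full-rate case is twice that.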