M1 GPU has higher FP32 than A-series GPUs

By: K.K. (anon.delete@this.anon.com), December 9, 2020 1:35 am
Room: Moderated Discussions
Jeff S. (fakity.delete@this.fake.com) on December 7, 2020 9:29 pm wrote:
> K.K. (anon.delete@this.anon.com) on December 7, 2020 1:32 am wrote:
> > I would hypothesize that it's the other way around and that A14 and other mobile GPUs simply take
> > longer to execute FP32 operations. My tests so far seem to suggest that both FP32 and FP16 operations
> > on M1 can be issued every cycle, with execution latency of two cycles. Maybe mobile GPUs need four
> > cycles instead to do a single FP32 operation and Apple has tweaked the M1 to bring the FP32 performance
> > up to the level of the FP16? I will run some tests on an A14 later this week.
>
> The main reason the "fp32 got better" angle sounds difficult to believe is that GPUs are typically bottlenecked
> by operand supply. This is the whole reason ML/neural/tensor engines throw power and area at FMAs hardwired
> to do just GEMM slice acceleration: you just can't get competitive density if you need two or three RF
> arrays and ports for every FMA unit, so you chain everything up to do that one thing.
>
> An SIMD FPU that has the same FLOPS for fp32 as fp16 is just wasting half of its scarcest resource
> in the latter case. While packed fp16 doesn't come for free in power/area (or even programming
> model simplicity), it's still a small marginal cost over the fp32 unit and the RF.
>
> The alternative is that Apple pulled a consumer Ampere and added double fp32 issue per SIMT thread.
> Given how many architectural generations it has taken Nvidia to work towards this (moving from
> quad single-ported RF banks to dual, dual-ported, trying dual int32/fp32 issue, etc.) and still
> taking a huge efficiency hit in general purpose programs' fp32 utilization vs. theoretical limits,
> I think it's unrealistic to imagine Apple did it all in one step perfectly somehow.

Thanks for explaining this! I know absolutely nothing about circuitry implementation, so your explanation was enlightening.

At any rate, I managed to find an iPhone 12 to run my benchmark, and I have also fixed a bug with timings. Here are the results:

iPhone 12 (A14 with 4-core GPU)
--------------------------------
float1 638.0
float2 645.0
float3 645.0
float4 646.0
half1 1282.0
half2 1297.0
half3 1296.0
half4 1298.0

MacBook Pro (M1 with 8-core GPU)
--------------------------------
float1 2458.0
float2 2528.0
float3 2508.0
float4 2522.0
half1 2569.0
half2 2592.0
half3 2594.0
half4 2597.0

These results suggest that Apple did indeed manage to improve FP32 throughput without sacrificing anything else. I also don't see any evidence that they implement dual-issue or anything comparable: it's just a regular FMA per cycle on 1024 (M1) or 512 (A14) ALUs running at around 1.3 GHz. Per ALU, the A14 simply has half the FP32 throughput.

My amateur guess would be that Apple's ALUs are historically "native" FP16 but also capable of doing an FP32 operation at half the speed (maybe two cycles?). In the M1, Apple appears to have "upgraded" the ALUs with single-cycle FP32 capability. I have no idea what hardware changes are necessary to achieve this. I wouldn't be surprised if the next iteration of Apple GPU cores brings dual-rate FP16 capability, just like current Nvidia and AMD GPUs.
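
For anyone who wants to check the arithmetic behind "one FMA per cycle", here is a rough sketch in Python. It assumes the numbers in the tables above are GFLOPS (the tables do not state a unit), that one FMA counts as two floating-point operations, and that the clock is exactly 1.3 GHz, which is only the approximate figure quoted above. The function and dictionary names are made up for illustration.

```python
# Back-of-the-envelope peak throughput.
# Assumptions (not confirmed by the post): table values are GFLOPS,
# one FMA = 2 FLOPs, and the clock is exactly 1.3 GHz.

CLOCK_GHZ = 1.3  # approximate clock quoted in the post


def peak_gflops(alus, clock_ghz=CLOCK_GHZ, fmas_per_cycle=1.0):
    """Peak GFLOPS if each of `alus` lanes retires `fmas_per_cycle` FMAs per cycle."""
    return alus * fmas_per_cycle * 2 * clock_ghz


# ALU counts and the float4 / half4 rows from the tables above.
measured = {
    "A14": {"alus": 512, "float4": 646.0, "half4": 1298.0},
    "M1": {"alus": 1024, "float4": 2522.0, "half4": 2597.0},
}


def summarize(name):
    """Compare measured float4/half4 numbers against the one-FMA-per-cycle peak."""
    m = measured[name]
    peak = peak_gflops(m["alus"])
    return {
        "peak": peak,
        "fp32_fraction": m["float4"] / peak,
        "fp16_fraction": m["half4"] / peak,
        "fp16_over_fp32": m["half4"] / m["float4"],
    }


for name in measured:
    s = summarize(name)
    print(
        f"{name}: peak {s['peak']:.0f}, "
        f"FP32 {s['fp32_fraction']:.2f} of peak, "
        f"FP16 {s['fp16_fraction']:.2f} of peak, "
        f"FP16/FP32 {s['fp16_over_fp32']:.2f}"
    )
```

Under those assumptions, FP16 lands close to the one-FMA-per-cycle peak on both chips, FP32 lands close to it only on the M1, and the A14's FP32 comes out at roughly half of it. That is consistent with the half-rate FP32 guess above, but it does not prove anything about how the hardware is built.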