M1 GPU has higher FP32 than A-series GPUs

By: Adrian (a.delete@this.acm.org), December 9, 2020 5:28 am
Room: Moderated Discussions
K.K (anon.delete@this.anon.com) on December 9, 2020 3:56 am wrote:
> Adrian (a.delete@this.acm.org) on December 9, 2020 2:16 am wrote:
>
> > It does not matter how you name them, 2 FP16 ALUs that can be aggregated into 1 FP32
> > ALU or 1 FP32 ALU that can be split into 2 FP16 ALUs are the same physical thing.
>
> I understand what you are saying, but I think that it's neither of those things. My speculation
> is that the iPhones have FP16 ALUs that are capable of doing a FP32 operation at half the speed.
> So no ALU fusion, but ALU "reuse" (if it makes any sense). And that M1 upgrades the ALUs to full
> FP32 width, while retaining the ability to do a single FP16 operation (no ALU splitting taking
> place). Maybe M1 can't do FP16 at all and uses 32-bit precision internally all the way.
>
> At any rate, you will find the benchmarks using interleaved FP32 and FP16 for an M1 and an
> A14 below. As far as I can tell from this, M1 does not have any ALUs capable of higher FP16
> throughput — each combination runs with the same speed. I think this supports my theory
> of FP32 ALUs on M1, without any FP16 splitting. Note that this is different from the results
> I have reported earlier, but that was probably due to a bug I have since fixed.
>
> MacBook Pro (M1 with 8-core GPU)
> --------------------------------
> float1 + half1 2516.0
> float1 + half2 2520.0
> float1 + half3 2556.0
> float1 + half4 2562.0
> float2 + half1 2546.0
> float2 + half2 2561.0
> float2 + half3 2560.0
> float2 + half4 2571.0
> float3 + half1 2526.0
> float3 + half2 2539.0
> float3 + half3 2549.0
> float3 + half4 2556.0
> float4 + half1 2531.0
> float4 + half2 2549.0
> float4 + half3 2548.0
> float4 + half4 2552.0
>
>
> iPhone 12 (A14 with 4-core GPU)
> --------------------------------
> float1 + half1 1260.0
> float1 + half2 1260.0
> float1 + half3 1280.0
> float1 + half4 1281.0
> float2 + half1 967.0
> float2 + half2 1280.0
> float2 + half3 1277.0
> float2 + half4 1287.0
> float3 + half1 858.0
> float3 + half2 1070.0
> float3 + half3 1274.0
> float3 + half4 1276.0
> float4 + half1 807.0
> float4 + half2 967.0
> float4 + half3 1127.0
> float4 + half4 1276.0
>
>
> One curious thing is that interleaving FP32 and FP16 on the A14 performs better than expected.
> I would have thought that float+half should result in around 975 GFLOPS (average of
> 650 GFLOPS for FP32 and 1300 GFLOPS for FP16), but we get almost full 1300 GFLOPS. I turned -ffast-math
> off, but maybe the compiler manages to sneak some sort of optimisation in there.
>



Thanks for the results, they are interesting.

The comment quoted from Apple does not make sense. M1 has the expected and obvious behavior of executing 1024 operations per cycle, regardless of whether they are FP32 or FP16. That means that it has combined FP32/FP16 ALUs, which have been simplified by removing the ability to be reconfigured to execute two FP16 operations at once.

As the A14 obviously has that ability, the M1 GPU has really sacrificed the maximum achievable FP16 throughput in order to achieve quadruple the FP32 throughput without excessively increasing the GPU area.
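To make the contrast between the two result tables concrete, here is a minimal Python sketch that only re-reads the numbers you posted. The lists are copied from the quoted tables in row order (float1+half1 through float4+half4); the helper name `spread` is my own, not anything from the benchmark.

```python
# Results copied from the quoted benchmark tables, in row order
# (float1+half1 ... float4+half4), in the units reported there.
M1 = [2516.0, 2520.0, 2556.0, 2562.0,
      2546.0, 2561.0, 2560.0, 2571.0,
      2526.0, 2539.0, 2549.0, 2556.0,
      2531.0, 2549.0, 2548.0, 2552.0]

A14 = [1260.0, 1260.0, 1280.0, 1281.0,
       967.0, 1280.0, 1277.0, 1287.0,
       858.0, 1070.0, 1274.0, 1276.0,
       807.0, 967.0, 1127.0, 1276.0]


def spread(results):
    """Ratio of the best to the worst result across all combinations."""
    return max(results) / min(results)


print(f"M1  best/worst: {spread(M1):.2f}")
print(f"A14 best/worst: {spread(A14):.2f}")
```

On the M1 the best and worst combinations differ by only about 2%, while on the A14 the best result is roughly 1.6 times the worst, and the slow rows are the ones with more float than half components.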

Because M1 is intended to be used with larger displays, I believe that this Apple design decision was perfectly appropriate for achieving maximal useful improvements in the GPU with minimal costs. They have the NPU for ML applications.


Regarding "So no ALU fusion, but ALU "reuse" (if it makes any sense)": when the FP32:FP16 throughput ratio is 1:2, as on the Apple A14, it is certain that you have an FP32 ALU that is split into 2 FP16 ALUs.

For the adder parts of the ALUs, it is irrelevant whether you use the word "fusion" or "splitting". You just have 32 1-bit adders, and they are reconfigured between FP16 and FP32 just by changing the way the carries are propagated. The increased complexity of 1 FP32 adder over 2 FP16 adders (e.g. due to carry lookahead) is negligible.
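As a rough illustration of the carry reconfiguration, here is a minimal Python sketch of a 32-bit ripple adder whose carry can be blocked at bit 16. It is an integer-only toy under my own assumptions: it ignores everything else a floating-point adder does (exponent compare, alignment shift, normalization, rounding), and the name `ripple_add` and the `split_lanes` flag are hypothetical.

```python
def ripple_add(a, b, split_lanes):
    """Add two 32-bit values using 32 one-bit full adders.

    With split_lanes=True the carry out of bit 15 is not passed to bit 16,
    so the same adders behave as two independent 16-bit adders.
    """
    result = 0
    carry = 0
    for bit in range(32):
        if split_lanes and bit == 16:
            carry = 0  # block the carry between the two 16-bit lanes
        x = (a >> bit) & 1
        y = (b >> bit) & 1
        result |= (x ^ y ^ carry) << bit              # sum bit of a full adder
        carry = (x & y) | (carry & (x ^ y))           # carry out of a full adder
    return result  # the final carry out is dropped, as in a fixed-width adder


# Same hardware, two configurations:
print(hex(ripple_add(0x0000FFFF, 0x00000001, split_lanes=False)))
print(hex(ripple_add(0x0000FFFF, 0x00000001, split_lanes=True)))
```

The only difference between the two modes is the single carry that is forced to zero, which is why calling it "fusion" or "splitting" changes nothing about the adder itself.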

However, 1 multiplier for FP32 has a larger size than 2 multipliers for FP16 (with more 1-bit adders, up to double, for 24-bit FP32 fractions vs. 11/12-bit FP16 fractions). So, when 2 FP16 ALUs are fused, the smaller combined FP16 multipliers would have to be used twice to generate the partial products. You would then see a larger throughput ratio between FP32 and FP16, e.g. 1:4, because after fusion you have only half as many FP32 ALUs and each ALU would need at least 2 cycles to generate the result of an FMA or FMUL operation.
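The arithmetic behind this can be sketched in a few lines of Python. I am assuming a plain array multiplier whose size is proportional to the number of partial-product bits (width squared), which is only a crude proxy for real area, and the function names are my own.

```python
from fractions import Fraction


def partial_product_bits(width):
    """Partial-product bits in a width x width array multiplier (size proxy)."""
    return width * width


def fp32_to_fp16_rate(fp16_lanes_per_fp32, cycles_per_fp32):
    """FP32 throughput relative to FP16 throughput for one group of lanes."""
    return Fraction(1, fp16_lanes_per_fp32 * cycles_per_fp32)


fp32_size = partial_product_bits(24)
for fp16_width in (11, 12):
    two_fp16_size = 2 * partial_product_bits(fp16_width)
    print(f"FP32 multiplier vs. 2 FP16 ({fp16_width}-bit): "
          f"{fp32_size} vs. {two_fp16_size} partial-product bits")

# FP32 ALU with a full-size multiplier, split into 2 FP16 lanes: 1 cycle per FP32 op.
print("split FP32 ALU:  ", fp32_to_fp16_rate(2, 1))
# 2 FP16 ALUs fused, small multipliers reused: at least 2 cycles per FP32 op.
print("fused FP16 ALUs: ", fp32_to_fp16_rate(2, 2))
```

With a full-size FP32 multiplier the ratio is 1:2, which is what the A14 shows at its extremes. Fused FP16 multipliers give 1:4 at best.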


