M1 GPU microbenchmarks (peak FLOPS)

By: Maynard Handley (name99.delete@this.name99.org), December 5, 2020 12:12 pm
Room: Moderated Discussions
K.K (anon.delete@this.anon.coom) on December 5, 2020 2:31 am wrote:
> I tried to benchmark the peak compute performance of the M1 GPU this morning (I have a MacBook
> Pro). Apple claims that the GPU is capable of 2.6 TFLOPS, yet clpeak only reports a measured
> 1.1 TFLOPS, so I decided to do my own tests. Please keep in mind that I have no experience
> with this kind of benchmark and that my approach might be extremely naive. I hope that you
> smart people can help me make more sense of the results, because I find them quite strange
> (apologies for the long post and the lack of spacing; the forum seems to gobble up indentation).
>
> Basic methodology was to run a long chain of FMA operations on
> the GPU. I chose the simple geometric Taylor series expansion
>
> 1/(1-x) = sum_{k>=0} x^k   (for |x| < 1)
>
> because it produces meaningful results and can be trivially computed as a chain of dependent FMA operations (Horner's scheme) like this:
>
> sum_0 = 1
> sum_k = fma(x, sum_{k-1}, 1)
>
> The Metal shader was based around the following C++ template:
>
> // N dependent FMAs, unrolled at compile time by template recursion
> template <typename T, int N>
> struct taylor_series_sum {
>     static T compute(T x) {
>         return fma(x, taylor_series_sum<T, N - 1>::compute(x), T(1));
>     }
> };
>
> // Base case: sum_0 = 1
> template <typename T>
> struct taylor_series_sum<T, 0> {
>     static T compute(T x) {
>         return T(1);
>     }
> };
>
> The compute kernel function itself is then fairly trivial. Memory reads/writes are avoided as
> much as possible; there is only a final store to prevent the kernel from being optimised out. Below
> is an example of a kernel that runs 512 FMA operations (for 1024 total FLOP per invocation):
>
> kernel void benchmark(...) {
>     out[0] = taylor_series( (float) index / (float) nthreads);
> }
>
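Turning a run into a TFLOPS figure is simple bookkeeping: 2 FLOP per FMA, times FMAs per thread, times threads in the grid, divided by the measured GPU time. A minimal sketch with hypothetical function names; where the elapsed time comes from is an assumption (one option is the difference between a Metal command buffer's `GPUEndTime` and `GPUStartTime`).

```cpp
#include <cassert>
#include <cstdint>

// An FMA is counted as 2 FLOP (one multiply plus one add).
constexpr std::uint64_t kFlopPerFma = 2;

// Total FLOP in one dispatch: every thread runs `fmas_per_thread` FMAs.
constexpr std::uint64_t dispatch_flop(std::uint64_t threads, std::uint64_t fmas_per_thread) {
    return threads * fmas_per_thread * kFlopPerFma;
}

// Convert a FLOP count and a measured time in seconds into TFLOPS.
inline double tflops(std::uint64_t flop, double seconds) {
    assert(seconds > 0.0);
    return static_cast<double>(flop) / seconds / 1e12;
}
```

With the 512-FMA kernel, `dispatch_flop(1, 512)` gives the 1024 FLOP per invocation quoted above; a 2^20-thread grid finishing in 1 ms would correspond to roughly 1.07 TFLOPS.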
> Now, on to the results.
>
> On my 16" MacBook Pro with an AMD 5500M I am getting an average of 3.6-3.8 TFLOPS with a large
> enough kernel execution grid, which is in line with its claimed peak of 4 TFLOPS. On the
> M1, however, the same kernel only yields 1.2-1.3 TFLOPS, the same as clpeak and half of what
> Apple is claiming. Note that M1 has 1024 GPU ALUs, so this is consistent with 1 FLOP per
> ALU per clock, assuming a clock of around 1.2 GHz (which again is realistic).
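The arithmetic behind that estimate, as a small sketch: peak = ALUs x clock x FLOP per ALU per clock. The 1024 ALUs and the ~1.2 GHz clock are the figures assumed in the post, not measured values, and the helper names are hypothetical.

```cpp
#include <cassert>

// Theoretical peak in TFLOPS: ALU count x clock (GHz) x FLOP per ALU per clock.
// ALUs x GHz gives GFLOPS per FLOP-per-clock; dividing by 1e3 converts to TFLOPS.
constexpr double peak_tflops(double alus, double clock_ghz, double flop_per_alu_per_clock) {
    return alus * clock_ghz * flop_per_alu_per_clock / 1e3;
}

// Clock (GHz) that a given peak figure implies under the same model.
constexpr double required_clock_ghz(double target_tflops, double alus, double flop_per_alu_per_clock) {
    return target_tflops * 1e3 / (alus * flop_per_alu_per_clock);
}
```

`peak_tflops(1024, 1.2, 1.0)` is about 1.23 TFLOPS (the 1-FLOP-per-clock reading), while counting one full FMA (2 FLOP) per ALU per clock gives about 2.46 TFLOPS. Under this model Apple's 2.6 TFLOPS figure would imply a clock of roughly 1.27 GHz.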
>
> This is where things get interesting. I tried running two interleaved FMA chains
> at the same time (each template invocation computes two Taylor series) and the performance
> jumped to 2.5-2.7 TFLOPS on M1, with no change on the AMD Navi GPU. Running three or more
> chains does not improve the result (the throughput actually decreases slightly).
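A CPU-side sketch of the two-chain variant, assuming each chain uses the Horner recurrence sum_0 = 1, sum_k = fma(x, sum_{k-1}, 1). The name `taylor_series_sum2` is hypothetical, and `std::fma` again stands in for Metal's `fma`. The point is only the shape: each step issues two FMAs with no dependency between them, which is one plausible reason the M1 number doubles, though the measurements alone don't say which mechanism is at work.

```cpp
#include <cassert>
#include <cmath>
#include <utility>

// Two independent FMA chains advanced in lockstep. Within a step the two
// FMAs do not depend on each other, so hardware is free to overlap them.
template <typename T, int N>
struct taylor_series_sum2 {
    static std::pair<T, T> compute(T x, T y) {
        std::pair<T, T> prev = taylor_series_sum2<T, N - 1>::compute(x, y);
        return { std::fma(x, prev.first, T(1)), std::fma(y, prev.second, T(1)) };
    }
};

// Base case: both chains start at sum_0 = 1.
template <typename T>
struct taylor_series_sum2<T, 0> {
    static std::pair<T, T> compute(T, T) { return { T(1), T(1) }; }
};
```

Each level costs two FMAs, so a 512-level instantiation is 2048 FLOP per invocation; keep that in mind when converting timings to TFLOPS.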
>
> I then tried to estimate half-precision performance. First, the AMD Navi results: Navi is
> known to have a double FP16 rate. In my initial benchmark, FP16 performance was identical to FP32 performance
> on Navi. However, if I compute two or more interleaved expansions, performance goes up to 6-7 TFLOPS.
> This suggests that in order to benefit from double-rate FP16, two FP16 operations must be issued together.
> Now a big surprise: on M1, FP16 does not seem to have any impact on performance whatsoever. The results are
> identical to FP32. The calculation is definitely performed at lower precision (I verified the results),
> but the performance does not change. I find this a bit shocking, since common knowledge says that mobile GPUs
> invest heavily in fast reduced-precision ALUs. I still need to test it on my iPhone. I suspect there might
> be a flaw in my methodology, as I find it hard to believe that Apple doesn't have faster FP16. Maybe I should
> try something else, like computing an FP32 and an FP16 chain simultaneously?
>
> To summarise:
>
> 1. Peak compute performance on M1 is only reached if multiple dependency chains are
> interleaved. I do not know whether this means that each ALU is capable of doing two
> FP32 operations per clock or whether there is some sort of pipeline effect.
>
> 2. M1 seems to run FP32 and FP16 operations at the same speed, at least
> in my simple example. I am very sceptical about this result.
>
> Do you have any suggestions on how I can improve this test, and what else I could try?

Since you are writing these sorts of tests anyway, it might be interesting to write a bunch of equivalent (large array/matrix) tests that you run through Apple's Accelerate framework.
https://developer.apple.com/documentation/accelerate

Most interestingly, this might give us a number for Apple's AMX performance (AMX is supposedly now hooked into Accelerate as of iOS 14/macOS on M1). Accelerate might also route some of the calls (where that makes sense) to the GPU or NPU, which would also answer part of your immediate question.
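One way to get comparable numbers out of Accelerate is to time a large single-precision matrix multiply and divide its FLOP count (2·m·n·k) by the elapsed time. The sketch below uses a naive portable loop as a stand-in so it runs anywhere; on macOS the `naive_sgemm` call is where you would substitute Accelerate's CBLAS `cblas_sgemm`. The function names are hypothetical, and a harness like this can't tell you whether a given call actually ran on AMX, the GPU or the NPU.

```cpp
#include <cassert>
#include <chrono>
#include <cstddef>
#include <vector>

// FLOP count of C = A * B for an (m x k) by (k x n) product:
// one multiply and one add per inner-product term.
constexpr double gemm_flop(std::size_t m, std::size_t n, std::size_t k) {
    return 2.0 * static_cast<double>(m) * static_cast<double>(n) * static_cast<double>(k);
}

// Naive row-major SGEMM, standing in for an optimised library call
// (e.g. cblas_sgemm from Accelerate on macOS).
void naive_sgemm(std::size_t m, std::size_t n, std::size_t k,
                 const std::vector<float>& a, const std::vector<float>& b,
                 std::vector<float>& c) {
    for (std::size_t i = 0; i < m; ++i) {
        for (std::size_t j = 0; j < n; ++j) {
            float acc = 0.0f;
            for (std::size_t p = 0; p < k; ++p)
                acc += a[i * k + p] * b[p * n + j];
            c[i * n + j] = acc;
        }
    }
}

// Time one n x n x n multiply and return the achieved GFLOPS.
double measure_gflops(std::size_t n) {
    std::vector<float> a(n * n, 1.0f), b(n * n, 1.0f), c(n * n, 0.0f);
    auto t0 = std::chrono::steady_clock::now();
    naive_sgemm(n, n, n, a, b, c);
    auto t1 = std::chrono::steady_clock::now();
    double seconds = std::chrono::duration<double>(t1 - t0).count();
    // With all-ones inputs every output entry equals n; this also keeps the result live.
    assert(c[0] == static_cast<float>(n));
    return gemm_flop(n, n, n) / seconds / 1e9;
}
```

For meaningful numbers use matrices large enough that the multiply runs for a good fraction of a second, and repeat the measurement a few times rather than trusting a single run.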
