Best case for GPUs

Article: Computational Efficiency for CPUs and GPUs in 2012
By: David Kanter (dkanter.delete@this.realworldtech.com), July 25, 2012 11:28 am
Room: Moderated Discussions
someone (someone.delete@this.somewhere.com) on July 25, 2012 9:58 am wrote:
> David Kanter (dkanter.delete@this.realworldtech.com) on July 25, 2012 1:37 am wrote:
> > New computational efficiency data shows GPUs with a clear edge over CPUs, but
> > the gap is narrowing as CPUs adopt wide vectors (e.g. AVX). Surprisingly, a
> > throughput CPU is the most energy efficient processor, offering hope for future
> > architectures. Our data also shows some advantages of AMD's Bulldozer, and the
> > overhead associated with highly scalable server CPUs.
> >
> > Comments and feedback welcome!
> >
> > David
>
> Calling FLOPS/W and FLOPS/mm2 efficiency is highly misleading because it has no
> concept of effective FLOPs while doing something useful. The FP functional units
> of a general purpose MPU are a tiny fraction of device area and power budget. Why?

That's right, no cache, no branch prediction, no bypassing, no store forwarding, etc.

There's a reason why I focus on compute efficiency, as opposed to performance efficiency. Compute != performance.

> Everything else there SUPPORTS feeding those units over a huge spectrum of usage
> in terms of data access and control complexity without demanding quite unreasonable
> effort and methods for programming. GPUs have less silicon overhead per FP unit
> because they are far less generally useful. For HPC algorithms with complex data
> access and control paths, GPUs are hugely inefficient and can only approach a tiny
> fraction of their theoretical peak FLOPs. Why hasn't anyone published a SPECfp
> score running entirely on a GPU yet? :-D

I totally agree. The easiest way to see that is to compare the cache per core for, say, IVB (2MB/core) vs. Fermi (guessing ~64KB/core).

> In a modern process I could tile a 200 mm2 die with nothing but FMACs and clock
> and power distribution and blow away everything on your graph, but it would not be
> capable of anything useful. But hey, what "efficiency" woot!

I agree (see Tilera!). However, if you want to talk about realizable FP performance, now you need to pick a workload.

What workload should we use?

What meaningful workload has been run (and reported) on all of those systems?

The closest is Linpack, but that hasn't been run on the T4 for obvious reasons.

No, what this chart measures is the *BEST CASE* for a GPU (i.e. something akin to Linpack). Any real workload will change the positions substantially, and more complex ones will show that GPUs are less efficient.
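
As a rough illustration of the difference between compute efficiency and realized efficiency, here is a minimal Python sketch. Every number in it is a made-up placeholder, not a figure from the article or from any real chip, and the `Chip` class, the `rank_by_realized` helper, and the per-workload utilization fractions are all hypothetical. It also assumes power stays at the rated value regardless of utilization, which is a simplification.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Chip:
    name: str
    peak_gflops: float  # theoretical peak GFLOP/s (placeholder value)
    power_watts: float  # rated power, e.g. TDP (placeholder value)
    die_mm2: float      # die area in mm^2 (placeholder value)

    def peak_per_watt(self) -> float:
        """Compute efficiency: peak GFLOP/s per watt."""
        return self.peak_gflops / self.power_watts

    def peak_per_mm2(self) -> float:
        """Compute density: peak GFLOP/s per mm^2."""
        return self.peak_gflops / self.die_mm2

    def realized_per_watt(self, utilization: float) -> float:
        """Peak GFLOP/s per watt scaled by the fraction of peak a workload reaches."""
        if not 0.0 <= utilization <= 1.0:
            raise ValueError("utilization must be in [0, 1]")
        return self.peak_per_watt() * utilization


def rank_by_realized(chips, utilization):
    """Return chip names ordered from most to least realized GFLOP/s per watt."""
    ordered = sorted(
        chips,
        key=lambda c: c.realized_per_watt(utilization[c.name]),
        reverse=True,
    )
    return [c.name for c in ordered]


# Hypothetical parts; these are NOT measurements of any real product.
gpu = Chip("gpu", peak_gflops=600.0, power_watts=200.0, die_mm2=500.0)
cpu = Chip("cpu", peak_gflops=200.0, power_watts=100.0, die_mm2=300.0)

# Best case (Linpack-like): assume both reach their peak.
best_case = {"gpu": 1.0, "cpu": 1.0}
# Complex data access and control flow: assumed fractions, chosen for illustration.
complex_case = {"gpu": 0.2, "cpu": 0.6}

print(rank_by_realized([gpu, cpu], best_case))     # ['gpu', 'cpu']
print(rank_by_realized([gpu, cpu], complex_case))  # ['cpu', 'gpu']
```

With these placeholder inputs the ordering flips between the two cases, which is the point: the chart ranks the peak figures, and any ranking of realized performance depends entirely on which workload supplies the utilization numbers.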

David