SP vs DP & performance metrics

Article: Computational Efficiency for CPUs and GPUs in 2012
By: anon (anon.delete@this.anon.com), July 29, 2012 1:53 am
Room: Moderated Discussions
Emil Briggs (me.delete@this.nowherespam.com) on July 28, 2012 6:11 pm wrote:
> aaron spink (aaronspink.delete@this.notearthlink.net) on July 28, 2012 7:05 am wrote:
> > Emil Briggs (me.delete@this.nowherespam.com) on July 28, 2012 6:40 am wrote:
> >
> > > By ORNL do you mean Oak Ridge National Laboratory? I ask since I am a large
> > > user at ORNL. Currently only some of the Jaguar/Titan nodes are equipped
> > > with GPU's but there are enough of them installed to do realistic
> > > evaluations. For certain workloads (and when properly programmed) they beat
> > > CPU's pretty handily performance wise. And that data comes from a real world
> > > application not LINPACK. The section of the code that I adapted for GPU's
> > > runs 3 to 4 times faster than it does on CPU's. It's also possible with this
> > > particular application to overlap some operations on the CPU and GPU and
> > > hide the latency of PCI-E data transfers. Obviously not all applications can
> > > benefit in the same way and it's not easy to do so even when possible but
> > > GPU's can offer some very nice performance gains in some cases.
> > >
> >
> > I'm not denying that there are some workloads and some kernels that have an
> > advantage, I am however saying that it is generally the exception based on
> > data published from both PRACE and ORNL, et al.
> >
> > BTW, what % of peak are you seeing on the GPUs with your code?
> >
>
> There are two places where we are using GPU's. One of them consists of large
> matrix operations, which are done using the NVIDIA cuBLAS library. Those run
> close to 80% of peak. The tricky part of
> the work here is keeping the CPU's busy doing something useful while moving the
> matrices back and forth to the GPU. The other place is some finite difference
> routines. Still working on this. It's faster than doing it all on the CPU's but
> not by much. I'm trying to get more overlap between the CPU and GPU's here but
> this section of the code is not as suitable for that as the first.
>
> > > That being said I do think that the cost of moving data across the PCI-E
> > > bus and the difficulty of the programming model are some real downsides to
> > > GPU's. How all that plays out will be interesting and I'm looking forward to
> > > getting my hands on some Intel MIC hardware to see what we can do with it.
> > >
> >
> > eventually see x32/x40 PCI-E interfaces or dual QPI interfaces. Certainly would
> > be nice if the GPUs/MICs had simple coherent access to memory.
>
> Agreed. How difficult do you think it would be to implement
> coherent memory access?

Probably not all that much harder than anything else they have to do. But I would bet that neither Intel nor AMD would allow NVIDIA to implement it, so you will see it in AMD or Intel GPUs if ever.
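
For anyone who wants to put numbers on the two quantities in the quoted exchange ("% of peak" for the matrix operations, and hiding PCI-E transfers behind compute), here is a minimal Python sketch. It is not code from anyone in this thread. The function names are hypothetical, the 2*m*n*k operation count is the usual convention for a dense matrix multiply, and the overlap model is an idealization that assumes transfers and compute can run fully concurrently.

```python
def gemm_flops(m, n, k):
    """Operation count for C = A * B with A (m x k) and B (k x n).

    Uses the common convention of one multiply and one add per
    inner-product term, i.e. 2*m*n*k floating-point operations.
    """
    return 2 * m * n * k


def percent_of_peak(flops, seconds, peak_flops_per_s):
    """Achieved rate (flops / seconds) as a percentage of a stated peak rate."""
    return 100.0 * (flops / seconds) / peak_flops_per_s


def step_time(compute_s, transfer_s, overlapped):
    """Idealized time for one step of work.

    Without overlap the transfer time adds to the compute time.
    With perfect overlap the shorter of the two is hidden entirely,
    so the step takes as long as the longer one.
    """
    if overlapped:
        return max(compute_s, transfer_s)
    return compute_s + transfer_s


if __name__ == "__main__":
    # All numbers below are made up for illustration only.
    flops = gemm_flops(1000, 1000, 1000)
    print(percent_of_peak(flops, seconds=2.5, peak_flops_per_s=1e9))
    print(step_time(3.0, 1.0, overlapped=False))
    print(step_time(3.0, 1.0, overlapped=True))
```

The overlap model also shows why the finite difference case is harder: when transfer time is comparable to or larger than compute time, even perfect overlap leaves the step bound by the transfer.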