SP vs DP & performance metrics

Article: Computational Efficiency for CPUs and GPUs in 2012
By: jp (jipe4153.delete@this.gmail.com), July 28, 2012 11:04 am
Room: Moderated Discussions
aaron spink (aaronspink.delete@this.notearthlink.net) on July 28, 2012 7:05 am wrote:
> Emil Briggs (me.delete@this.nowherespam.com) on July 28, 2012 6:40 am wrote:
> > By ORNL do you mean Oak Ridge National Laboratory? I ask since I am a large user at ORNL. Currently only some of the Jaguar/Titan nodes are equipped with GPU's but there are enough of them installed to do realistic evaluations. For certain workloads (and when properly programmed) they beat CPU's pretty handily performance wise. And that data comes from a real world application not LINPACK. The section of the code that I adapted for GPU's runs 3 to 4 times faster than it does on CPU's. It's also possible with this particular application to overlap some operations on the CPU and GPU and hide the latency of PCI-E data transfers. Obviously not all applications can benefit in the same way and it's not easy to do so even when possible but GPU's can offer some very nice performance gains in some cases.
>
> I'm not denying that there are some workloads and some kernels that have an advantage, I am however saying that it is generally the exception based on data published from both PRACE and ORNL, et al.
>
> BTW, what % of peak are you seeing on the GPUs with your code?
>
> > That being said I do think that the cost of moving data across the PCI-E bus and the difficulty of the programming model are some real downsides to GPU's. How all that plays out will be interesting and I'm looking forward to getting my hands on some Intel MIC hardware to see what we can do with it.
>
> MIC still has the disadvantage of PCI-E being a limiter but it should have a better programming model overall than GPUs. Hopefully we'll eventually see x32/x40 PCI-E interfaces or dual QPI interfaces. Certainly would be nice if the GPUs/MICs had simple coherent access to memory.

Nvidia GPUs do have simple coherent access to memory. For example, data allocated on the CPU (host side) can be pulled over seamlessly by a GPU kernel, just by passing it a pointer. It's more a question of PCIe speed than ease of use.

But in the majority of the real-world applications I've worked on, the compute time exceeds the copy_to and copy_from time, meaning that you can hide almost all of the transfer time by overlapping compute and transfer.
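As a rough illustration of that overlap argument, here is a minimal Python sketch of a three-stage pipeline (copy in, compute, copy out). It is a back-of-the-envelope model, not a measurement: the function names and the per-chunk times are hypothetical, and it assumes the two copy directions and the kernel can all run concurrently on different chunks.

```python
def serial_time(chunks, t_in, t_compute, t_out):
    """Wall-clock time when every chunk is copied in, computed,
    and copied out strictly one step after another."""
    return chunks * (t_in + t_compute + t_out)


def overlapped_time(chunks, t_in, t_compute, t_out):
    """Wall-clock time when the three stages of different chunks overlap.

    Assumption: each stage has its own engine, so in steady state the
    slowest stage sets the rate."""
    bottleneck = max(t_in, t_compute, t_out)
    # Fill and drain the pipeline once, then one bottleneck step per extra chunk.
    return t_in + t_compute + t_out + (chunks - 1) * bottleneck


def hidden_fraction(chunks, t_in, t_compute, t_out):
    """Fraction of the total transfer time that overlap removes
    from the wall clock."""
    transfer = chunks * (t_in + t_out)
    saved = (serial_time(chunks, t_in, t_compute, t_out)
             - overlapped_time(chunks, t_in, t_compute, t_out))
    return saved / transfer


# Hypothetical numbers: 16 chunks, 2 ms copy in, 5 ms compute, 2 ms copy out.
print(serial_time(16, 2, 5, 2))      # 144
print(overlapped_time(16, 2, 5, 2))  # 84
print(hidden_fraction(16, 2, 5, 2))  # 0.9375
```

With these made-up numbers the run drops from 144 ms to 84 ms. Whenever compute is the slowest stage, the hidden share of the transfer time works out to (chunks - 1) / chunks, which is where the "almost all" comes from. If a copy stage is the slowest one instead, the model shows the transfer becoming the limiter.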

So in my experience PCIe has not been a real limiter. And according to biased little me, the programming model isn't very difficult compared to trying to write a highly optimized multithreaded CPU application utilizing SSE. It's just that CUDA/OpenCL doesn't allow you to write stupid code to begin with; you get punished immediately for that.

In the future we'll have ARM+GPU (i.e. Maxwell) on the same die, which means discrete graphics cards would also be able to handle the strictly serial components of the algorithms. The big gain here is that a closely coupled GPU+CPU (a bit like Fusion), connected via an interconnect or a shared L3 cache, would mean that even smaller workloads could be offloaded. This is borderline speculation, but it does seem to be on the horizon.
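To make the "even smaller workloads" point concrete, here is a small Python sketch of the break-even size for offloading. Everything in it is an assumption for illustration: the function name, the throughput figures and the two overhead values are made up, and the model ignores everything except a fixed per-offload overhead.

```python
def break_even_work(overhead_s, cpu_rate, gpu_rate):
    """Smallest amount of work (arbitrary work units) for which
    offloading beats staying on the CPU.

    Offloading wins when
        work / cpu_rate > overhead_s + work / gpu_rate
    which rearranges to the expression returned below."""
    if gpu_rate <= cpu_rate:
        raise ValueError("offloading never pays off unless the GPU is faster")
    return overhead_s / (1.0 / cpu_rate - 1.0 / gpu_rate)


# Hypothetical rates: the GPU processes work 4x faster than the CPU.
CPU_RATE = 1e9  # work units per second (made up)
GPU_RATE = 4e9  # work units per second (made up)

# Hypothetical fixed cost per offload: across PCIe vs. on the same die.
discrete = break_even_work(20e-6, CPU_RATE, GPU_RATE)
on_die = break_even_work(1e-6, CPU_RATE, GPU_RATE)

print(f"discrete card break-even: {discrete:.0f} work units")
print(f"on-die break-even:        {on_die:.0f} work units")
```

In this model the break-even size scales linearly with the fixed overhead, so cutting the overhead by 20x lets workloads 20x smaller be worth offloading.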
