SP vs DP & performance metrics

Article: Computational Efficiency for CPUs and GPUs in 2012
By: jp (jipe4153.delete@this.gmail.com), July 27, 2012 9:18 am
Room: Moderated Discussions
Eric (eric.kjellen.delete@this.gmail.com) on July 27, 2012 7:57 am wrote:
> jp (jipe4153.delete@this.gmail.com) on July 27, 2012 7:08 am wrote:
> > The idea that it's easier to squeeze more theoretical performance with
> > multi-threading and SSE instructions on a CPU is unfortunately not true.
>
> I agree, but the question is if the superior programmability and
> better performance in some branching and data irregular workloads (and maybe
> most importantly, more consistent performance across many workloads), together
> with Intel's growing process technology advantage and maybe also advanced
> packaging (such as TSV stacking of DRAM to enable very high memory bandwidth for
> SIMD applications) will turn out to be the killer advantages of the CPU.
>

The point is that most workloads, though not all, have enough fine-grained parallelism to exploit SIMD capabilities.

As for bandwidth, GPUs already have the fastest RAM out there (over 250 GB/s), and they have no reason not to keep this lead (see the FLOPS/bandwidth ratio "issue" mentioned earlier).
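
To put a rough number on that FLOPS/bandwidth ratio, here is a back-of-the-envelope sketch. Only the 250 GB/s figure is from this post; the peak double-precision rate is a made-up placeholder (not the spec of any real card), and daxpy is picked simply as a familiar bandwidth-bound kernel. The function names are my own.

```python
# Back-of-the-envelope FLOPS/bandwidth balance for a bandwidth-bound kernel.
# Only the 250 GB/s figure comes from the post; the peak-FLOPS value is a
# made-up placeholder, not a spec for any real card.

BANDWIDTH_GBS = 250.0        # memory bandwidth quoted above, in GB/s
PEAK_DP_GFLOPS = 1000.0      # ASSUMPTION: hypothetical peak DP rate, GFLOP/s


def machine_balance(peak_gflops, bandwidth_gbs):
    """FLOPs the chip can issue per byte it can fetch from RAM."""
    return peak_gflops / bandwidth_gbs


def bandwidth_bound_gflops(flops_per_elem, bytes_per_elem, bandwidth_gbs):
    """Upper bound on GFLOP/s when memory traffic is the only limit."""
    return bandwidth_gbs * flops_per_elem / bytes_per_elem


# daxpy, y[i] = a * x[i] + y[i], in double precision:
# 2 FLOPs per element; reads x[i] and y[i], writes y[i] -> 3 * 8 = 24 bytes.
DAXPY_FLOPS = 2
DAXPY_BYTES = 24

balance = machine_balance(PEAK_DP_GFLOPS, BANDWIDTH_GBS)
daxpy_limit = bandwidth_bound_gflops(DAXPY_FLOPS, DAXPY_BYTES, BANDWIDTH_GBS)

print(f"machine balance: {balance:.1f} FLOP/byte")
print(f"daxpy ceiling:   {daxpy_limit:.1f} GFLOP/s")
```

With these inputs the daxpy ceiling lands at roughly 20.8 GFLOP/s, far below the placeholder peak, which is the kind of gap the ratio "issue" refers to. Swap in real numbers for whatever hardware you care about.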

> My
> answer is yes, not least because of the many historical precedents where an
> opportunity to eliminate a co-processor at the expense of raw performance and
> complexity, and to the benefit of consistency and programmability/flexibility,
> has been pursued with a great deal of enthusiasm by the industry. Let's not
> forget that the origins of the GPUs lay in their ability to accelerate 3D
> graphics when the CPU was no longer enough. They are not a natural, inescapable
> feature of general-purpose computing.
>

You just agreed that it was not easier to develop a high-performance solution for a CPU. Don't you think GPU vendors will continue to churn out more features and high-level abstractions to simplify development? The answer is yes; examples are OpenACC, which Nvidia backs, and PGI's Fortran compiler for CUDA.

> And for large-scale HPC deployments, we
> have already seen that throughput-optimized CPUs are (at least for the moment)
> preferred by the most demanding customers and that they can deliver
> significantly higher efficiency than CPU + GPGPU systems even with regard to
> peak theoretical performance, not to speak of the performance that will
> realistically actually be achieved.

Looking at the articles on HPCwire about new clusters over the last 1.5 years, it's obvious that almost everyone is buying Nvidia Tesla cards. In fact, ORNL's newest cluster (fastest in the US) will be based on the new Kepler cards (K20).
