Supercomputer variant of Kahan quote

Article: Intel's Near-Threshold Voltage Computing and Applications
By: anon (anon.delete@this.anon.com), October 19, 2012 2:27 am
Room: Moderated Discussions
Robert Myers (rbmyersusa.delete@this.gmail.com) on October 18, 2012 1:29 pm wrote:
> anon (anon.delete@this.anon.com) on October 17, 2012 9:02 pm wrote:
> > Michael S (already5chosen.delete@this.yahoo.com) on October 17, 2012 6:56 pm wrote:
> > > According to my understanding, you are talking about an IBM research paper from 9 years
> > > ago that investigated calculation of a relatively tiny volumetric FFT (N=128, total
> > > dataset = 32 MB). BG/Q of today is a very different machine from BG/L of 2003. Today's
> > > tightly coupled 32-node "compute drawer" is almost as big, when measured by FLOPs,
> > > caches or memories, as a 512-node BG/L from then. But the question is - why bother with
> > > parallelizing such a small data set over so many loosely coupled computing elements?
> > > Is it in some way similar to what you want to do? From one of our previous discussions
> > > on comp.arch I got the impression that you are interested in much bigger cubes that
> > > likely have very different scaling characteristics on BlueGene type of machines. And
> > > it's not obvious to me that their scaling characteristics are worse than the small cube.
> >
> > They are not. Larger N means the problem is inherently more parallel. For example, see:
> >
> > http://code.google.com/p/p3dfft/
> >
> > "P3DFFT uses 2D, or pencil, decomposition. This overcomes an important limitation to
> > scalability inherent in FFT libraries implementing 1D (or slab) decomposition: the
> > number of processors/tasks used to run this problem in parallel can be as large as N^2,
> > where N is the linear problem size. This approach has shown good scalability up to
> > 32,768 cores on Ranger (Sun/AMD at TACC) when integrated into a Direct Numerical
> > Simulation (DNS) turbulence application (see scaling analysis presentation at
> > Teragrid’08 meeting, Las Vegas)."
> >
> > From the linked paper:
> >
> > "This code has been run on Ranger at 4096^3 resolution using 16K cores, with 87% strong
> > scaling for a quadrupling of core count from 4K to 16K. Testing at large core counts has
> > also been performed on IBM BG/L and CRAY XT4’s at other major supercomputing sites, with
> > 98% strong scaling observed between 16K and 32K cores on the former."
> >
> > I'm not saying that every problem scales well, but it's simply false to claim that HPC
> > is nothing but linpack and no real work ever gets done on them, or that it would be much
> > more economical to invest all the money in custom CPUs. So the basis for the claim that
> > "everybody else is doing it wrong" is already on pretty shaky ground.
>
> I would appreciate it very much if you would stop putting words into my mouth. At no
> point have I ever said what you just "quoted" me as saying.

I did not quote you; I paraphrased. But it is pretty much what you're saying. I can quite easily quote you if you need reminding.

> In another post, I have explained what my real position
> is with respect to these super-gigantic but not-so-super computers.
>
> For one
> thing, I feel like I keep having the same discussion over and over again. I get
> that feeling because I *am* having the same discussion over and over again. The
> only thing that changes is that the numbers change because the microelectronics
> change and because the number of nodes that are being jammed into a
> high-bandwidth connection (a board, a drawer, or a cabinet, or whatever) keeps
> increasing. Those boards, drawers, and cabinets are *exactly* what has been
> discussed on comp.arch as the only arrangement actually capable of doing
> problems that require lots of bandwidth. The instant you get off those boards
> and out into the warehouse, which is where you need to be to get the linpack
> flops that are being advertised, you have the bandwidth problem that I have been
> belly-aching about for now almost a decade.
>
> The discussion runs something
> like this: "What are you talking about? There are these super-duper numbers
> from Supercomputing Whatever, and Blue Gene did just great." When you dig into
> the numbers, you find that, yes, indeed, Blue Gene didn't do as badly scaling to
> a few thousand nodes as the other entrants, but they were *all* running at about
> 10% of the Linpack rate. Since then, I'm sure there have been more
> Supercomputer Whatevers, and an entirely new set of numbers. These benchmarks
> and claims show only that things aren't quite as bad as they were when I first
> flipped over Blue Gene. The advertisement is essentially bait and switch: flops
> and efficiency are for Linpack with the whole machine, FFT's are tested with
> whatever subset of the machine doesn't look too horrible. There *is* no
> exascale computing for anything but nearly embarrassingly parallel calculations.
> I'm tired of these archaeological digs. Yes, the numbers will change. The
> scalability of real problems will slowly get better, but the improvement has
> nothing, nothing, nothing to do with warehouses filled with computer cabinets
> that can perform billions and billions of flops on a problem that I am assured
> is still important, even if it is of no interest to me.

Cough up the numbers, sir. Instead of handwaving, show us some actual figures: cite some studies or published results.

For example, you claimed (without citing anything, of course) that 3D FFTs don't scale well, but at least for some problems they actually do, as the study quoted above shows. I can frame it as a reply to an exact quote of yours from just a couple of posts up, if you would like.
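To make those numbers concrete, here is a rough back-of-the-envelope sketch (Python, with made-up helper names; the only inputs taken from the quotes above are the grid sizes, 16-byte complex doubles, and the 87% efficiency figure):

BYTES_PER_COMPLEX_DOUBLE = 16  # two IEEE-754 doubles per grid point

def dataset_bytes(n):
    # total size of an N x N x N grid of complex doubles
    return n ** 3 * BYTES_PER_COMPLEX_DOUBLE

def max_tasks_slab(n):
    # 1D (slab) decomposition: at most one plane per task
    return n

def max_tasks_pencil(n):
    # 2D (pencil) decomposition: at most one 1D pencil per task
    return n ** 2

def strong_scaling_speedup(core_ratio, efficiency):
    # observed speedup for a fixed problem when core count grows by core_ratio
    return core_ratio * efficiency

# The "tiny" case from the old IBM paper: N = 128
print(dataset_bytes(128) / 2**20)                    # 32.0 MiB, matching the quoted figure
print(max_tasks_slab(128), max_tasks_pencil(128))    # 128 vs 16384 parallel tasks

# The Ranger DNS run: N = 4096
print(dataset_bytes(4096) / 2**40)                   # 1.0 TiB for a single 4096^3 complex array
print(max_tasks_slab(4096), max_tasks_pencil(4096))  # 4096 vs ~16.8M parallel tasks

# 87% strong scaling for a quadrupling of cores (4K -> 16K)
print(strong_scaling_speedup(4, 0.87))               # 3.48x, vs. the ideal 4x

In other words, at 4096^3 a pencil decomposition exposes orders of magnitude more parallelism than the 128^3 case that old IBM paper studied, which is exactly why the larger cubes scale better, not worse.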