By: forestlaughing (forestlaughing.delete@this.yahoo.com), October 19, 2012 10:26 am
Room: Moderated Discussions
Robert Myers (rbmyersusa.delete@this.gmail.com) on October 18, 2012 1:29 pm wrote:
> I would appreciate it very much if you would stop
> putting words into my mouth. At no point have I ever said what you just
> "quoted" me as saying.
Because that is a rhetorical technique you reserve only for yourself????
> For one
> thing, I feel like I keep having the same discussion over and over again. I get
...
> The discussion runs something
> like this: "What are you talking about? There are these super-duper numbers
> from Supercomputing Whatever, and Blue Gene did just great." When you dig into
> the numbers, you find that, yes, indeed, Blue Gene didn't do as badly scaling to
> a few thousand nodes as the other entrants, but they were *all* running at about
> 10% of the Linpack rate. Since then, I'm sure there have been more
> Supercomputer Whatevers, and an entirely new set of numbers. These benchmarks
> and claims show only that things aren't quite as bad as they were when I first
> flipped over Blue Gene. The advertisement is essentially bait and switch: flops
> and efficiency are for Linpack with the whole machine, FFT's are tested with
> whatever subset of the machine doesn't look too horrible. There *is* no
> exascale computing for anything but nearly embarrassingly parallel calculations.
> I'm tired of these archaeological digs. Yes, the numbers will change. The
> scalability of real problems will slowly get better, but the improvement has
> nothing, nothing, nothing to do with warehouses filled with computer cabinets
> that can perform billions and billions of flops on a problem that I am assured
> is still important, even if it is of no interest to me.
Well, you have answered your own complaint. Machine bandwidth IS getting better. Signalling rates per link keep improving; RDMA networks are providing a larger share of advertised bandwidth to applications; programming models and libraries have improved the delivered signalling rate; and new network topologies have been devised that deliver good bisection bandwidth, even on very large machines. Individual nodes are getting more powerful, so volume/area effects mean the need for bandwidth is going down: a bigger node holds a bigger chunk of the problem, and the surface it has to exchange grows more slowly than the volume it computes on. In short, all of the trends are positive for modern MPPs, as compared to older MPPs. They're just not advancing nearly as fast as Linpack flops.
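To put a rough number on the volume/area point, here is a minimal sketch. It assumes a nearest-neighbour (stencil-style) code on a cubic n x n x n subdomain per node with a one-cell halo on each face; the function name and that model are my own illustration, not taken from any particular machine or benchmark. It does not apply to FFTs, which need all-to-all communication.

```python
def halo_to_compute_ratio(n: int) -> float:
    """Cells exchanged per cell computed for an n x n x n subdomain.

    Assumption (illustrative only): one-cell-deep halo on each of the
    six faces, and work proportional to the number of cells.
    """
    if n <= 0:
        raise ValueError("subdomain edge length must be positive")
    surface_cells = 6 * n**2   # cells that must be sent to neighbours
    volume_cells = n**3        # cells the node computes on
    return surface_cells / volume_cells


if __name__ == "__main__":
    # Doubling the edge length per node halves the communication per unit of work.
    for n in (16, 32, 64, 128):
        print(n, halo_to_compute_ratio(n))
```

The ratio is 6/n, so a node that can hold a subdomain twice as wide needs half the network traffic per unit of compute.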
Forgive me if I misinterpret things, but is your complaint that real-world applications can't achieve the same levels of performance as Linpack does? Is this just an advertising problem? Well, that's not a very new complaint. The whole industry has been frustrated by that for decades. Linpack is an artificial number that is only very loosely connected to real performance. Everyone in the industry knows that. Even a handful of journalists seem to understand that.
Everyone knows that FFTs don't scale across hero-sized supercomputers. These machines are not bought to run 16k-core FFTs. They are bought to run other applications. There do exist high-bandwidth machines that will run FFTs much better. You can get IBM's largest POWER-processor SMP box, and that will give you dozens of cores with thousands of GB/s of bandwidth amongst them (or an HP Superdome, or a Sun M-something-thousand). The press isn't going to call it a supercomputer, but they don't call Google's datacenter a supercomputer either, even though it is super-good at solving a particular problem.
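For anyone who wants to see why FFTs are limited by bisection bandwidth rather than flops, here is a back-of-envelope sketch. It assumes a distributed FFT whose transpose step is a uniform all-to-all over an even number of nodes, and that "bisection bandwidth" means the total bytes per second across the cut, both directions summed. The function names are hypothetical and the numbers in the example are made up.

```python
def fraction_crossing_bisection(nodes: int) -> float:
    """Fraction of all data that crosses the bisection in a uniform all-to-all.

    Counted by brute force: every node sends an equal share to every node
    (itself included); a share crosses the cut when sender and receiver
    sit in different halves of the machine.
    """
    if nodes <= 0 or nodes % 2:
        raise ValueError("this sketch assumes a positive, even node count")
    half = nodes // 2
    crossing = sum(
        1
        for src in range(nodes)
        for dst in range(nodes)
        if (src < half) != (dst < half)
    )
    return crossing / (nodes * nodes)


def transpose_time_lower_bound(total_bytes: float, bisection_bytes_per_s: float) -> float:
    """Lower bound on one transpose: half the data must cross the cut."""
    return (total_bytes / 2) / bisection_bytes_per_s


if __name__ == "__main__":
    print(fraction_crossing_bisection(16))
    # Made-up example: 2 TB of data over 1 TB/s of bisection bandwidth.
    print(transpose_time_lower_bound(2e12, 1e12))
```

Half of the data has to cross the middle of the machine on every transpose no matter how many flops each node has, which is why a fat SMP box can beat a much larger MPP on this one problem.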



