By: Aaron Spink (aaronspink.delete@this.notearthlink.net), January 24, 2017 8:43 pm
Room: Moderated Discussions
RichardC (tich.delete@this.pobox.com) on January 24, 2017 4:06 pm wrote:
> That's absurd. You pick up an existing core and an existing GPU, both already optimized
> for an existing foundry process, and a large amount of existing software infrastructure.
> Then you need an ECC DRAM controller, which is not rocket science, and you may already have
> one for a server, and either some on-chip support for interconnect, or just some PCIe
> lanes. It has a *lot* of shared costs. You've still got to make a mask set, of course,
> and verify the parts that are new, if any. But you're way ahead of the game.
>
You still have all the NRE, all the verification, all the mask costs, all the test-wafer costs, and all the production costs. The economy of scale doesn't help much if you bypass the phone/tablet market completely. Go look at the A1100 and the costs for it...
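To put rough numbers on that, here's a back-of-envelope sketch in Python (every dollar figure is hypothetical, picked only to show how fixed costs amortize over volume; they are not actual A1100 numbers):

    # Hypothetical illustration of amortizing fixed chip costs over volume.
    # All dollar figures are made up for the example; real NRE and mask
    # costs vary widely by node and design.
    nre = 100e6        # design + verification + software bring-up (hypothetical)
    mask_set = 10e6    # mask set + test wafers (hypothetical)
    fixed = nre + mask_set

    for units in (10_000, 1_000_000, 100_000_000):  # HPC-only vs consumer volumes
        print(f"{units:>11,} units -> ${fixed / units:,.2f} fixed cost per chip")

At HPC-only volumes the fixed costs dominate the per-chip price; at phone/tablet volumes they round to noise.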
> > And those computers using Tesla P100s (not 1080s which lack ECC and have poor DP) are connected to
> > CPUs with 100s of GB of DRAM. They constantly stream data in and out of the local memory.
>
> In the last month I've bought one machine with a GTX 1080, and built another one with
> an AMD GPU. Both had 16GB of DRAM for the x86. And they solve a pretty hairy problem of
> graphics rendering. So there's a counter-example to the "need 100GB". For some things you
> do (I've got workstation box w/ 128GB), but for some things you don't. Similarly, for some
> things you need huge DP throughput, and for others you can get away with SP throughput
> (though that probably limits you quite a bit more severely).
>
That's fine if you want to run a 3D game... But graphics rendering isn't what supers are bought for; the sites that need rendering buy separate machines for it.
> I don't think that's true. If you have a 3D CFD model, and each node has an NxNxN
> set of cells, then at each timestep it has to update N**3 cells, but only has to
> communicate 6*(N**2) across the boundaries. And maybe the updating
> at the boundaries doesn't need to occur at every timestep.
>
For CFD it certainly needs to happen at each timestep: each boundary cell's update depends on its neighbors' values from the previous step, so the halo has to be exchanged before every step.
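The surface-to-volume scaling in that quote is easy to check. A minimal sketch, assuming a cubic N x N x N block per node and a one-cell-deep halo per face:

    # Compute-to-communication ratio for a cubic domain decomposition.
    # Each timestep updates N**3 cells but only exchanges the 6*N**2
    # face cells with neighboring nodes (one halo layer assumed).
    for n in (32, 128, 512):
        cells = n ** 3
        halo = 6 * n ** 2
        print(f"N={n:4d}: update {cells:>12,} cells, exchange {halo:>10,}, "
              f"ratio {cells / halo:6.1f}")

The ratio grows as N/6, so a bigger block per node does amortize the exchange, but the exchange itself still has to finish before every step can proceed.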
> > So basically, you want to build a pure linpack machine.
>
> *I* don't particularly want to build it. But it seems that some people *are* building it,
> so I'm speculating about why they might want to build it and what it might look like.
> That seems to be a flaw in your argument that no-one would do it because it can't beat a
> bunch of Xeons and Xeon-Phis. Someone *is* doing it. Maybe it's just because they're stupid, and it
> will fail. Or maybe it's interestingly different, in a way that *does* work for some class of apps.
>
Government entity spends money on a boondoggle, news at 11. Governments spend lots of money on lots of stupid things all the time.