By: RichardC (tich.delete@this.pobox.com), January 24, 2017 5:06 pm
Room: Moderated Discussions
Aaron Spink (aaronspink.delete@this.notearthlink.net) on January 24, 2017 11:01 am wrote:
> If it isn't a phone/tablet SoC then it has no shared costs with them and will cost as much
> as any Xeon if not more. Even the high end Xeons are off a die that has 1M+ volume.
That's absurd. You pick up an existing core and an existing GPU, both already optimized
for an existing foundry process, along with a large amount of existing software infrastructure.
Then you need an ECC DRAM controller (not rocket science, and you may already have one
from a server design), plus either some on-chip support for interconnect or just some PCIe
lanes. It has a *lot* of shared costs. You've still got to make a mask set, of course,
and verify whatever parts are new. But you're way ahead of the game.
> And those computers using Tesla P100s (not 1080s, which lack ECC and have poor DP) are connected to
> cpus with 100s of GB of dram. They are constantly streaming data in and out of the local memory.
In the last month I've bought one machine with a GTX 1080, and built another with
an AMD GPU. Both had 16GB of DRAM on the x86 side, and both solve a pretty hairy
graphics-rendering problem. So there's a counterexample to "you need 100s of GB".
For some things you do (I've got a workstation box with 128GB), but for some things
you don't. Similarly, for some things you need huge DP throughput, and for others
you can get away with SP throughput (though that probably limits you quite a bit
more severely).
>
> Being different is what makes it extremely niche with low volume. That's not the market you
> want to try to make money in, not when you are competing against full featured Xeons, GPUs,
> and Xeon Phi. I highly doubt the new Mont Blanc machine is going to skimp on memory.
Yes, it's probably a niche. But if you can invest $50M to develop a weather-forecasting
machine with 2x better throughput/$ than the alternatives, then there's probably a $500M
market for that alone.
It's a risky thing to do. But so what? People attempt risky innovative stuff all the
time, and yes, a lot of them fail. The idea isn't obviously bad; it seems to me that if
it's done well it could find a profitable market. And almost by definition, to succeed
in competition against Intel you *need* to be looking for a market that is either a bad
technical match for Intel's technologies, *or* too small for them to get serious about.
> > (e.g. PCIe switch chips). PCIe can also go between boards in a rack, within reason.
> > But maybe you only target applications with sufficiently low communication/compute that
> > 2 x 10Gbit out of a 4U box, or between racks, is enough.
> >
> That's a vanishingly small subset of applications with that low of communication.
> Outside of crypto mining, you are unlikely to ever see it.
I don't think that's true. If you have a 3D CFD model, and each node holds an NxNxN
block of cells, then at each timestep it has to update N**3 cells but only has to
communicate 6*N**2 cell values across the faces. So the communication/compute ratio
falls as 1/N: make the per-node block big enough and a modest link will do. And maybe
the boundary exchange doesn't even need to happen at every timestep.
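To make that concrete, here's a minimal back-of-the-envelope sketch. The numbers are my
assumptions, not from any real code: 8-byte DP cells, 100 flops per cell update, and a
full exchange of all six faces every timestep.

```python
# Halo-exchange arithmetic for an N x N x N block per node.
BYTES_PER_CELL = 8      # one double-precision value per cell (assumption)
FLOPS_PER_CELL = 100    # assumed flops per cell update; varies a lot by scheme

def bytes_per_flop(n):
    compute = FLOPS_PER_CELL * n ** 3      # update all N^3 cells
    traffic = BYTES_PER_CELL * 6 * n ** 2  # exchange all six faces of the cube
    return traffic / compute

for n in (64, 128, 256, 512):
    print(f"N={n:4d}: {bytes_per_flop(n):.1e} bytes/flop")
```

At N=512 that comes out near 1e-3 bytes/flop. At, say, a sustained 1 TFLOP/s per box
(again an assumed figure), that's about 1 GB/s of traffic, comfortably within what
2 x 10Gbit links (~2.5 GB/s) can carry, and bigger blocks only improve the ratio.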
> So basically, you want to build a pure linpack machine.
*I* don't particularly want to build it. But it seems some people *are* building it,
so I'm speculating about why they might want to and what it might look like.
Which points to a flaw in your argument that no one would do it because it can't beat a
bunch of Xeons and Xeon Phis: someone *is* doing it. Maybe that's just because they're
stupid, and it will fail. Or maybe it's interestingly different, in a way that *does*
work for some class of apps.
I've actually been there, back in the 90s, building clusters of SPARC nodes with attached
vector units and custom interconnect. It was especially tricky then because DRAMs were
just so small.