By: RichardC (tich.delete@this.pobox.com), January 24, 2017 6:50 am
Room: Moderated Discussions
Aaron Spink (aaronspink.delete@this.notearthlink.net) on January 23, 2017 7:02 pm wrote:
> That's great, what about the TB+ of main memory?
At current prices of about $7/GB, 1024GB of DRAM is about $7K. In a supercomputer targeting
flock-of-chickens parallelizable apps, you want a large fraction of total system cost to be going
into cpus rather than DRAM. Let's suppose the cost ratio is 1:1. If the cpu is in the smartphone-SoC
range, say $30, then it would be matched with about 4GB DRAM; if it's in the desktop-CPU range, say
$250, then it would be matched with about 32GB DRAM; if it's around $1000, match it with 128GB.
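To make the matching arithmetic explicit, here is a rough sketch. It assumes the $7/GB figure and the 1:1 cost ratio above; the function names are made up for illustration, and rounding to the nearest power of two is just my guess at how the "about" sizes would be picked.

```python
import math

DRAM_PRICE_PER_GB = 7.0  # USD per GB, the "current prices" figure from the post


def matched_dram_gb(cpu_cost_usd, dram_to_cpu_cost_ratio=1.0):
    """GB of DRAM whose cost is dram_to_cpu_cost_ratio times the cpu cost."""
    return cpu_cost_usd * dram_to_cpu_cost_ratio / DRAM_PRICE_PER_GB


def nearest_power_of_two(gb):
    """Round a capacity to the nearest power of two (on a log scale)."""
    return 2 ** round(math.log2(gb))


# The three price points discussed above: smartphone SoC, desktop cpu, ~$1000 part.
for cpu_cost in (30, 250, 1000):
    exact = matched_dram_gb(cpu_cost)
    print(f"${cpu_cost} cpu -> {exact:.1f}GB exact, ~{nearest_power_of_two(exact)}GB")
```

Changing dram_to_cpu_cost_ratio lets you see how the matched DRAM shrinks if you want more of the budget in cpus.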
My guess is that what makes sense is to target problems with high cpu requirements, but relatively
low DRAM and low communication: if over 50% of your cost is in the DRAM, then the question of whether you have ARM cores or x86 cores is down in the noise - you're basically buying DRAM. So I'm thinking
the sweet spot for an ARM-based system is probably with cpu chips in the desktop-cpu range of
$250 (with area and transistor count which give high yield), matched with relatively little DRAM,
e.g. 8GB or 16GB. For some problems maybe 4GB is enough.
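As a quick check on the "you're basically buying DRAM" point, this sketch computes the DRAM share of a node's cpu+DRAM cost. It assumes the same $7/GB price and deliberately ignores interconnect and other overhead; dram_cost_fraction is a hypothetical helper, not anything standard.

```python
DRAM_PRICE_PER_GB = 7.0  # USD per GB, assumed from the post


def dram_cost_fraction(cpu_cost_usd, dram_gb):
    """Share of (cpu + DRAM) cost that goes to DRAM; other costs are ignored."""
    dram_cost = dram_gb * DRAM_PRICE_PER_GB
    return dram_cost / (cpu_cost_usd + dram_cost)


# A $250 cpu with the small DRAM sizes suggested above, plus a 1TB node for contrast.
for dram_gb in (4, 8, 16, 1024):
    share = dram_cost_fraction(250, dram_gb)
    print(f"$250 cpu + {dram_gb}GB DRAM: {share:.0%} of cost is DRAM")
```

With 4GB to 16GB the DRAM share stays well under half, while the 1TB case is almost entirely DRAM, which is where the cpu choice stops mattering much.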
So you probably have something that looks a lot like a bunch of desktop PCs on a network, but
presumably packaged densely into a rack, with a decent interconnect, and with a multicore-CPU + GPGPU
combination optimized for throughput-per-chip and throughput-per-watt rather than for single-thread
performance.
Taking a wild guess, let's say it's $200 for each cpu, $120 for 16GB DRAM, and $100 for interconnect
and system overhead, giving $420 per node. Then put together 10K nodes for about $4M. As another
wild guess, suppose we get throughput in the same ballpark as a $200-ish GPU - GTX 1060 is around
4TFLOPS single-precision.
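Spelling out that back-of-envelope estimate as a sketch: the per-node costs and the 4 TFLOPS figure are the wild guesses above, and the aggregate number is only a naive peak (nodes times per-node throughput), not something anyone measured.

```python
# All inputs are the post's guesses, not measured or quoted prices.
CPU_COST = 200.0        # USD per node
DRAM_COST = 120.0       # USD for 16GB per node
OVERHEAD_COST = 100.0   # USD for interconnect and system overhead per node
NODES = 10_000
TFLOPS_PER_NODE = 4.0   # single precision, "same ballpark as a $200-ish GPU"

node_cost = CPU_COST + DRAM_COST + OVERHEAD_COST
system_cost = node_cost * NODES
peak_tflops = TFLOPS_PER_NODE * NODES      # naive peak, assumes perfect scaling
cost_per_tflops = system_cost / peak_tflops

print(f"node cost:      ${node_cost:,.0f}")
print(f"system cost:    ${system_cost / 1e6:.1f}M")
print(f"naive peak:     {peak_tflops / 1000:.0f} PFLOPS single precision")
print(f"cost per TFLOPS: ${cost_per_tflops:,.0f}")
```

Real applications would land below the naive peak, so treat the cost per TFLOPS as a lower bound.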
It won't be good for a wide range of applications. But it will be very cost-effective for a few
applications. And it doesn't need the huge-DRAM support.
For what I work on at the moment (data analytics) I totally want the small number of nodes
w/ 1TB+ DRAM. But that's not the best solution for everything.