By: RichardC (tich.delete@this.pobox.com), January 22, 2017 7:54 am
Room: Moderated Discussions
David Kanter (dkanter.delete@this.realworldtech.com) on January 20, 2017 2:55 pm wrote:
> One issue I see is that GPUs tend to require even greater performance from the host CPU to keep
> up with Amdahl's Law. That means that a larger number of cores is less attractive compared to
> Power9 and Skylake.
I think there's a usefully large class of scientific-computing applications where the Amdahl's
Law CPU-to-GPU constraint is relaxed as the scale of the problem increases. CFD for weather
prediction is probably one of them: in that case, you don't build a new supercomputer to solve the
same scale of problem in a shorter elapsed time - you build a new supercomputer so that you can use a
finer grid and thus deal with a much larger computational problem in the same elapsed time.
If the amount of scalar/CPU work scales up less than the amount of vector/GPGPU work, the serial
fraction shrinks as the problem grows, and that lets you sidestep Amdahl's Law.
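A minimal sketch of that scaling argument in Python, with invented exponents (serial work ~ n^2, parallel work ~ n^4; neither number comes from any real weather code):

def serial_fraction(n, serial_exp=2, parallel_exp=4):
    # Assumed scaling: serial work ~ n**2 (e.g. per-step setup and I/O),
    # parallel work ~ n**4 (3-D grid refinement plus a finer timestep).
    s = float(n) ** serial_exp
    p = float(n) ** parallel_exp
    return s / (s + p)

def amdahl_speedup(f, procs):
    # Classic Amdahl's Law: speedup with serial fraction f on procs processors.
    return 1.0 / (f + (1.0 - f) / procs)

for n in (100, 200, 400, 800):  # each step doubles the grid resolution
    f = serial_fraction(n)
    print("n=%4d  serial fraction=%.1e  speedup on 10000 procs=%5.0fx"
          % (n, f, amdahl_speedup(f, 10000)))

Under these assumptions, each doubling of the resolution cuts the serial fraction by 4x, so the achievable speedup on a fixed processor count keeps climbing toward the machine's full parallelism - which is the Gustafson's Law view of the same problem.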
I'm agnostic about whether ARM has a chance in some parts of the scientific computing market,
but I do know that people in that business are very willing to expend software effort to tune their
critical code for a hardware platform with good price/performance, so weakness of the ARM software
ecosystem would be less of an obstacle there than in the normal server market. And the huge market
for phone/tablet ARM-based SoCs with low power and integrated GPU's means that a good deal of the
relevant hardware design is already off-the-shelf (the big weakness being the lack of a decent
interconnect fabric, but some of the ARM server efforts have tried to address that).
From 50,000 feet, there's an obvious attraction to putting together tens of thousands of ARM-based SoCs
with not-terrible CPUs plus capable GPGPUs, already designed for low power, low cost, and small
physical size, into multi-rack scientific supercomputers narrowly aimed at applications which are
embarrassingly parallel. The details of how it's done matter a lot, the particular application
matters a lot, and in competing against x86 you're shooting at a fast-moving target. But I'm not
surprised it looks plausible to some people.