By: RichardC (tich.delete@this.pobox.com), January 23, 2017 7:57 am
Room: Moderated Discussions
Gabriele Svelto (gabriele.svelto.delete@this.gmail.com) on January 23, 2017 1:03 am wrote:
> RichardC (tich.delete@this.pobox.com) on January 22, 2017 9:45 pm wrote:
> > And I'm also assuming that such a machine would be optimized for embarrassingly-parallel
> > apps which work ok with a large number of small(ish)-DRAM nodes, e.g. 8-16GB per node.
> > If your problem has unfavorable communication/compute ratio when split across many small
> > nodes, a smaller number of large-DRAM x86's is better. But I think CFD is a niche where
> > the flock-of-chickens approach can work.
>
> If a workload is amenable to the flock-of-chickens approach then there's a good
> chance it runs fine on GPUs (unless it has wildly divergent control-flow).
Well, it gets tricky. To keep a big GPU busy you'd like roughly 10K independent
tasks; but then on a scalable supercomputer with 1000+ such nodes, that means you need
more than 10M independent tasks in total. Some workloads are like that; many aren't.
There are a lot of big machines at that scale; I'm presuming the reason people are
exploring ARM-based architectures is not that they're absolutely better at everything,
but that there's some class of applications (e.g. CFD) where the different cost
structure and performance balance might come out ahead.
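The scaling arithmetic above can be sketched in a couple of lines; the 10K figure is the rough occupancy estimate from the text, not a measured number:

```python
# Back-of-envelope: total independent tasks needed to saturate a
# cluster with one big GPU per node, assuming ~10K concurrent
# tasks are required to keep each GPU busy (rough estimate).
tasks_per_gpu = 10_000  # approximate occupancy target per GPU
nodes = 1_000           # "scalable supercomputer" scale

total_tasks = tasks_per_gpu * nodes
print(total_tasks)  # 10,000,000 independent tasks in total
```

The point is that the parallelism requirement multiplies across nodes: a workload that decomposes into tens of thousands of tasks saturates one GPU, but needs three more orders of magnitude of decomposition to saturate the whole machine.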
And reading through the link, the purpose of the Isambard system is precisely to have a
range of different hardware architectures underneath a common software interface, so that
researchers can easily experiment with running their code on different hardware, to explore
these price-performance tradeoffs.