By: Aaron Spink (aaronspink.delete@this.notearthlink.net), January 24, 2017 8:54 pm
Room: Moderated Discussions
RichardC (tich.delete@this.pobox.com) on January 24, 2017 5:08 pm wrote:
> Rendering high-quality 4K video is probably another reasonable app. Each frame might be
> 3840*2160*3*10bit ~ 29.7MB. You need enough DRAM on each node for the 3D model (games
> show that you can get fairly complex in 16GB). Pixar averages 3 hours/frame, but some
> frames take 8 hours. At that average rate of 3 hours/frame, you need network bandwidth
> of 29.7MB/(3*3600) = 2884 bytes/sec.
>
If we are going to use Pixar as an example, it is probably worth pointing out that per-frame resource requirements can run well into the tens to hundreds of GB of data, and that their parallelization method is separate frames. While the average network utilization is low, the peak requirements are incredibly high. If you are just looking at the end-result bandwidth, it's basically nothing, but that isn't the limiting bandwidth case. Professional cinematic video rendering is not a low-bandwidth workload.
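The quoted per-frame arithmetic is easy to sanity-check. A quick sketch, assuming 4K UHD at 10 bits per channel, 3 channels, uncompressed, and the quoted 3-hour average render time:

```python
# Sanity check of the quoted frame-size and average-bandwidth figures.
# Assumptions: 3840x2160 resolution, 3 color channels, 10 bits/channel,
# uncompressed output, 3-hour average render time per frame.

width, height = 3840, 2160
channels, bits_per_channel = 3, 10

frame_bits = width * height * channels * bits_per_channel
frame_bytes = frame_bits / 8              # ~31.1 MB (~29.7 MiB)

render_seconds = 3 * 3600                 # 3 hours per frame
avg_bandwidth = frame_bytes / render_seconds  # ~2880 bytes/sec

print(f"frame size: {frame_bytes / 1e6:.1f} MB "
      f"({frame_bytes / 2**20:.1f} MiB)")
print(f"average output bandwidth: {avg_bandwidth:.0f} bytes/sec")
```

This reproduces the quoted numbers (the ~2884 B/s figure comes from using MiB), but note that it only measures the trickle of finished pixels leaving a node, not the tens-to-hundreds of GB of scene assets that have to get onto each node in the first place.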
> That doesn't count as a supercomputer app because it runs on a cluster of workstation-class
> machines. And I think that's part of what's going: you're thinking about the apps that run
> on supercomputer-class systems. And then you're saying because *those* apps need a lot of
> interconnect bandwidth, it's useless to build a system without a fast/expensive interconnect.
> But that's back to front: there are other apps that run on cluster-of-workstation systems
> (e.g. Beowulf clusters), and don't run on "supercomputers" precisely because the "supercomputer"
> system is expensive overkill for the app. But if you can build a
> flock-of-chickens or flock-of-turkeys system that actually gives more throughput-per-$
> and throughput-per-watt than a Beowulf/cluster-of-workstations, then it can be quite useful,
> even if it isn't a "supercomputer".
>
Even Beowulf systems these days are using all the network they can get. The days of building clusters on 1GbE have long since passed. 1GbE was barely adequate back in the days of Pentium Pros, and per-socket performance has scaled enormously since then.