By: Aaron Spink (aaronspink.delete@this.notearthlink.net), January 25, 2017 3:49 pm
Room: Moderated Discussions
RichardC (tich.delete@this.pobox.com) on January 25, 2017 4:26 am wrote:
> > If we are going to use pixar as an example, it is probably
> > worth pointing out that frame resource requirements
> > can run well into the 10s to 100s of GB of data.
>
> So, as I said, 16 or 32GB gets you into the low end of it. And obviously Pixar is doing some of
> the most complex video rendering there is.
>
Actually, the other side of the house now has the more complex renderer: Hyperion (Disney Animation's renderer) is arguably significantly more complex than PRMan at this point.
> I don't think that makes any sense. For each frame, you send the (relatively small) code/data
> to build the model, and eventually you get back the frame. Those transfers can be fully overlapped
> with the rendering of the previous frame and the next frame. Maybe they don't choose to do it that
> way because the network transfers take so little time even at 1Gbit (about 0.3sec for a frame) that
> they're not worth worrying about, but they could. It's a problem with extremely low communication/compute
> ratio, independent tasks with no dependencies, and no need for low latency.
>
While the code for a frame may be small, the data can be immense.
> And googling around, I found a description of Pixar's render farm network from 2010 which mentioned
> 300 10Gbit ports and 1500 1Gbit ports, which sounds very much like 1Gbit ports for most of the
> rendering boxes. Maybe they have some shared data on fileserver boxes which need the 10Gbit ?
> Or maybe those are just for the higher-level interconnect between switches. Anyhow,
> this is the creme de la creme, and it is (or recently was) predominantly 1Gbit.
>
Well, that was a system installed before 2010, and it used ~3k AMD CPUs. One would think they've refreshed it at least twice by now.
> Well, a lot of stuff these days happens on clusters in the cloud, which is mostly based on dual-Xeon's
> with maybe 20-28 cpu cores, 64-256GB DRAM, and a 10Gbit connection. That gives you a rather low
> ratio of network bandwidth / DRAM size, and it often ends up being sliced up as a bunch of
> 2-core or 4-core VMs each of which then has around 1Gbit/s. So I think you're again ignoring the
> fact that low-bandwidth apps can run in a cheap and easy way on a platform that you're not
> counting as a "supercomputer" or even a Beowulf.
>
Current-gen cloud boxes have moved beyond 10GbE to 25/40GbE, FYI.
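The bandwidth argument above is easy to sanity-check with a back-of-the-envelope calculation. A minimal sketch (illustrative sizes only, assuming ideal link utilization and no protocol overhead) showing why a ~40 MB finished frame is trivial at 1 Gbit while a 100 GB asset set is not:

```python
def transfer_time_s(size_gb: float, link_gbit: float) -> float:
    """Ideal wire time (seconds) to move size_gb gigabytes over a link_gbit link."""
    return (size_gb * 8) / link_gbit

# A finished frame (~0.04 GB, i.e. ~40 MB) vs. a heavy per-frame asset set (~100 GB):
for size_gb in (0.04, 100.0):
    for link in (1, 10, 25, 40):
        t = transfer_time_s(size_gb, link)
        print(f"{size_gb:7.2f} GB @ {link:2d} Gbit/s -> {t:8.2f} s")
```

At 1 Gbit the frame takes ~0.32 s (matching the "about 0.3 sec" figure quoted above), but 100 GB of assets takes ~13 minutes; even at 40 Gbit it's still ~20 s per node, which is why the data side, not the output side, drives the network requirement.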