By: Michael S (already5chosen.delete@this.yahoo.com), November 11, 2014 9:23 am
Room: Moderated Discussions
Ronald Maas (rmaas.delete@this.wiwo.nl) on November 11, 2014 8:20 am wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on November 11, 2014 2:19 am wrote:
> > > I suspect that these Xeons are used primarily to feed data to and from the GPUs, and that
> > > the DP FP compute part is entirely done on the GPUs side. And APM X-Gene is designed to
> > > perform these kinds of tasks very well. I bet if I replace the Xeons with 3-4 XG-1 chips
> > > in any of these supercomputers, nobody will notice a difference performance wise.
> > >
> >
> > 4 times more ports in IB switch. May be, Linpack scores will
> > be the same, but power consumption and price will go up.
> > If everything was as easy as you're saying, Green500 will be full of Xeon-E3 based machines. Xeon E3-1240L
> > v3 (25W) can drive single K20x and single FDR IB port just
> > as well as X-GENE, if not better. However, in reality
> > Xeon-E3 has no presence at all in top 200 of Green500 list. Similarly,
> > there is no presence for other efficient-on-paper
> > chips, like Jaguar-based AMD Opteron X1150 and Silvermont-based Intel Atom C2730/C2750.
>
> Xeon E5 and XG1 both can address vastly more memory compared to Xeon E3. And have the necessary bandwidth - 4
> memory channels - to move lots of data very quickly. I believe these characteristics are important for HPC.
>
> Ronald
The huge memory capacity of Xeon-E5 is important for other applications, but for HPC the capacity of Xeon-E3 is probably sufficient. Take, for example, TSUBAME-KFC, which you mentioned earlier: it has 2,560 GB of RAM = 64 GB per node = 32 GB per socket = 16 GB per GPU. Xeon-E3 can address 32 GB per socket.
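To spell out the arithmetic, here is a small sketch that derives the node layout from the RAM figures quoted above and checks what a hypothetical one-GPU-per-Xeon-E3 rebuild would need. The variable names are mine, and the rebuild itself is only a thought experiment, not a real machine.

```python
# RAM figures for TSUBAME-KFC as quoted in this post.
total_ram_gb = 2560
ram_per_node_gb = 64
ram_per_socket_gb = 32
ram_per_gpu_gb = 16

# Layout implied by those figures.
nodes = total_ram_gb // ram_per_node_gb                  # number of nodes
sockets_per_node = ram_per_node_gb // ram_per_socket_gb  # CPU sockets per node
gpus_per_node = ram_per_node_gb // ram_per_gpu_gb        # GPUs per node

# Per-socket limit of Xeon-E3, as stated in this post.
xeon_e3_max_gb = 32

# Hypothetical rebuild: one Xeon-E3 socket per GPU, same RAM per GPU.
e3_sockets_needed = nodes * gpus_per_node
headroom = xeon_e3_max_gb / ram_per_gpu_gb

print(f"nodes={nodes}, sockets/node={sockets_per_node}, GPUs/node={gpus_per_node}")
print(f"Xeon-E3 sockets needed={e3_sockets_needed}, capacity headroom={headroom}x")
```

So keeping 16 GB per GPU uses only half of what a single Xeon-E3 socket can address.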
As to bandwidth, if we build a system with 1 GPU per Xeon-E3, then it will have the same theoretical peak CPU-side bandwidth per GPU as TSUBAME-KFC, but slightly higher practically achievable bandwidth, due to the lower latency of unbuffered DIMMs.
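A rough sketch of the per-GPU peak bandwidth comparison. It assumes 4 memory channels per Xeon-E5 socket (as Ronald says), 2 channels per Xeon-E3 socket, 2 sockets and 4 GPUs per TSUBAME-KFC node (implied by the RAM figures above), and the same DIMM speed on both sides. The 12.8 GB/s per channel (DDR3-1600) is an illustrative assumption, not a measured or confirmed configuration.

```python
def peak_bw_per_gpu(channels_per_socket, sockets, gpus, gb_s_per_channel):
    """Theoretical peak CPU-side memory bandwidth available per GPU, in GB/s."""
    return channels_per_socket * sockets * gb_s_per_channel / gpus

# Assumed per-channel peak: DDR3-1600, 64-bit channel (illustrative only).
GB_S_PER_CHANNEL = 12.8

# TSUBAME-KFC style node: 2 x Xeon-E5 (4 channels each) feeding 4 GPUs.
kfc = peak_bw_per_gpu(channels_per_socket=4, sockets=2, gpus=4,
                      gb_s_per_channel=GB_S_PER_CHANNEL)

# Hypothetical node: 1 x Xeon-E3 (2 channels) feeding 1 GPU.
e3 = peak_bw_per_gpu(channels_per_socket=2, sockets=1, gpus=1,
                     gb_s_per_channel=GB_S_PER_CHANNEL)

print(f"TSUBAME-KFC style: {kfc:.1f} GB/s per GPU")
print(f"Xeon-E3 per GPU:   {e3:.1f} GB/s per GPU")
```

Both cases work out to two memory channels per GPU, so the theoretical peak is the same whatever DIMM speed is plugged in, as long as it is the same on both sides.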
I think that it's not memory capacity or bandwidth that keeps Xeon-E3 away from Top500, but what I said in my previous post: bigger nodes are simply more economical.
BTW, you say that XG1 can address vastly more memory than Xeon E3. I don't know if that is true in general, but the XG1-based HP ProLiant m400 supports only 64 GB, which is twice as much as Xeon-E3 and AMD Opteron X1150, the same as Intel Atom C2730, and many times less than even the most castrated Xeon-E5.