By: Emil Briggs (me.delete@this.nowherespam.com), July 28, 2012 6:40 am
Room: Moderated Discussions
aaron spink (aaronspink.delete@this.notearthlink.net) on July 27, 2012 7:25 pm wrote:
>
> PRACE data. ORNL data. In many cases in real
> world workloads, it cannot even maintain perf/w parity with equiv CPUs. TTS is
> horrid because of the fragmented and complex programming model.
>
> Here's
> reality: GPUs struggle to hit 50% efficiency in LINPACK. LINPACK!
>
By ORNL do you mean Oak Ridge National Laboratory? I ask because I am a large user at ORNL. Currently only some of the Jaguar/Titan nodes are equipped with GPUs, but enough of them are installed to do realistic evaluations. For certain workloads (and when properly programmed) they beat CPUs pretty handily on performance. And that data comes from a real-world application, not LINPACK. The section of the code that I adapted for GPUs runs 3 to 4 times faster than it does on CPUs. With this particular application it is also possible to overlap some operations on the CPU and GPU and hide the latency of PCI-E data transfers. Obviously not all applications can benefit in the same way, and it's not easy to do even when it is possible, but GPUs can offer some very nice performance gains in some cases.
That being said, I do think that the cost of moving data across the PCI-E bus and the difficulty of the programming model are real downsides to GPUs. How all that plays out will be interesting, and I'm looking forward to getting my hands on some Intel MIC hardware to see what we can do with it.
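To make the point about hiding PCI-E latency a bit more concrete, here is a rough back-of-the-envelope model, not code from the application itself. It assumes the work is split into equal chunks, that each chunk needs one host-to-device copy followed by one compute step, and that the copy of the next chunk can run while the current chunk is being computed. The function names and the numbers in the usage note are made up for illustration; they are not measurements.

```python
def serialized_time(n_chunks, t_transfer, t_compute):
    """Total time when every chunk is copied and then computed, with no overlap."""
    return n_chunks * (t_transfer + t_compute)


def overlapped_time(n_chunks, t_transfer, t_compute):
    """Total time for a two-stage pipeline (assumed model, not a measurement).

    The first copy and the last compute step cannot be hidden. In between,
    each step costs whichever stage is slower, because the copy of chunk i+1
    runs while chunk i is being computed.
    """
    if n_chunks == 0:
        return 0
    return t_transfer + (n_chunks - 1) * max(t_transfer, t_compute) + t_compute


def speedup_from_overlap(n_chunks, t_transfer, t_compute):
    """Ratio of serialized time to overlapped time under the same assumptions."""
    return serialized_time(n_chunks, t_transfer, t_compute) / overlapped_time(
        n_chunks, t_transfer, t_compute
    )
```

With 8 chunks, a copy cost of 1 time unit, and a compute cost of 3 units per chunk (all illustrative), the serialized version takes 32 units and the overlapped version takes 25. When the copy is the slower stage, the overlap still helps, but the run becomes bound by the bus, which is the downside described above.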



