By: aaron spink (aaronspink.delete@this.notearthlink.net), July 28, 2012 7:05 am
Room: Moderated Discussions
Emil Briggs (me.delete@this.nowherespam.com) on July 28, 2012 6:40 am wrote:
> By ORNL do you mean Oak Ridge National Laboratory? I ask
> since I am a large user at ORNL. Currently only some of the Jaguar/Titan nodes
> are equipped with GPU's but there are enough of them installed to do realistic
> evaluations. For certain workloads (and when properly programmed) they beat
> CPU's pretty handily performance wise. And that data comes from a real world
> application not LINPACK. The section of the code that I adapted for GPU's runs 3
> to 4 times faster than it does on CPU's. It's also possible with this particular
> application to overlap some operations on the CPU and GPU and hide the latency
> of PCI-E data transfers. Obviously not all applications can benefit in the same
> way and it's not easy to do so even when possible but GPU's can offer some very
> nice performance gains in some cases.
>
I'm not denying that some workloads and some kernels have an advantage on GPUs. I am, however, saying that this is generally the exception, based on data published by PRACE, ORNL, and others.
BTW, what % of peak are you seeing on the GPUs with your code?
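To be clear about what I mean by % of peak: sustained FLOP rate divided by the theoretical peak of the device. A minimal sketch of the arithmetic follows. The helper name `percent_of_peak` is hypothetical and the numbers are placeholders for illustration only, not measurements from any ORNL system.

```python
def percent_of_peak(flop_count, elapsed_seconds, peak_flops):
    """Return the sustained FLOP rate as a percentage of theoretical peak.

    flop_count      -- floating-point operations the kernel performed
    elapsed_seconds -- measured wall-clock time for those operations
    peak_flops      -- theoretical peak of the device in FLOP/s
    All three values must come from the caller's own counts and timings.
    """
    if elapsed_seconds <= 0 or peak_flops <= 0:
        raise ValueError("elapsed_seconds and peak_flops must be positive")
    sustained_flops = flop_count / elapsed_seconds
    return 100.0 * sustained_flops / peak_flops


# Placeholder values, assumed purely for illustration:
# 2e12 operations in 10 s on a device with a 1e12 FLOP/s peak.
example = percent_of_peak(flop_count=2.0e12, elapsed_seconds=10.0, peak_flops=1.0e12)
print(f"{example:.1f}% of peak")
```

Whether the PCI-E transfer time is counted inside `elapsed_seconds` changes the answer a lot, so it is worth stating which one is being quoted.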
> That being said I do think that the
> cost of moving data across the PCI-E bus and the difficulty of the programming
> model are some real downsides to GPU's. How all that plays out will be
> interesting and I'm looking forward to getting my hands on some Intel MIC
> hardware to see what we can do with it.
>
MIC still has the disadvantage of PCI-E being a limiter, but it should have a better programming model overall than GPUs. Hopefully we'll eventually see x32/x40 PCI-E interfaces or dual QPI interfaces. It would certainly be nice if GPUs/MICs had simple coherent access to memory.
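A rough sketch of why link width matters: a lower bound on one-way transfer time from the raw PCI-E line rate and encoding overhead. The per-lane figures are the standard Gen2 and Gen3 rates. The function name, the 1 GB buffer, and the x32 width are assumptions for illustration, and the estimate ignores packet overhead and latency, so real transfers will be slower.

```python
# Per-lane payload rate in bytes/s, per direction, derived from
# the PCI-E line rate and its encoding overhead.
PCIE_LANE_BYTES_PER_S = {
    2: 5.0e9 * (8 / 10) / 8,     # Gen2: 5 GT/s with 8b/10b encoding, 500 MB/s
    3: 8.0e9 * (128 / 130) / 8,  # Gen3: 8 GT/s with 128b/130b encoding, about 985 MB/s
}


def transfer_seconds(num_bytes, lanes, gen=2):
    """Lower bound on one-way transfer time over a PCI-E link.

    Ignores packet/protocol overhead and latency, so measured
    transfers will take longer than this estimate.
    """
    if lanes <= 0:
        raise ValueError("lanes must be positive")
    return num_bytes / (PCIE_LANE_BYTES_PER_S[gen] * lanes)


# Hypothetical 1 GB buffer (1e9 bytes), chosen only for illustration.
buffer_bytes = 1.0e9
for lanes in (16, 32):
    t = transfer_seconds(buffer_bytes, lanes, gen=2)
    print(f"Gen2 x{lanes}: at least {t * 1e3:.1f} ms")
```

Doubling the lanes only halves the bulk transfer time. It does nothing for per-transfer latency, which is part of why coherent access to memory would be the nicer fix.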



