By: jp (asdasdf.delete@this.gmail.com), August 7, 2012 11:22 am
Room: Moderated Discussions
aaron spink (aaronspink.delete@this.notearthlink.net) on August 7, 2012 7:20 am wrote:
> jp (asdfasdf.delete@this.gmail.com) on August 7, 2012 3:08 am wrote:
> > Just a note on the performance numbers mentioned before. From what I
> > heard KNC does not have the FMA instruction.
>
> All public data confirms that it has FMA.
>
>
> > On the K20 we can expect conservative clocks at maybe ~ 0.8 Ghz which
> > would put us at 15*64*0.8*2 => 1536 GFLOPs DP, way ahead of the
> > competition... Given that GPUs are already hitting 80% for matrix
> > operation applications we should be seeing at least 1.22 TFLOPs by a
> > conservative estimate.
>
> .8 Ghz will be pushing it.
> K10 has the advantage of not having to deal with large die effects, having a
> much larger volume from which to bin parts, etc, and only reached .75 Ghz. Also
> getting full yield off such a large die is going to be somewhat rare. Rumors
> are that the K20 will not have a full 15 SMXes and will use 1 or more for die
> salvage in order to get enough working parts further lowering its
> performance.
>
The rumours say that they already salvaged one to get the 15 SMXes that they've disclosed here: http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf
> > More importantly, for real world applications we should be seeing
> > 15 * 192 * 2 * 0.8 => 4608 SP GFLOPS. And yes, single precision is
> > extremely important within for example image and signal processing
> > applications.
>
> If you care about SP, you are likely going to be using K10 and not K20.
While the K10 is an interesting option, its 192-bit interface is a bit discouraging, and it also lacks a lot of the new features presented in K20: Hyper-Q, dynamic parallelism, the new GMU, and GPUDirect. Another great feature of the K20 is that it will support up to 255 registers per thread (up from 63 on Fermi), which is going to do wonders for BLAS performance (as well as your cherished Linpack performance...).
And actually, if I were happy with the K10 I would just buy the GTX 690 (the K10 is just a downclocked 690), which does run at 0.925 GHz.
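
For anyone who wants to redo the peak numbers quoted above with a different clock or SMX count, here is a minimal Python sketch of the arithmetic. The function name, the 800 MHz clock, the 80% efficiency, and the 14-SMX salvage case are all assumptions taken from the guesses in this thread, not confirmed K20 specs.

```python
def peak_gflops(smx_count, lanes_per_smx, clock_mhz, flops_per_lane=2):
    """Theoretical peak in GFLOPS.

    flops_per_lane=2 assumes every lane retires one FMA per cycle
    (a multiply and an add counted as two floating-point operations).
    """
    return smx_count * lanes_per_smx * flops_per_lane * clock_mhz / 1000


CLOCK_MHZ = 800     # assumed ~0.8 GHz clock from the thread, not a spec
EFFICIENCY = 0.80   # assumed fraction of peak reached on matrix operations

dp_peak = peak_gflops(15, 64, CLOCK_MHZ)      # 64 DP units per SMX
sp_peak = peak_gflops(15, 192, CLOCK_MHZ)     # 192 SP cores per SMX
dp_sustained = dp_peak * EFFICIENCY

# Hypothetical salvage part with one SMX disabled, same clock.
dp_peak_14 = peak_gflops(14, 64, CLOCK_MHZ)

print(f"DP peak, 15 SMX:      {dp_peak:.1f} GFLOPS")
print(f"SP peak, 15 SMX:      {sp_peak:.1f} GFLOPS")
print(f"DP at 80% efficiency: {dp_sustained:.1f} GFLOPS")
print(f"DP peak, 14 SMX:      {dp_peak_14:.1f} GFLOPS")
```

With these inputs the 15-SMX figures match the 1536 DP and 4608 SP GFLOPS quoted above. Dropping one SMX at the same clock costs roughly 100 GFLOPS of DP peak.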