By: aaron spink (aaronspink.delete@this.notearthlink.net), August 7, 2012 7:20 am
Room: Moderated Discussions
jp (asdfasdf.delete@this.gmail.com) on August 7, 2012 3:08 am wrote:
> Just a note on
> the performance numbers mentioned before. From what I heard KNC does not have
> the FMA instruction.
>
All public data confirms that it has FMA.
> On the K20 we can
> expect conservative clocks at maybe ~ 0.8 GHz which would put us at 15*64*0.8*2
> => 1536 GFLOPS DP, way ahead of the competition... Given that GPUs are
> already hitting 80% for matrix operation applications we should be seeing at
> least 1.22 TFLOPS by a conservative estimate.
>
0.8 GHz will be pushing it. K10 has the advantage of not having to deal with large-die effects and of having a much larger volume from which to bin parts, yet it only reached 0.75 GHz. Also, getting fully working parts off such a large die is going to be somewhat rare. Rumors are that K20 will not have all 15 SMXes enabled and will give up one or more of them to die salvage in order to get enough working parts, further lowering its performance.
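To put numbers on those caveats, here is a minimal Python sketch of the peak-throughput formula used in the quote: units × lanes per unit × clock × 2, where the 2 counts one FMA per lane per cycle as two floating-point operations. The 14-SMX, 0.75 GHz configuration is a hypothetical illustration of the die-salvage and clock points above, not an announced K20 spec, and `peak_gflops` is a name made up for this example.

```python
def peak_gflops(units: int, lanes_per_unit: int, clock_ghz: float,
                flops_per_lane_per_cycle: int = 2) -> float:
    """Theoretical peak in GFLOPS: units x lanes x clock (GHz) x flops per lane per cycle.

    The default of 2 assumes one fused multiply-add (FMA) issued per lane per cycle.
    """
    return units * lanes_per_unit * clock_ghz * flops_per_lane_per_cycle

# The quoted estimate: 15 SMX x 64 DP lanes at 0.8 GHz.
optimistic = peak_gflops(15, 64, 0.8)
# Hypothetical salvaged part: one SMX disabled, clocked like K10 at 0.75 GHz.
salvaged = peak_gflops(14, 64, 0.75)

for label, peak in [("15 SMX @ 0.80 GHz", optimistic), ("14 SMX @ 0.75 GHz", salvaged)]:
    # 80% of peak is the matrix-operation efficiency figure from the quote.
    print(f"{label}: peak {peak:.1f} GFLOPS DP, 80% of peak {0.8 * peak:.1f} GFLOPS")
```

Under these assumptions the salvaged part drops from 1536 to 1344 GFLOPS DP peak, or about 1075 GFLOPS at 80% efficiency, which is below the quoted "conservative" 1.22 TFLOPS.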
> More importantly, for real
> world applications we should be seeing 15 * 192 * 2 * 0.8 => 4608 SP GFLOPS.
> And yes, single precision is extremely important in, for example, image and
> signal processing applications.
>
If you care about SP, you are likely going to be using K10 and not K20.
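A rough way to see why K10 is the natural SP choice is to run the same peak formula for both boards. This is a minimal sketch under stated assumptions: the K20 figures are the quoted 15 SMX × 192 SP lanes at 0.8 GHz plus a hypothetical 14-SMX, 0.75 GHz salvaged part, while the K10 layout (two GK104 dies, each with 8 SMX × 192 lanes) comes from the GK104 configuration rather than from this thread, with the 0.75 GHz clock mentioned above. `peak_gflops` is a hypothetical helper name.

```python
def peak_gflops(units: int, lanes_per_unit: int, clock_ghz: float,
                flops_per_lane_per_cycle: int = 2) -> float:
    """Theoretical peak in GFLOPS, counting one FMA per lane per cycle as 2 flops."""
    return units * lanes_per_unit * clock_ghz * flops_per_lane_per_cycle

# K20 SP as estimated in the quote: 15 SMX x 192 lanes at 0.8 GHz.
k20_sp = peak_gflops(15, 192, 0.8)
# Hypothetical salvaged K20: 14 SMX at 0.75 GHz.
k20_sp_salvaged = peak_gflops(14, 192, 0.75)
# Assumed K10 layout: two GK104 dies, each 8 SMX x 192 lanes, at 0.75 GHz.
k10_sp = 2 * peak_gflops(8, 192, 0.75)

for label, peak in [("K20, 15 SMX @ 0.80 GHz", k20_sp),
                    ("K20, 14 SMX @ 0.75 GHz", k20_sp_salvaged),
                    ("K10, 2 x 8 SMX @ 0.75 GHz", k10_sp)]:
    print(f"{label}: {peak:.0f} GFLOPS SP peak")
```

On these assumptions K10 already matches the optimistic 4608 GFLOPS SP figure for K20 and exceeds the salvaged-part estimate of 4032 GFLOPS.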