By: aaron spink (aaronspink.delete@this.notearthlink.net), August 6, 2012 4:33 am
Room: Moderated Discussions
EBFE (x.delete@this.y.com) on August 6, 2012 3:09 am wrote:
> Coding would definitely be easier for KNC than GPU, if your target
> is e.g. 40% efficiency.
> However, suppose big-K is 2TFlops.
> Is it easier to code KNC for 90%, than GPU for 50%? (thus same performance)
> The good thing is that KNC seems to have more bandwidth, so it might be true.
>
It is unlikely that K20 is >1.5 TFlops. As of right now there are no plans for the GK110 to be put in the consumer space, so they cannot rely on binning good dies for the K product. They also have a thermal envelope to fit into, so it is unlikely they'll be releasing at 1 GHz+. What we do know about K20 is that it is 15 SMX @ 64 DP units per SMX, for a best case of 1920 GFlops @ 1 GHz (counting an FMA as 2 flops). Thermals + large die will probably cost them 20-30% frequency, putting them in the range of 1.4 TFlops. And even that is probably a bit generous, since it is unlikely that they'll be able to enable all 15 SMX due to defects.
And yes, it is likely easier to code KNC for 90% than to code a GPU for 50%, on average across the relevant workloads. For one thing, all your code will compile on KNC @ day 1, and KNC offers much more flexibility in how you decompose a program. KNC provides offload, symmetric, and host models. K20 only provides offload.
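For anyone who wants to check the arithmetic, here is a rough sketch of the peak estimate. It assumes each DP unit retires one fused multiply-add (2 flops) per clock, which is what makes 15 SMX x 64 DP units come out to 1920 GFlops at 1 GHz. The function name and the 13/14-SMX cases are purely illustrative, not announced configurations.

```python
def peak_dp_gflops(smx_count, dp_units_per_smx, clock_ghz, flops_per_unit_per_clock=2):
    """Theoretical DP peak in GFlops.

    Assumption: one FMA (2 flops) per DP unit per clock.
    """
    return smx_count * dp_units_per_smx * flops_per_unit_per_clock * clock_ghz


# Best case from the post: all 15 SMX enabled at 1 GHz.
best_case = peak_dp_gflops(15, 64, 1.0)
print(f"15 SMX @ 1.00 GHz: {best_case:.0f} GFlops")

# Apply the guessed 20-30% frequency loss from thermals and die size.
for freq_loss in (0.20, 0.30):
    clock = 1.0 * (1.0 - freq_loss)
    print(f"15 SMX @ {clock:.2f} GHz: {peak_dp_gflops(15, 64, clock):.0f} GFlops")

# Hypothetical parts with SMX disabled for yield (counts are illustrative only).
for smx in (14, 13):
    print(f"{smx} SMX @ 0.75 GHz: {peak_dp_gflops(smx, 64, 0.75):.0f} GFlops")
```

The 20-30% derate lands between roughly 1.34 and 1.54 TFlops, which is where the ~1.4 TFlops figure comes from; disabling SMX pushes it lower still.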
> 665GFlops/M2090 is theoretical.
> At launch time, Fermi Linpack is ~56%. So
> it's likely Assuming unchanged efficiency, big-K is also
> around TFlops Linpack.
>
I'll put my stake down at ~800-900 GFlops Linpack performance for K20. I think you are assuming too high a peak for K20. Even my 1.4 TFlops peak is >2x the M2090's 665 GFlops.
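To make the comparison concrete, a quick sketch of the Linpack numbers under both peak assumptions. The inputs are the figures quoted in this thread (665 GFlops M2090 peak, ~56% Fermi Linpack efficiency at launch, a supposed 2 TFlops vs. my 1.4 TFlops peak); the helper name is made up for illustration, and none of this is measured K20 data.

```python
M2090_PEAK_GFLOPS = 665.0        # theoretical peak quoted in the thread
FERMI_LINPACK_EFFICIENCY = 0.56  # launch-time Fermi efficiency quoted in the thread


def linpack_gflops(peak_gflops, efficiency):
    """Sustained Linpack estimate: theoretical peak times achieved efficiency."""
    return peak_gflops * efficiency


# Parent post's supposition: 2 TFlops peak at unchanged Fermi efficiency.
print(f"2.0 TFlops peak: {linpack_gflops(2000.0, FERMI_LINPACK_EFFICIENCY):.0f} GFlops")

# The 1.4 TFlops peak estimate at the same efficiency.
print(f"1.4 TFlops peak: {linpack_gflops(1400.0, FERMI_LINPACK_EFFICIENCY):.0f} GFlops")

# Efficiency implied by an 800-900 GFlops Linpack guess on a 1.4 TFlops peak.
for guess in (800.0, 900.0):
    print(f"{guess:.0f} GFlops implies {guess / 1400.0:.0%} efficiency")

# Ratio of the 1.4 TFlops peak to the M2090 peak.
print(f"peak ratio vs M2090: {1400.0 / M2090_PEAK_GFLOPS:.2f}x")
```

So the 800-900 GFlops guess already assumes efficiency at or slightly above the Fermi launch figure on a 1.4 TFlops part.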



