By: jp (asdasdf.delete@this.gmail.com), August 7, 2012 11:22 am
Room: Moderated Discussions
aaron spink (aaronspink.delete@this.notearthlink.net) on August 7, 2012 7:20 am wrote:
> jp (asdfasdf.delete@this.gmail.com) on August 7, 2012 3:08 am wrote:
>
> >
> Just a note on
> > the performance numbers mentioned before. From what I
> heard KNC does not have
> > the FMA instruction.
> >
> All public data
> confirms that it has FMA.
>
>
> > On the K20 we can
> > expect conservative
> clocks at maybe ~ 0.8 Ghz which would put us at 15*64*0.8*2
> > => 1536
> GFLOPs DP, way ahead of the competition... Given that GPUs are
> > already
> hitting 80% for matrix operation applications we should be seeing at
> >
> least 1.22 TFLOPs by a conservative estimate.
> >
> .8 Ghz will be pushing it.
> K10 has the advantage of not having to deal with large die effects, having a
> much larger volume from which to bin parts, etc, and only reached .75 Ghz. Also
> getting full yield off such a large die is going to be somewhat rare. Rumors
> are that the K20 will not have a full 15 SMXes and will use 1 or more for die
> salvage in order to get enough working parts further lowering its
> performance.
>
Thr rumours say that they already salvaged one to get the 15 SMX:es that they've disclosed here: http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf
> > More importantly, for real
> > world applications we
> should be seeing 15 * 192 * 2 * 0.8 => 4608 SP GFLOPS.
> > And yes, single
> precision is extremely important within for example image and
> > signal
> processing applications.
> >
> If you care about SP, you are likely going to
> be using K10 and not K20.
While the K10 is an interesting option it's 192 bit interface is a bit dscouraging while it also lacks A LOT of the new features presented in K20, Hyper-Q, dynamic parallellism, the new GMU, and GPUdirect. Another great feature of the K20 is that it will support up to 255 register/thread (up from 63 on Fermi) which is going to do wonders for BLAS performance! (aswell as your cheerished Linpack performance...)
And actually if I was happy with K10 i would just buy the GTX690 ( K10 is just 690 downclocked) which does run at 0.925 Ghz.
> jp (asdfasdf.delete@this.gmail.com) on August 7, 2012 3:08 am wrote:
>
> >
> Just a note on
> > the performance numbers mentioned before. From what I
> heard KNC does not have
> > the FMA instruction.
> >
> All public data
> confirms that it has FMA.
>
>
> > On the K20 we can
> > expect conservative
> clocks at maybe ~ 0.8 Ghz which would put us at 15*64*0.8*2
> > => 1536
> GFLOPs DP, way ahead of the competition... Given that GPUs are
> > already
> hitting 80% for matrix operation applications we should be seeing at
> >
> least 1.22 TFLOPs by a conservative estimate.
> >
> .8 Ghz will be pushing it.
> K10 has the advantage of not having to deal with large die effects, having a
> much larger volume from which to bin parts, etc, and only reached .75 Ghz. Also
> getting full yield off such a large die is going to be somewhat rare. Rumors
> are that the K20 will not have a full 15 SMXes and will use 1 or more for die
> salvage in order to get enough working parts further lowering its
> performance.
>
Thr rumours say that they already salvaged one to get the 15 SMX:es that they've disclosed here: http://www.nvidia.com/content/PDF/kepler/NVIDIA-Kepler-GK110-Architecture-Whitepaper.pdf
> > More importantly, for real
> > world applications we
> should be seeing 15 * 192 * 2 * 0.8 => 4608 SP GFLOPS.
> > And yes, single
> precision is extremely important within for example image and
> > signal
> processing applications.
> >
> If you care about SP, you are likely going to
> be using K10 and not K20.
While the K10 is an interesting option it's 192 bit interface is a bit dscouraging while it also lacks A LOT of the new features presented in K20, Hyper-Q, dynamic parallellism, the new GMU, and GPUdirect. Another great feature of the K20 is that it will support up to 255 register/thread (up from 63 on Fermi) which is going to do wonders for BLAS performance! (aswell as your cheerished Linpack performance...)
And actually if I was happy with K10 i would just buy the GTX690 ( K10 is just 690 downclocked) which does run at 0.925 Ghz.
Topic | Posted By | Date |
---|---|---|
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 01:37 AM |
New Article: Compute Efficiency 2012 | SHK | 2012/07/25 02:31 AM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 02:42 AM |
New Article: Compute Efficiency 2012 | none | 2012/07/25 03:18 AM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 11:25 AM |
GCN (NT) | EBFE | 2012/07/25 03:25 AM |
GCN - TFLOP DP | jp | 2012/08/09 01:58 PM |
GCN - TFLOP DP | David Kanter | 2012/08/09 03:32 PM |
GCN - TFLOP DP | Kevin G | 2012/08/11 05:22 PM |
GCN - TFLOP DP | Eric | 2012/08/09 05:12 PM |
GCN - TFLOP DP | jp | 2012/08/10 01:23 AM |
GCN - TFLOP DP | EBFE | 2012/08/12 08:27 PM |
GCN - TFLOP DP | jp | 2012/08/13 02:02 AM |
GCN - TFLOP DP | EBFE | 2012/08/13 07:45 PM |
GCN - TFLOP DP | jp | 2012/08/14 01:21 AM |
New Article: Compute Efficiency 2012 | Adrian | 2012/07/25 04:39 AM |
New Article: Compute Efficiency 2012 | EBFE | 2012/07/25 09:33 AM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 11:11 AM |
New Article: Compute Efficiency 2012 | sf | 2012/07/25 06:46 AM |
New Article: Compute Efficiency 2012 | aaron spink | 2012/07/25 09:08 AM |
New Article: Compute Efficiency 2012 | someone | 2012/07/25 10:06 AM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 11:14 AM |
New Article: Compute Efficiency 2012 | EBFE | 2012/07/26 02:27 AM |
BG/Q | David Kanter | 2012/07/26 09:31 AM |
VR-ZONE KNC B0 leak, poor number? | EBFE | 2012/08/03 01:57 AM |
VR-ZONE KNC B0 leak, poor number? | Eric | 2012/08/03 07:59 AM |
VR-ZONE KNC B0 leak, poor number? | EBFE | 2012/08/04 06:37 AM |
VR-ZONE KNC B0 leak, poor number? | aaron spink | 2012/08/04 06:51 PM |
Leaks != products | David Kanter | 2012/08/05 03:19 AM |
Leaks != products | EBFE | 2012/08/06 02:49 AM |
VR-ZONE KNC B0 leak, poor number? | Eric | 2012/08/05 10:37 AM |
VR-ZONE KNC B0 leak, poor number? | EBFE | 2012/08/06 03:09 AM |
VR-ZONE KNC B0 leak, poor number? | aaron spink | 2012/08/06 04:33 AM |
VR-ZONE KNC B0 leak, poor number? | jp | 2012/08/07 03:08 AM |
VR-ZONE KNC B0 leak, poor number? | Eric | 2012/08/07 04:58 AM |
VR-ZONE KNC B0 leak, poor number? | jp | 2012/08/07 05:17 AM |
VR-ZONE KNC B0 leak, poor number? | Eric | 2012/08/07 05:22 AM |
VR-ZONE KNC B0 leak, poor number? | anonymou5 | 2012/08/07 09:43 AM |
VR-ZONE KNC B0 leak, poor number? | jp | 2012/08/07 05:23 AM |
VR-ZONE KNC B0 leak, poor number? | aaron spink | 2012/08/07 07:24 AM |
VR-ZONE KNC B0 leak, poor number? | aaron spink | 2012/08/07 07:20 AM |
VR-ZONE KNC B0 leak, poor number? | jp | 2012/08/07 11:22 AM |
VR-ZONE KNC B0 leak, poor number? | EduardoS | 2012/08/07 03:15 PM |
KNC has FMA | David Kanter | 2012/08/07 09:17 AM |
New Article: Compute Efficiency 2012 | forestlaughing | 2012/07/25 08:51 AM |
New Article: Compute Efficiency 2012 | Eric | 2012/07/27 05:12 AM |
New Article: Compute Efficiency 2012 | hobold | 2012/07/27 11:53 AM |
New Article: Compute Efficiency 2012 | Eric | 2012/07/27 12:51 PM |
New Article: Compute Efficiency 2012 | hobold | 2012/07/27 02:48 PM |
New Article: Compute Efficiency 2012 | Eric | 2012/07/27 03:29 PM |
New Article: Compute Efficiency 2012 | anon | 2012/07/29 02:25 AM |
New Article: Compute Efficiency 2012 | hobold | 2012/07/29 11:53 AM |
Efficiency? No, lack of highly useful features | someone | 2012/07/25 09:58 AM |
Best case for GPUs | David Kanter | 2012/07/25 11:28 AM |
Best case for GPUs | franzliszt | 2012/07/25 01:39 PM |
Best case for GPUs | Chuck | 2012/07/25 08:13 PM |
Best case for GPUs | David Kanter | 2012/07/25 09:45 PM |
Best case for GPUs | Eric | 2012/07/27 05:51 AM |
Silverthorn data point | Michael S | 2012/07/25 02:45 PM |
Silverthorn data point | David Kanter | 2012/07/25 04:06 PM |
New Article: Compute Efficiency 2012 | Unununium | 2012/07/25 05:55 PM |
New Article: Compute Efficiency 2012 | EduardoS | 2012/07/25 08:12 PM |
Ops... I'm wrong... | EduardoS | 2012/07/25 08:14 PM |
New Article: Compute Efficiency 2012 | TacoBell | 2012/07/25 08:36 PM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 09:49 PM |
New Article: Compute Efficiency 2012 | Michael S | 2012/07/26 02:33 AM |
Line and factor | Moritz | 2012/07/26 01:34 AM |
Line and factor | Peter Boyle | 2012/07/27 07:57 AM |
not entirely | Moritz | 2012/07/27 12:22 PM |
Line and factor | EduardoS | 2012/07/27 05:24 PM |
Line and factor | Moritz | 2012/07/28 12:52 PM |
tables | Michael S | 2012/07/26 02:39 AM |
Interlagos L2+L3 | Rana | 2012/07/26 03:13 AM |
Interlagos L2+L3 | Rana | 2012/07/26 03:13 AM |
Interlagos L2+L3 | David Kanter | 2012/07/26 09:21 AM |
SP vs DP & performance metrics | jp | 2012/07/27 07:08 AM |
SP vs DP & performance metrics | Eric | 2012/07/27 07:57 AM |
SP vs DP & performance metrics | jp | 2012/07/27 09:18 AM |
SP vs DP & performance metrics | aaron spink | 2012/07/27 09:36 AM |
SP vs DP & performance metrics | jp | 2012/07/27 09:47 AM |
"Global" --> system | Paul A. Clayton | 2012/07/27 10:31 AM |
"Global" --> system | jp | 2012/07/27 03:55 PM |
"Global" --> system | aaron spink | 2012/07/27 07:33 PM |
"Global" --> system | jp | 2012/07/28 02:00 AM |
"Global" --> system | aaron spink | 2012/07/28 06:54 AM |
"Global" --> system | jp | 2012/07/29 02:12 AM |
"Global" --> system | aaron spink | 2012/07/29 05:03 AM |
"Global" --> system | none | 2012/07/29 09:05 AM |
"Global" --> system | EduardoS | 2012/07/29 10:26 AM |
"Global" --> system | jp | 2012/07/30 02:24 AM |
"Global" --> system | aaron spink | 2012/07/30 03:05 AM |
"Global" --> system | aaron spink | 2012/07/30 03:03 AM |
daxpy is STREAM TRIAD | Paul A. Clayton | 2012/07/30 06:10 AM |
SP vs DP & performance metrics | aaron spink | 2012/07/27 07:25 PM |
SP vs DP & performance metrics | Emil Briggs | 2012/07/28 06:40 AM |
SP vs DP & performance metrics | aaron spink | 2012/07/28 07:05 AM |
SP vs DP & performance metrics | jp | 2012/07/28 11:04 AM |
SP vs DP & performance metrics | Brett | 2012/07/28 03:32 PM |
SP vs DP & performance metrics | Emil Briggs | 2012/07/28 06:11 PM |
SP vs DP & performance metrics | anon | 2012/07/29 02:53 AM |
SP vs DP & performance metrics | aaron spink | 2012/07/29 05:39 AM |
Coherency for discretes | Rohit | 2012/07/29 09:24 AM |
SP vs DP & performance metrics | anon | 2012/07/29 11:09 AM |
SP vs DP & performance metrics | Eric | 2012/07/29 01:08 PM |
SP vs DP & performance metrics | aaron spink | 2012/07/27 09:25 AM |
Regular updates? | Joe | 2012/07/27 09:35 AM |
New Article: Compute Efficiency 2012 | 309 | 2012/07/27 10:34 PM |
New Article: Compute Efficiency 2012 | Ingeneer | 2012/07/30 09:01 AM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/30 01:11 PM |
New Article: Compute Efficiency 2012 | Ingeneer | 2012/07/30 08:04 PM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/30 09:32 PM |
Memory power and bandwidth? | Iain McClatchie | 2012/08/03 04:35 PM |
Memory power and bandwidth? | David Kanter | 2012/08/04 11:22 AM |
Memory power and bandwidth? | Michael S | 2012/08/04 02:36 PM |
Memory power and bandwidth? | Iain McClatchie | 2012/08/06 02:09 PM |
Memory power and bandwidth? | Eric | 2012/08/07 06:28 PM |
Workloads | David Kanter | 2012/08/08 10:49 AM |
Workloads | Eric | 2012/08/09 05:21 PM |
Latency and bandwidth bottlenecks | Paul A. Clayton | 2012/08/08 04:02 PM |
Latency and bandwidth bottlenecks | Eric | 2012/08/09 05:32 PM |
Latency and bandwidth bottlenecks | none | 2012/08/10 06:06 AM |
Latency and bandwidth bottlenecks -> BDP | ajensen | 2012/08/11 03:21 PM |
Memory power and bandwidth? | Ingeneer | 2012/08/06 11:26 AM |
NV aims for 1.8+ TFLOPS DP ? | jp | 2012/08/11 01:21 PM |
NV aims for 1.8+ TFLOPS DP ? | David Kanter | 2012/08/11 09:25 PM |
NV aims for 1.8+ TFLOPS DP ? | jp | 2012/08/12 02:45 AM |
NV aims for 1.8+ TFLOPS DP ? | EBFE | 2012/08/12 10:02 PM |
NV aims for 1.8+ TFLOPS DP ? | jp | 2012/08/13 01:54 AM |
NV aims for 1.8+ TFLOPS DP ? | Gabriele Svelto | 2012/08/13 09:16 AM |
NV aims for 1.8+ TFLOPS DP ? | Vincent Diepeveen | 2012/08/14 03:04 AM |
NV aims for 1.8+ TFLOPS DP ? | David Kanter | 2012/08/13 09:50 AM |
NV aims for 1.8+ TFLOPS DP ? | jp | 2012/08/13 11:17 AM |
NV aims for 1.8+ TFLOPS DP ? | EduardoS | 2012/08/13 06:45 AM |