By: Eric (eric.kjellen.delete@this.gmail.com), August 7, 2012 5:22 am
Room: Moderated Discussions
jp (asdfasdf.delete@this.gmail.com) on August 7, 2012 5:17 am wrote:
> Eric (eric.kjellen.delete@this.gmail.com) on August 7, 2012 4:58 am wrote:
> > jp (asdfasdf.delete@this.gmail.com) on August 7, 2012 3:08 am wrote:
> > > aaron spink (aaronspink.delete@this.notearthlink.net) on August 6, 2012 4:33 am wrote:
> > > > EBFE (x.delete@this.y.com) on August 6, 2012 3:09 am wrote:
> > > > > Coding would definitely be easier for KNC than GPU, if your target is e.g. 40% efficiency.
> > > > > However, suppose big-K is 2TFlops.
> > > > > Is it easier to code KNC for 90%, than GPU for 50%? (thus same performance)
> > > > > The good thing is that KNC seems to have more bandwidth, so it might be true.
> > > > >
> > > > It is unlikely that K20 is >1.5 TFlops. As of right now there are no plans for the GK110 to be put in the consumer space, so they cannot rely on binning good dies for the K product. They also have a thermal envelope to fit into. So it is unlikely they'll be releasing at 1 Ghz+. What we do know about K20 is that it is 15 SMX @ 64 DP per SMX for a best case of 1920 GFlops @ 1 Ghz. Thermals + large die will probably cost them 20-30% frequency putting them in the range of 1.4 TFlops. And even that is probably a bit generous since it is unlikely that they'll be able to enable all 15 SMX due to defects.
> > > >
> > > > And yes, it is likely easier to code KNC for 90% than for GPU for 50% on average across the relevant workloads. For one thing, all your code will compile on KNC @ day 1 and KNC offers much more flexibility in the decomposing of a program. KNC provides offload, symmetric, and host models. K20 only provides offload.
> > > >
> > > > > 665GFlops/M2090 is theoretical. At launch time, Fermi Linpack is ~56%. So it's likely
> > > > > Assuming unchanged efficiency, big-K is also around TFlops Linpack.
> > > > >
> > > > I'll put my stake down at ~800-900 Gflops linpack performance for K20. I think you are assuming too high of a peak for K20. Even my 1.4 Tflop peak is >2x 2090.
> > > >
> > > Just a note on the performance numbers mentioned before. From what I heard KNC does not have the FMA instruction. With the 1.09 Ghz that would land us a whopping 1.09 * 61 * (512/64) = 558 GFLOPS DP, behind the current competition, in other words it looks like KNC would be 2 years late to the market.
> > >
> > > On the K20 we can expect conservative clocks at maybe ~ 0.8 Ghz which would put us at 15*64*0.8*2 => 1536 GFLOPs DP, way ahead of the competition... Given that GPUs are already hitting 80% for matrix operation applications we should be seeing at least 1.22 TFLOPs by a conservative estimate.
> > >
> > > More importantly, for real world applications we should be seeing 15 * 192 * 2 * 0.8 => 4608 SP GFLOPS. And yes, single precision is extremely important within for example image and signal processing applications.
> > >
> > > Cheers,
> > >
> > That's not correct, KNC/MIC has FMA support. Look at page 18 in the following PDF (at page 14 it also says that DP performance is 1 TFLOPS):
> >
> > 2012.04.25 Andrzej Nowak - An overview of Intel MIC - technology, hardware and software v3
>
> Thanks for the source! Have been searching for this.
>
You're welcome, I just found it myself. =)
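
As a quick sanity check on the peak numbers quoted above, here is a minimal sketch of the arithmetic everyone is using: units x DP (or SP) lanes per unit x clock x 2 if FMA is counted. The function name `peak_gflops` and its parameters are made up for illustration, and the clocks and unit counts are just the guesses from the quoted posts, not confirmed specs.

```python
def peak_gflops(units, clock_ghz, lanes_per_unit, fma=True):
    """Theoretical peak in GFLOPS: units * lanes * clock * (2 flops per FMA, else 1)."""
    flops_per_lane = 2 if fma else 1
    return units * lanes_per_unit * clock_ghz * flops_per_lane

# KNC guesses from the thread: 61 cores, 1.09 GHz, 512-bit vectors of 64-bit doubles.
knc_no_fma = peak_gflops(61, 1.09, 512 // 64, fma=False)
knc_fma = peak_gflops(61, 1.09, 512 // 64, fma=True)

# K20 guesses from the thread: 15 SMX, 64 DP or 192 SP lanes per SMX.
k20_dp_1ghz = peak_gflops(15, 1.0, 64)
k20_dp = peak_gflops(15, 0.8, 64)
k20_sp = peak_gflops(15, 0.8, 192)

for name, value in [
    ("KNC DP, no FMA", knc_no_fma),
    ("KNC DP, FMA", knc_fma),
    ("K20 DP @ 1.0 GHz", k20_dp_1ghz),
    ("K20 DP @ 0.8 GHz", k20_dp),
    ("K20 SP @ 0.8 GHz", k20_sp),
    ("K20 DP @ 0.8 GHz, 80% efficiency", 0.8 * k20_dp),
]:
    print(f"{name}: {value:.0f} GFLOPS")
```

Two things fall out of this. With 61 cores and no FMA the product is about 532 GFLOPS rather than the 558 quoted (558 is what 64 cores would give, so one of the two figures in that post is presumably off). With FMA counted, the same 61 cores at 1.09 GHz come to roughly 1064 GFLOPS, which lines up with the 1 TFLOPS DP figure in the linked slides.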
Topic | Posted By | Date |
---|---|---|
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 01:37 AM |
New Article: Compute Efficiency 2012 | SHK | 2012/07/25 02:31 AM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 02:42 AM |
New Article: Compute Efficiency 2012 | none | 2012/07/25 03:18 AM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 11:25 AM |
GCN (NT) | EBFE | 2012/07/25 03:25 AM |
GCN - TFLOP DP | jp | 2012/08/09 01:58 PM |
GCN - TFLOP DP | David Kanter | 2012/08/09 03:32 PM |
GCN - TFLOP DP | Kevin G | 2012/08/11 05:22 PM |
GCN - TFLOP DP | Eric | 2012/08/09 05:12 PM |
GCN - TFLOP DP | jp | 2012/08/10 01:23 AM |
GCN - TFLOP DP | EBFE | 2012/08/12 08:27 PM |
GCN - TFLOP DP | jp | 2012/08/13 02:02 AM |
GCN - TFLOP DP | EBFE | 2012/08/13 07:45 PM |
GCN - TFLOP DP | jp | 2012/08/14 01:21 AM |
New Article: Compute Efficiency 2012 | Adrian | 2012/07/25 04:39 AM |
New Article: Compute Efficiency 2012 | EBFE | 2012/07/25 09:33 AM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 11:11 AM |
New Article: Compute Efficiency 2012 | sf | 2012/07/25 06:46 AM |
New Article: Compute Efficiency 2012 | aaron spink | 2012/07/25 09:08 AM |
New Article: Compute Efficiency 2012 | someone | 2012/07/25 10:06 AM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 11:14 AM |
New Article: Compute Efficiency 2012 | EBFE | 2012/07/26 02:27 AM |
BG/Q | David Kanter | 2012/07/26 09:31 AM |
VR-ZONE KNC B0 leak, poor number? | EBFE | 2012/08/03 01:57 AM |
VR-ZONE KNC B0 leak, poor number? | Eric | 2012/08/03 07:59 AM |
VR-ZONE KNC B0 leak, poor number? | EBFE | 2012/08/04 06:37 AM |
VR-ZONE KNC B0 leak, poor number? | aaron spink | 2012/08/04 06:51 PM |
Leaks != products | David Kanter | 2012/08/05 03:19 AM |
Leaks != products | EBFE | 2012/08/06 02:49 AM |
VR-ZONE KNC B0 leak, poor number? | Eric | 2012/08/05 10:37 AM |
VR-ZONE KNC B0 leak, poor number? | EBFE | 2012/08/06 03:09 AM |
VR-ZONE KNC B0 leak, poor number? | aaron spink | 2012/08/06 04:33 AM |
VR-ZONE KNC B0 leak, poor number? | jp | 2012/08/07 03:08 AM |
VR-ZONE KNC B0 leak, poor number? | Eric | 2012/08/07 04:58 AM |
VR-ZONE KNC B0 leak, poor number? | jp | 2012/08/07 05:17 AM |
VR-ZONE KNC B0 leak, poor number? | Eric | 2012/08/07 05:22 AM |
VR-ZONE KNC B0 leak, poor number? | anonymou5 | 2012/08/07 09:43 AM |
VR-ZONE KNC B0 leak, poor number? | jp | 2012/08/07 05:23 AM |
VR-ZONE KNC B0 leak, poor number? | aaron spink | 2012/08/07 07:24 AM |
VR-ZONE KNC B0 leak, poor number? | aaron spink | 2012/08/07 07:20 AM |
VR-ZONE KNC B0 leak, poor number? | jp | 2012/08/07 11:22 AM |
VR-ZONE KNC B0 leak, poor number? | EduardoS | 2012/08/07 03:15 PM |
KNC has FMA | David Kanter | 2012/08/07 09:17 AM |
New Article: Compute Efficiency 2012 | forestlaughing | 2012/07/25 08:51 AM |
New Article: Compute Efficiency 2012 | Eric | 2012/07/27 05:12 AM |
New Article: Compute Efficiency 2012 | hobold | 2012/07/27 11:53 AM |
New Article: Compute Efficiency 2012 | Eric | 2012/07/27 12:51 PM |
New Article: Compute Efficiency 2012 | hobold | 2012/07/27 02:48 PM |
New Article: Compute Efficiency 2012 | Eric | 2012/07/27 03:29 PM |
New Article: Compute Efficiency 2012 | anon | 2012/07/29 02:25 AM |
New Article: Compute Efficiency 2012 | hobold | 2012/07/29 11:53 AM |
Efficiency? No, lack of highly useful features | someone | 2012/07/25 09:58 AM |
Best case for GPUs | David Kanter | 2012/07/25 11:28 AM |
Best case for GPUs | franzliszt | 2012/07/25 01:39 PM |
Best case for GPUs | Chuck | 2012/07/25 08:13 PM |
Best case for GPUs | David Kanter | 2012/07/25 09:45 PM |
Best case for GPUs | Eric | 2012/07/27 05:51 AM |
Silverthorn data point | Michael S | 2012/07/25 02:45 PM |
Silverthorn data point | David Kanter | 2012/07/25 04:06 PM |
New Article: Compute Efficiency 2012 | Unununium | 2012/07/25 05:55 PM |
New Article: Compute Efficiency 2012 | EduardoS | 2012/07/25 08:12 PM |
Ops... I'm wrong... | EduardoS | 2012/07/25 08:14 PM |
New Article: Compute Efficiency 2012 | TacoBell | 2012/07/25 08:36 PM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/25 09:49 PM |
New Article: Compute Efficiency 2012 | Michael S | 2012/07/26 02:33 AM |
Line and factor | Moritz | 2012/07/26 01:34 AM |
Line and factor | Peter Boyle | 2012/07/27 07:57 AM |
not entirely | Moritz | 2012/07/27 12:22 PM |
Line and factor | EduardoS | 2012/07/27 05:24 PM |
Line and factor | Moritz | 2012/07/28 12:52 PM |
tables | Michael S | 2012/07/26 02:39 AM |
Interlagos L2+L3 | Rana | 2012/07/26 03:13 AM |
Interlagos L2+L3 | Rana | 2012/07/26 03:13 AM |
Interlagos L2+L3 | David Kanter | 2012/07/26 09:21 AM |
SP vs DP & performance metrics | jp | 2012/07/27 07:08 AM |
SP vs DP & performance metrics | Eric | 2012/07/27 07:57 AM |
SP vs DP & performance metrics | jp | 2012/07/27 09:18 AM |
SP vs DP & performance metrics | aaron spink | 2012/07/27 09:36 AM |
SP vs DP & performance metrics | jp | 2012/07/27 09:47 AM |
"Global" --> system | Paul A. Clayton | 2012/07/27 10:31 AM |
"Global" --> system | jp | 2012/07/27 03:55 PM |
"Global" --> system | aaron spink | 2012/07/27 07:33 PM |
"Global" --> system | jp | 2012/07/28 02:00 AM |
"Global" --> system | aaron spink | 2012/07/28 06:54 AM |
"Global" --> system | jp | 2012/07/29 02:12 AM |
"Global" --> system | aaron spink | 2012/07/29 05:03 AM |
"Global" --> system | none | 2012/07/29 09:05 AM |
"Global" --> system | EduardoS | 2012/07/29 10:26 AM |
"Global" --> system | jp | 2012/07/30 02:24 AM |
"Global" --> system | aaron spink | 2012/07/30 03:05 AM |
"Global" --> system | aaron spink | 2012/07/30 03:03 AM |
daxpy is STREAM TRIAD | Paul A. Clayton | 2012/07/30 06:10 AM |
SP vs DP & performance metrics | aaron spink | 2012/07/27 07:25 PM |
SP vs DP & performance metrics | Emil Briggs | 2012/07/28 06:40 AM |
SP vs DP & performance metrics | aaron spink | 2012/07/28 07:05 AM |
SP vs DP & performance metrics | jp | 2012/07/28 11:04 AM |
SP vs DP & performance metrics | Brett | 2012/07/28 03:32 PM |
SP vs DP & performance metrics | Emil Briggs | 2012/07/28 06:11 PM |
SP vs DP & performance metrics | anon | 2012/07/29 02:53 AM |
SP vs DP & performance metrics | aaron spink | 2012/07/29 05:39 AM |
Coherency for discretes | Rohit | 2012/07/29 09:24 AM |
SP vs DP & performance metrics | anon | 2012/07/29 11:09 AM |
SP vs DP & performance metrics | Eric | 2012/07/29 01:08 PM |
SP vs DP & performance metrics | aaron spink | 2012/07/27 09:25 AM |
Regular updates? | Joe | 2012/07/27 09:35 AM |
New Article: Compute Efficiency 2012 | 309 | 2012/07/27 10:34 PM |
New Article: Compute Efficiency 2012 | Ingeneer | 2012/07/30 09:01 AM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/30 01:11 PM |
New Article: Compute Efficiency 2012 | Ingeneer | 2012/07/30 08:04 PM |
New Article: Compute Efficiency 2012 | David Kanter | 2012/07/30 09:32 PM |
Memory power and bandwidth? | Iain McClatchie | 2012/08/03 04:35 PM |
Memory power and bandwidth? | David Kanter | 2012/08/04 11:22 AM |
Memory power and bandwidth? | Michael S | 2012/08/04 02:36 PM |
Memory power and bandwidth? | Iain McClatchie | 2012/08/06 02:09 PM |
Memory power and bandwidth? | Eric | 2012/08/07 06:28 PM |
Workloads | David Kanter | 2012/08/08 10:49 AM |
Workloads | Eric | 2012/08/09 05:21 PM |
Latency and bandwidth bottlenecks | Paul A. Clayton | 2012/08/08 04:02 PM |
Latency and bandwidth bottlenecks | Eric | 2012/08/09 05:32 PM |
Latency and bandwidth bottlenecks | none | 2012/08/10 06:06 AM |
Latency and bandwidth bottlenecks -> BDP | ajensen | 2012/08/11 03:21 PM |
Memory power and bandwidth? | Ingeneer | 2012/08/06 11:26 AM |
NV aims for 1.8+ TFLOPS DP ? | jp | 2012/08/11 01:21 PM |
NV aims for 1.8+ TFLOPS DP ? | David Kanter | 2012/08/11 09:25 PM |
NV aims for 1.8+ TFLOPS DP ? | jp | 2012/08/12 02:45 AM |
NV aims for 1.8+ TFLOPS DP ? | EBFE | 2012/08/12 10:02 PM |
NV aims for 1.8+ TFLOPS DP ? | jp | 2012/08/13 01:54 AM |
NV aims for 1.8+ TFLOPS DP ? | Gabriele Svelto | 2012/08/13 09:16 AM |
NV aims for 1.8+ TFLOPS DP ? | Vincent Diepeveen | 2012/08/14 03:04 AM |
NV aims for 1.8+ TFLOPS DP ? | David Kanter | 2012/08/13 09:50 AM |
NV aims for 1.8+ TFLOPS DP ? | jp | 2012/08/13 11:17 AM |
NV aims for 1.8+ TFLOPS DP ? | EduardoS | 2012/08/13 06:45 AM |