By: Eric (eric.kjellen.delete@this.gmail.com), August 5, 2012 10:37 am
Room: Moderated Discussions
EBFE (x.delete@this.y.com) on August 4, 2012 6:37 am wrote:
> Eric (eric.kjellen.delete@this.gmail.com) on August 3, 2012 7:59 am wrote:
> > EBFE (x.delete@this.y.com) on August 3, 2012 1:57 am wrote:
> > > David Kanter (dkanter.delete@this.realworldtech.com) on July 26, 2012 9:31 am wrote:
> > > > > > > What is the currently best ratio of GPU to CPU?
> > > > > >
> > > > > > That depends on your workload.
> > > > > >
> > > > > > > Will the best future design be a BlueGene/Q (for best I/O) driving an
> > > > > > > optimum number of GPUs (for best Compute)?
> > > > > >
> > > > > > If you look at the chart, it should be clear that BGQ stands alone and
> > > > > > doesn't need GPUs. From a power perspective, it's superior to all
> > > > > > existing GPUs - and it can run a far wider range of workloads.
> > > > > >
> > > > > > David
> > > > >
> > > > > BGQ is about the same as 3GiB HD7970(947G/250W). So BGQ is quite likely to be
> > > > > inferior to a 6GiB incoming-gen Firestream.
> > > > > [If I recall correctly, at launch time AMD listed 7970 GPU/board power as 210/250]
> > > >
> > > > I expect that the next generation of throughput processors (Knights Corner, Tahiti, Kepler) will
> > > > significantly alter the landscape because they represent a jump in process.
> > > >
> > > > For AMD and Nvidia it is 40nm to 28nm, and for Intel it is the move to 22nm.
> > > >
> > > > DK
> > >
> > > http://vr-zone.com/articles/intel-xeon-phi-b0-stepping--the-knight-in-shining-armor-/16871.html
> > > (if not fake) Looks pretty bad: low freq, small ram, high tdp
> > > The number is so bad that I tend to think it's fake.
> >
> > Why do you think that? Assuming that the top clock frequencies can actually be
> > sustained, it's 8 DP FLOP (512-bit vector unit) * 2 (FMA?) * 60 * 1.09 GHz =
> > 1.046 TFLOPS. That's in line with the 1 DP TFLOPS that Knights Corner was
> > reported to clock in at in DGEMM/LINPACK. TDP and RAM look OK.
>
> Linpack TFlops/300W is likely to defeat next Firestream on perf and perf/w, or even next Fermi by perf.
>
> However, I am assuming the following disadvantage, so I was expecting more flops and flops/W.
> 1. Coding for a target performance level is harder for KNC than GPU.
I strongly suspect that the exact opposite is true.
> KNC has to close the gap of low raw flops by significantly higher efficiency, which could be impossible or make coding harder.
Raw DP FLOPS are much higher than the current competition (Tesla M2090 rates at 665 GFLOPS, presumably in LINPACK, and FireStream 9370 at 528 DP GFLOPS). Tesla K20 (big Kepler) at 28nm will definitely beat KNC though; as far as I know, its performance targets are 1.5-2 DP TFLOPS.
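As a rough sanity check on the arithmetic in this exchange, here is a minimal Python sketch. It assumes the leaked (unconfirmed) KNC B0 figures discussed in this thread and the rated DP numbers quoted in this post; the function and parameter names are illustrative, not taken from any vendor tool.

```python
# Peak DP throughput in GFLOPS = vector lanes * FLOPs per lane per cycle * cores * clock (GHz).
# All inputs are the leaked/unconfirmed figures discussed in this thread, not measured results.
def peak_dp_gflops(dp_lanes, flops_per_lane, cores, clock_ghz):
    return dp_lanes * flops_per_lane * cores * clock_ghz

# 512-bit vector unit = 8 DP lanes; 2 FLOPs per lane assumes FMA; 60 cores at 1.09 GHz.
knc = peak_dp_gflops(dp_lanes=8, flops_per_lane=2, cores=60, clock_ghz=1.09)

# Rated DP GFLOPS for the current competition, as quoted in this post.
rated = {"Tesla M2090": 665.0, "FireStream 9370": 528.0}

print(f"KNC peak: {knc:.1f} GFLOPS")                 # KNC peak: 1046.4 GFLOPS
for name, gflops in rated.items():
    print(f"KNC / {name}: {knc / gflops:.2f}x")      # 1.57x and 1.98x
```

These are peak numbers only; sustained DGEMM/LINPACK results depend on whether the top clock can actually be held, as noted in the quoted post.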
> 2. KNC is more expensive than GPU. (SNB-EP is already near $2k)
I can't comment on this, but I think that unit price is a low-priority parameter for HPC customers.
> 3. On-board RAM is smaller than competition, which may lower real efficiency.
On-board RAM is significantly larger than the current competition's: 8GB vs 6GB for Tesla M2090 and 4GB for FireStream 9370.
> 4. SP performance is incompetent.
SP performance is irrelevant for HPC applications.
> 4.5 KNC trails GPU in some GPU-favorable workloads. (e.g. where GPUs are used for now)
Which ones?