By: EBFE (x.delete@this.y.com), August 4, 2012 6:37 am
Room: Moderated Discussions
Eric (eric.kjellen.delete@this.gmail.com) on August 3, 2012 7:59 am wrote:
> EBFE (x.delete@this.y.com) on August 3, 2012 1:57 am wrote:
> > David Kanter (dkanter.delete@this.realworldtech.com) on July 26, 2012 9:31 am wrote:
> > > > > > What is the currently best ratio of GPU to CPU?
> > > > >
> > > > > That depends on your workload.
> > > > >
> > > > > > Will the best future design be a BlueGene/Q (for best I/O) driving an
> > > > > > optimum number of GPUs (for best Compute)?
> > > > >
> > > > > If you look at the chart, it should be clear that BGQ stands alone and
> > > > > doesn't need GPUs. From a power perspective, it's superior to all
> > > > > existing GPUs - and it can run a far wider range of workloads.
> > > > >
> > > > > David
> > > >
> > > > BGQ is about the same as 3GiB HD7970(947G/250W). So BGQ is quite likely
> > > > to be inferior to a 6GiB incoming-gen Firestream.
> > > > [If I recall correctly, at launch time AMD listed 7970 GPU/board power
> > > > as 210/250]
> > >
> > > I expect that the next generation of throughput processors (Knights
> > > Corner, Tahiti, Kepler) will significantly alter the landscape because
> > > they represent a jump in process.
> > >
> > > For AMD and Nvidia it is 40nm to 28nm, and for Intel it is the move to 22nm.
> > >
> > > DK
> >
> > http://vr-zone.com/articles/intel-xeon-phi-b0-stepping--the-knight-in-shining-armor-/16871.html
> > (if not fake) Looks pretty bad: low freq, small ram, high tdp
> > The number is so bad that I tend to think it's fake.
>
> Why do you think that? Assuming that the top clock frequencies can actually be
> sustained, it's 8 DP FLOP (512-bit vector unit) * 2 (FMA?) * 60 * 1.09 GHz =
> 1.046 TFLOPS. That's in line with the 1 DP TFLOPS that Knights Corner was
> reported to clock in at in DGEMM/LINPACK. TDP and RAM look OK.
A Linpack TFLOPS at 300 W is likely to beat the next Firestream on perf and perf/W, or even the next Fermi on perf.
However, I am assuming the following disadvantages, so I was expecting more flops and flops/W.
1. Coding for a target performance level is harder for KNC than for a GPU.
KNC has to close the gap in raw flops with significantly higher efficiency, which could be impossible or make coding harder.
2. KNC is more expensive than a GPU. (SNB-EP is already near $2k)
3. On-board RAM is smaller than the competition's, which may lower real efficiency.
4. SP performance is uncompetitive.
4.5 KNC trails GPUs in some GPU-favorable workloads (e.g. the ones where GPUs are used today).
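
For reference, a rough sketch of the arithmetic behind "flops and flops/W", using only figures quoted in this thread: the 8 * 2 * 60 * 1.09 GHz estimate for KNC, the 300 W figure above, and 947G/250W for the HD7970. These are peak numbers, not measurements. The factor of 2 for FMA is the assumption already flagged with "(FMA?)" in the quote, and the function and variable names are made up for illustration.

```python
def peak_dp_gflops(dp_lanes, flops_per_lane, cores, clock_ghz):
    """Peak double-precision GFLOPS = lanes * flops per lane per cycle * cores * GHz."""
    return dp_lanes * flops_per_lane * cores * clock_ghz


def gflops_per_watt(gflops, watts):
    """Simple perf/W ratio; ignores host CPU and memory power."""
    return gflops / watts


# KNC figures as quoted in the thread (rumored, unconfirmed):
# 8 DP lanes per 512-bit vector unit, 2 flops per lane (assumes FMA),
# 60 cores, 1.09 GHz, 300 W.
knc_gflops = peak_dp_gflops(8, 2, 60, 1.09)
knc_watts = 300.0

# HD7970 figures as quoted in the thread: 947 GFLOPS DP at 250 W board power.
hd7970_gflops = 947.0
hd7970_watts = 250.0

for name, gflops, watts in (
    ("KNC (rumored)", knc_gflops, knc_watts),
    ("HD7970", hd7970_gflops, hd7970_watts),
):
    print(f"{name}: {gflops:.1f} GFLOPS, "
          f"{gflops_per_watt(gflops, watts):.2f} GFLOPS/W")
```

On these inputs KNC comes out at roughly 3.49 GFLOPS/W against roughly 3.79 for the HD7970, which is why I was expecting more from a 22nm part.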



