By: David Kanter (dkanter.delete@this.realworldtech.com), August 5, 2012 3:19 am
Room: Moderated Discussions
> > Why do you think that? Assuming that the top clock frequencies can
> > actually be sustained, it's 8 DP FLOP (512-bit vector unit) * 2 (FMA?)
> > * 60 * 1.09 GHz = 1.046 TFLOPS. That's in line with the 1 DP TFLOPS
> > that Knights Corner was reported to clock in at in DGEMM/LINPACK.
> > TDP and RAM look OK.
>
> Linpack TFlops/300W is likely to defeat next Firestream on perf and
> perf/w, or even next Fermi by perf.
Leaked numbers don't mean a whole lot; wait until products ship (or at least until there is an announcement).
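For reference, the peak arithmetic in the quoted post can be reproduced with a short sketch. The inputs (512-bit vectors, FMA counted as 2 FLOPs, 60 cores, 1.09 GHz) are the leaked figures from the quote, not confirmed specifications, and the function name is just an illustrative choice.

```python
def peak_dp_tflops(dp_lanes, flops_per_lane, cores, clock_ghz):
    """Theoretical peak in TFLOPS: lanes * FLOPs per lane per cycle * cores * GHz / 1000."""
    return dp_lanes * flops_per_lane * cores * clock_ghz / 1000.0

# Assumed inputs, taken from the quoted (leaked, unconfirmed) numbers.
VECTOR_BITS = 512      # vector unit width
DP_BITS = 64           # one double-precision element
lanes = VECTOR_BITS // DP_BITS   # DP elements per vector

# An FMA is counted as 2 FLOPs (multiply + add) per lane per cycle.
knc_peak = peak_dp_tflops(lanes, flops_per_lane=2, cores=60, clock_ghz=1.09)
print(f"{knc_peak:.3f} TFLOPS")
```

This is only a theoretical ceiling; what DGEMM/LINPACK actually sustains depends on clocks held under load and on software efficiency.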
> However, I am assuming the following disadvantages, so I was expecting
> more flops and flops/W.
> 1. Coding for a target performance level is harder for KNC than GPU.
> KNC has to close the gap of low raw flops by significantly higher
> efficiency, which could be impossible or make coding harder.
I am skeptical that coding for KNC is harder than coding for a GPU. The former has caches and many other niceties of modern architectures.
> 2. KNC is more expensive than GPU. (SNB-EP is already near $2k)
As Aaron pointed out, the GPUs that are used for compute workloads are just as expensive.
> 3. On-board RAM is smaller than competition, which may lower real
> efficiency.
I'm skeptical that's true. Everyone is using GDDR5 and the capacity is dictated by the limits of your interface (i.e. how many channels, clamshell support, etc.).
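A rough sketch of how the interface bounds capacity, assuming standard GDDR5 devices with a 32-bit interface and clamshell mode putting two devices on each 32-bit channel. The bus width and chip density used below are hypothetical examples, not figures for KNC or any particular GPU.

```python
def gddr5_capacity_gb(bus_width_bits, chip_density_gbit, clamshell=False):
    """Maximum capacity in GB for a given memory interface.

    Assumes each GDDR5 device occupies one 32-bit channel, or that two
    devices share a channel (x16 each) when clamshell mode is used.
    """
    channels = bus_width_bits // 32
    chips = channels * (2 if clamshell else 1)
    return chips * chip_density_gbit / 8   # Gbit -> GB

# Hypothetical example: 512-bit interface with 2 Gbit devices.
normal = gddr5_capacity_gb(512, 2)                       # 16 chips
doubled = gddr5_capacity_gb(512, 2, clamshell=True)      # 32 chips
print(normal, doubled)
```

With the same memory technology, two designs with the same bus width and clamshell support end up with the same capacity ceiling.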
> 4. SP performance is
> incompetent.
That could be an issue for some workloads.
> 4.5 KNC trails GPU in some GPU-favorable workloads (e.g. where GPUs
> are used for now).
We don't even have real numbers for KNC...so how can you make that comparison? It's like saying that POWER7 has lower performance in cell phones than an ARM A9. It's true, but it doesn't say anything informative.
DK