Intel's Groveport Platform

By: Brendan, April 17, 2017 5:35 pm
Michael S on April 17, 2017 4:18 pm wrote:
> Brendan on April 17, 2017 3:13 pm wrote:
> > > IMHO, it's bloody obvious that if KNL has any chance at all to be competitive against "normal" Xeon
> > > on non-HPC loads then it's *only* when there are a lot of parallel tasks ready all the time. *Much* more
> > > tasks than the mere 32 that are needed for full utilization of a pair of hyperthreaded 8-core Xeons.
> >
> > Yes; and it would be extremely foolish to assume that HPC
> > is the only case where there are a lot of parallel tasks.
> But HPC, at least some classes of it, is one of the few tasks, and likely the most important among them
> by far, where KNL's dual 512-bit SIMD units can be advantageous. Very-high-bandwidth, but not very low latency,
> HMC-like memory is also advantageous only for a relatively small class of non-HPC workloads.

Um, what?

HPC is mostly "same as mainstream, just more of it", and only uses hardware originally intended for mainstream that's been slapped into a different form factor and given special high-speed interconnects. Nothing is actually designed for HPC (beyond those high-speed interconnects). The only reason Xeon Phi exists is that Intel felt like recycling shrapnel left over from a "mainstream GPU" project that didn't turn out so well. AVX512 was originally designed for real-time rendering, and is a continuation of SIMD that began with MMX (which was also originally intended for "multimedia" and not HPC).

Note that this extends to (e.g.) NVidia recycling GPUs designed for gaming machines, various people recycling ARM cores intended for smartphones, AMD adding extra HyperTransport interconnects to "commodity server" Opterons because Cray begged, etc.

To say that the "re-purposed scraps designed and intended for mainstream uses" are only important for HPC is ludicrous.

> What's the story about snowflakes? The word suddenly became popular...
> Even in Oz, where snow is not the most common, or so I heard.

I really have no idea; but (as pure speculation) I think it may have become more popular as a side effect of political correctness (replacing older, more sexist alternatives).

> > It would also be extremely foolish to assume "parallel tasks ready all the time". For fluctuating
> > loads you want to cope with the peak demand. You don't want a system that crumbles during the "9
> > am office worker rush" because someone decided it'd be overkill for the "overnight lull".
> >
> > > And no, according to Intel's own estimates, even in the best case scenario
> > > (SPECInt2006_base) KNL does not quite match performance of dual Xeon E5 2620
> > > v4, whose street price is likely 2.5-3 times lower than Groveport.
> >
> > Are you suggesting that HPC are the only special little snowflakes that use floating
> > point; or that HPC is beaten by dual Xeon E5 2620 for both HPC and non-HPC?
> >
> > - Brendan
> >
> No, I didn't say anything like that. KNL is definitely much faster than dual-2620v4 on vectorizable FP, and
> even somewhat faster (not a lot, 15% or so) at SPECFp2006_rate, probably due to great memory bandwidth.
> But you were talking about workloads that resemble SPECInt_rate, weren't you?

I threw a mixed bag of everything out there (from compilers to amateur CPU generated animated movies).

The problem is that "mainstream" (or "non-HPC") covers an extremely wide variety of very different things - there's no defining "characteristic load" that applies to everyone, and often there isn't a single "characteristic load" that applies to the same person's usage all the time.

Note that this also applies to SPECint_rate scores. I'd be willing to bet that KNL does beat dual Xeon E5 2620 for some integer-only loads (and not just because of memory bandwidth). For example; I'd be tempted to suspect that there'd be a significant difference between "many unrelated processes with few threads per process" and "one process with many related threads" (due to various scalability problems in software).

- Brendan
