Intel's Groveport Platform

By: anon, April 18, 2017 3:32 am
Room: Moderated Discussions
Brendan on April 17, 2017 11:48 pm wrote:
> Hi,
> anon on April 17, 2017 10:39 pm wrote:
> > Brendan on April 17, 2017 8:31 pm wrote:
> > > anon on April 17, 2017 6:32 pm wrote:
> > > > Brendan on April 17, 2017 5:35 pm wrote:
> > > > > Michael S on April 17, 2017 4:18 pm wrote:
> > > > > > Brendan on April 17, 2017 3:13 pm wrote:
> > > > > > > > IMHO, it's bloody obvious that if KNL has any chance at all to be competitive against "normal" Xeon
> > > > > > > > on non-HPC loads then it's *only* when there are a lot of parallel tasks ready all the time. *Many*
> > > > > > > > more tasks than the mere 32 that are needed for full utilization of a pair of hyperthreaded 8-core Xeons.
> > > > > > >
> > > > > > > Yes; and it would be extremely foolish to assume that HPC
> > > > > > > is the only case where there are a lot of parallel tasks.
> > > > > >
> > > > > > But HPC, at least some classes of it, is one of the few tasks, and likely by far the most important among them,
> > > > > > where KNL's dual 512-bit SIMD units can be advantageous.
> > > > > > Very-high-bandwidth, but not very low latency,
> > > > > > HMC-like memory is also advantageous only for a relatively small class of non-HPC workloads.
> > > > >
> > > > > Um, what?
> > > > >
> > > > > HPC is mostly "same as mainstream, just more of it",
> > > >
> > > > It isn't really. Unless you define mainstream so broadly it doesn't mean anything.
> > >
> > > How else can "mainstream" be defined?
> >
> > Probably wrong wording on my part. It doesn't matter how broadly or narrowly you define mainstream, HPC
> > is never "mostly same as mainstream, just more of it". You can reasonably define it so regular vectorizable
> > parallel DFLOPS is a subset of mainstream, but that doesn't make HPC the same as it.
> If I use one standard Xeon machine to do some number crunching, then that's (part of) mainstream;
> and if I use 2 standard Xeon machines that's also (part of) mainstream; but if I add a third
> standard Xeon machine then it becomes HPC and the work each machine is doing suddenly completely
> and utterly different? Yes, no, maybe? How many machines do I have to add before I hit the magic
> "suddenly completely and utterly different" cut-off point? Is the answer 42?

Don't really know what you're on about. Sarcastic rhetorical questions are a nasty blight that's infected internet forum discussions in recent years. If you can't state your point or disagree with mine without asking a handful of snide questions, then maybe it's best left unsaid.

"Number crunching" in general is not HPC, though HPC is number crunching. It's not the most precise term, but it usually means that almost all the useful work is done by quite small instruction kernels operating on large floating-point data sets.
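To make that definition concrete, the classic example of a small kernel over a large FP data set is a STREAM-style triad. The sketch below is in Python purely for readability (a real HPC kernel would be compiled and vectorized), and the array size is an arbitrary illustrative choice. The interesting number is the ratio: roughly 2 flops per 24 bytes of memory traffic for doubles, so throughput is set by memory bandwidth rather than by the arithmetic units.

```python
from array import array


def triad(a, b, c, scalar):
    """STREAM-style triad kernel: a[i] = b[i] + scalar * c[i]."""
    for i in range(len(a)):
        a[i] = b[i] + scalar * c[i]


def triad_flops_per_byte(elem_bytes=8):
    """Arithmetic intensity of the triad for elem_bytes-wide floats.

    Per element: 2 flops (one multiply, one add) against 3 * elem_bytes of
    memory traffic (read b[i], read c[i], write a[i]); write-allocate
    traffic is ignored.
    """
    return 2 / (3 * elem_bytes)


if __name__ == "__main__":
    n = 1_000_000  # illustrative "large" data set: 3 arrays of 8 MB of doubles
    b = array("d", [1.0]) * n
    c = array("d", [2.0]) * n
    a = array("d", [0.0]) * n
    triad(a, b, c, 3.0)
    print(a[0], triad_flops_per_byte())  # 7.0 and about 0.083 flops/byte
```

The kernel is a single line; everything else is moving data, which is why memory bandwidth dominates this class of work.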

> > > > > > No, I didn't say anything like that. KNL is definitely much faster than dual-2620v4 on vectorizable FP, and
> > > > > > even somewhat faster (not a lot, 15% or so) at SPECFp2006_rate, probably due to great memory bandwidth.
> > > > > > But you were talking about workloads that resemble SPECInt_rate, weren't you?
> > > > >
> > > > > I threw a mixed bag of everything out there (from compilers to amateur CPU generated animated movies).
> > > >
> > > > None of those you listed would want a Xeon Phi.
> > > >
> > > > Not an office with 40 thin clients could use Xeon Phi.
> > >
> > > Depends what the office workers actually do.
> >
> > Yes, but you implied a general thin client server and presumably associated network and
> > storage processing, which Phi is *not* suited to. If they also have significant FP compute
> > requirement, then what is it? That is what's relevant, not the thin client part.
> >
> > >
> > > > Not a game developer who wants to support MCDRAM and AVX512 (they will wait at least until
> > > > it is in high end consumer stuff, if they are serious developers and want to support it
> > > > before that, Intel will provide engineering samples 6 months or so before release).
> > >
> > > High end consumer stuff?
> >
> > Yes.
> >
> > > High bandwidth on-chip RAM
> >
> > Doesn't need specific support beyond what's mostly already there. Some
> > tuning perhaps, which obviously is not the same from Phi to a PC CPU.
> Some minor tuning; like completely redesigning memory management to make the most
> effective use of a new type of limited resource that hasn't existed before.

No, not at all. If the memory comes in the form of transparent fast caches, or faster main memory, no real work is required. Very fine tuning is unlikely but possible, though not the kind it would be worth doing on Xeon Phi, because Phi won't behave anything like some hypothetical future laptop CPU you don't even know about yet. If it's software-managed, then there is already precedent for that. If that doesn't fit existing GPU/GPGPU interfaces, it would be pretty unlikely for a company to "completely redesign memory management" based on the kind of wild speculation you're making. And if they did want to redesign it to allow for such a thing, there is no need for any real hardware: you can just allocate some limited memory and pretend it's faster for the purpose of initial design. No need for Phi.
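A minimal sketch of the "pretend it's faster" approach, assuming software-managed fast memory. `TieredAllocator` and its methods are hypothetical names for illustration, not a real library API, and nothing in it is actually faster. It just caps the "fast" tier so that placement decisions (which buffers get the scarce fast memory, and what spills) can be prototyped on ordinary hardware.

```python
class TieredAllocator:
    """Hypothetical two-tier allocator used only to prototype placement.

    Both tiers are ordinary host memory; the "fast" tier is simply capped
    at fast_capacity bytes to mimic a small on-package memory.
    """

    def __init__(self, fast_capacity):
        self.fast_capacity = fast_capacity
        self.fast_used = 0
        self.placement = {}  # buffer name -> (tier, size in bytes)

    def alloc(self, name, size, prefer_fast=True):
        """Place a buffer in the fast tier if requested and it fits, else spill."""
        if name in self.placement:
            raise ValueError(f"buffer {name!r} already allocated")
        if prefer_fast and self.fast_used + size <= self.fast_capacity:
            self.fast_used += size
            tier = "fast"
        else:
            tier = "slow"
        self.placement[name] = (tier, size)
        return bytearray(size), tier

    def free(self, name):
        """Release a buffer and return its fast-tier budget, if it had any."""
        tier, size = self.placement.pop(name)
        if tier == "fast":
            self.fast_used -= size


if __name__ == "__main__":
    pool = TieredAllocator(fast_capacity=1 << 20)        # pretend 1 MiB is "fast"
    _, hot = pool.alloc("particles", 768 * 1024)          # fits, so fast
    _, cold = pool.alloc("level_geometry", 512 * 1024)    # over budget, so slow
    print(hot, cold)  # fast slow
```

Swapping the capacity or the `prefer_fast` hints lets you see how placement changes without any special hardware.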

> > > and AVX512 will probably
> > > both be in entry level notebooks by the end of the year. It takes years to
> > > produce a game engine - "6 months sooner" is about 2 years too late.
> >
> > It does not take years to port already vectorized codepaths to new vector instructions,
> > or to use updated libraries or compilers. Also games all come out with patches now.
> I'm talking about fundamental design decisions that affect everything, not some low performance Lua script
> changes. If I have some C++ code with 123 classes (and various numbers of objects per class); which classes
> should/shouldn't use high bandwidth RAM? Should some classes be split into "frequently accessed" and "less
> frequently accessed"? If I'm designing a file format for something (levels, meshes, whatever) how much padding
> for alignment should I use? Is it better to use 32-byte alignment and let scatter-gather deal with it for
> AVX512, or should I have unnecessary padding for the AVX2 case? When I create a patch; will the compiler's
> "auto-vectorization" change the file format used by files that I already shipped on read-only DVDs?

That's just not how development really works. And certainly not how game development works. If it's not in today's high end CPU, game development won't bother. Really. Certainly not "redesigning the whole thing fundamentally".
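The padding question in the quote is at least partly plain arithmetic. A minimal sketch, assuming records are padded to a whole number of vector widths (32 bytes for AVX2, 64 for AVX-512); the record sizes are hypothetical, and whether the extra padding pays off depends on the access pattern, which this doesn't model.

```python
AVX2_BYTES = 32     # width of a 256-bit AVX2 register
AVX512_BYTES = 64   # width of a 512-bit AVX-512 register


def padded_size(record_bytes, alignment):
    """Round record_bytes up to the next multiple of alignment (a power of two)."""
    if alignment <= 0 or alignment & (alignment - 1):
        raise ValueError("alignment must be a positive power of two")
    return (record_bytes + alignment - 1) & ~(alignment - 1)


def padding_overhead(record_bytes, alignment):
    """Fraction of each padded record that is padding rather than data."""
    padded = padded_size(record_bytes, alignment)
    return (padded - record_bytes) / padded


if __name__ == "__main__":
    for size in (40, 72, 96):  # hypothetical on-disk record sizes
        for align in (AVX2_BYTES, AVX512_BYTES):
            print(size, align, padded_size(size, align),
                  f"{padding_overhead(size, align):.0%}")
```

For a 72-byte record, 32-byte alignment pads to 96 bytes while 64-byte alignment pads to 128, so the wider alignment is not free.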

> > And if they want to use the instructions they can likely already get Xeon engineering samples which would
> > be much better to work with. I can't remember the lead time that I got engineering samples from Intel back
> > a few years ago when I was working on an open source project they took an interest in. Probably 3-6 months,
> > but I was absolute bottom tier. Serious partners and ISVs could get first tapeout samples I'm sure.
> How would a person get on the list for next generation Xeon Phi (I've heard they'll
> support virtualisation, and that Intel has early engineering samples already)?

I'm not really involved with them now, but if you were in a position to get them, you would probably know. Game development companies have relationships with CPU and GPU vendors. If you wanted to do some specific work on an upcoming Intel feature in a significant game or popular application, they would likely accommodate you. When I got some engineering samples (of a few generations of Xeons), I was just working on an open source project that some Intel devs also did some work on, and I think one of them just asked if I wanted some new CPUs. Or maybe I asked them. Can't remember. Then they sent some NDAs and kept sending over Xeons every few years until I asked them to stop.

I'm sure there are some more formal channels to go through as well.

> > > > Not software developers that want to use Xeon Phi as "single server compile farm".
> > >
> > > Many separate processes (with near zero scalability problems between them) with compilers that can
> > > be memory bandwidth sensitive? Throw in some compile-time calculation for some large arrays and a few
> > > "floating point heavy" units tests? I wouldn't assume Xeon Phi couldn't be beneficial for some.
> >
> > I doubt it really, compared with a bunch of cheap low end Xeons. And if you had serious
> > FP tests to run, you would presumably want to run them on the same instruction set.
> Oh, so we're no longer presuming that you can just let the compiler auto-vectorise and it doesn't matter.

Sigh, more sarcasm. Again, don't know what you're talking about.

I meant exactly what I wrote. If you are doing floating point heavy unit tests in your build system, then you would certainly want to run them on the same instruction set as your target environment. The same microarchitecture would be preferable, but at least the same ISA.

It's possible some non-HPC but FP-heavy software developers will be interested in porting to and testing AVX-512 on Phi just to get ahead of it arriving in mainstream hardware. I'm not trying to make an *absolute* statement that there will be exactly zero. But between the people who would care, the ones who can't get other pre-release CPUs, and the ones who don't get the Phi seeded to encourage software to be ported, my guess is that the market size will be a rounding error.
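For the build-system case, a minimal sketch of gating FP-heavy tests on the host ISA, assuming a Linux x86 build machine where /proc/cpuinfo lists feature flags such as `avx2` and `avx512f`. The function names are hypothetical; this only checks that the ISA matches and says nothing about microarchitecture.

```python
def cpu_flags(cpuinfo_text):
    """Feature flags from the first 'flags' line of Linux x86 /proc/cpuinfo text."""
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "flags":
            return set(value.split())
    return set()


def missing_features(required, cpuinfo_text):
    """Required ISA flags that the host does not report, sorted for stable output."""
    return sorted(set(required) - cpu_flags(cpuinfo_text))


def read_cpuinfo(path="/proc/cpuinfo"):
    """Return cpuinfo text, or "" if it is unavailable (e.g. non-Linux hosts)."""
    try:
        with open(path) as f:
            return f.read()
    except OSError:
        return ""


if __name__ == "__main__":
    # Hypothetical gate for FP-heavy tests that target AVX-512.
    missing = missing_features(["avx512f"], read_cpuinfo())
    if missing:
        print("skipping AVX-512 FP tests; host lacks:", ", ".join(missing))
    else:
        print("host reports avx512f; running AVX-512 FP tests")
```

Skipping rather than failing keeps the build green on older machines while making it obvious that the target-ISA tests did not run.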

> > > > Not a game console called Playstation Pro Extreme.
> > >
> > > I was mostly joking about that (due to Playstation 3's use
> > > of Cell, and people using them for cheap HPC clusters).
> > >
> > > > Only possible one is amateur rendering videos, but they already have a decent GPU or
> > > > two with their rendering software running on it, so they won't pay for a Xeon Phi.
> > >
> > > Most high-quality rendering for film doesn't use GPU for various reasons (see
> > >
> > >
> > > ).
> >
> > I thought you said amateur video rendering. High end rendering for film is not really what
> > I would call mainstream. Actually if you had to put it in a box, it fits HPC better.
> In that case, maybe there's a massive number of people who have been
> doing "one computer only HPC" for years without even realising it.

From the point of view of workload seen by a single socket, certainly.