Intel's Groveport Platform

By: Brendan (btrotter.delete@this.gmail.com), April 19, 2017 4:43 pm
Room: Moderated Discussions
Hi,

anon (anon.delete@this.anon.com) on April 18, 2017 3:32 am wrote:
> Brendan (btrotter.delete@this.gmail.com) on April 17, 2017 11:48 pm wrote:
> > anon (anon.delete@this.anon.com) on April 17, 2017 10:39 pm wrote:
> > > Brendan (btrotter.delete@this.gmail.com) on April 17, 2017 8:31 pm wrote:
> > > > anon (anon.delete@this.anon.com) on April 17, 2017 6:32 pm wrote:
> > > > > Brendan (btrotter.delete@this.gmail.com) on April 17, 2017 5:35 pm wrote:
> > > > > > Michael S (already5chosen.delete@this.yahoo.com) on April 17, 2017 4:18 pm wrote:
> > > > > > > Brendan (btrotter.delete@this.gmail.com) on April 17, 2017 3:13 pm wrote:
> > > > > > > > > IMHO, it's bloody obvious that if KNL has any chance at all to be competitive against "normal" Xeon
> > > > > > > > > on non-HPC loads then it's *only* when there is a lot of parallel tasks ready all the time. *Much* more
> > > > > > > > > tasks than mere 32 that are needed for full utilization of a pair of hyperthreaded 8-core Xeons.
> > > > > > > >
> > > > > > > > Yes; and it would be extremely foolish to assume that HPC
> > > > > > > > is the only case where there is a lot of parallel tasks.
> > > > > > >
> > > > > > > But HPC, at least some classes of it, is one of the few tasks, and likely the most important among them
> > > > > > > by far, where KNL's dual 512-bit SIMD units can be advantageous.
> > > > > > > Very-high-bandwidth, but not very low latency,
> > > > > > > HMC-alike memory is also advantageous only for relatively small class of non-HPC workloads.
> > > > > >
> > > > > > Um, what?
> > > > > >
> > > > > > HPC is mostly "same as mainstream, just more of it",
> > > > >
> > > > > It isn't really. Unless you define mainstream so broadly it doesn't mean anything.
> > > >
> > > > How else can "mainstream" be defined?
> > >
> > > Probably wrong wording on my behalf. Doesn't matter how broad or narrow you define mainstream, HPC
> > > is never "mostly same as mainstream, just more of it". You can reasonably define it so regular vectorizable
> > > parallel DFLOPS is a subset of mainstream, but that doesn't make HPC the same as it.
> >
> > If I use one standard Xeon machine to do some number crunching, then that's (part of) mainstream;
> > and if I use 2 standard Xeon machines that's also (part of) mainstream; but if I add a third
> > standard Xeon machine then it becomes HPC and the work each machine is doing suddenly completely
> > and utterly different? Yes, no, maybe? How many machines do I have to add before I hit the magic
> > "suddenly completely and utterly different" cut-off point? Is the answer 42?
>
> Don't really know what you're on about. Sarcastic rhetorical questions are a nasty blight that's
> infected internet forum discussions in recent years. If you can't state your point or disagree
> with mine without asking a handful of snide questions, then maybe it's best left unsaid.
>
> "number crunching" in general is not HPC. HPC is number crunching though. Not
> the most precise term, but usually means almost all useful work being done by
> quite small instruction kernels operating on large floating point data sets.

My point is that, regardless of which metric you use (floating point data set size, parallelism, number of computers, FLOPs, ...), there's a single "computing" scale with no dividing line between non-HPC and HPC.

The only difference between "small instruction kernel operating on floating point data on a cheap smartphone" and "small instruction kernel operating on floating point data on a large super-computer" is scale, not type of work.

You cannot say that Xeon Phi is unsuitable for people doing smaller amounts of work and only suitable for people doing larger amounts of exactly the same type of work.
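
To make the "scale, not type of work" point concrete, here is a minimal C++ sketch of the kind of small floating point kernel being discussed. The names `axpy` and `scaled_sum` are illustrative assumptions and are not taken from any particular code base; the only point is that the kernel is identical whether it runs over three elements or over millions.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// y[i] = a * x[i] + y[i]: a classic small kernel over floating point data.
// Nothing in the loop depends on how large n is or what machine runs it.
void axpy(float a, const float* x, float* y, std::size_t n) {
    for (std::size_t i = 0; i < n; ++i) {
        y[i] = a * x[i] + y[i];
    }
}

// Hypothetical convenience wrapper: applies the kernel to a data set of any
// size and returns the updated copy of y.
std::vector<float> scaled_sum(float a, const std::vector<float>& x,
                              std::vector<float> y) {
    assert(x.size() == y.size());
    axpy(a, x.data(), y.data(), x.size());
    return y;
}
```

Calling `scaled_sum` on a three-element vector and on a vector with millions of elements executes the same loop; only `n` (and how the work is spread across cores or machines) changes.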

> > > > > Not a game developer who wants to support MCDRAM and AVX512 (they will wait at least until
> > > > > it is in high end consumer stuff, if they are serious developers and want to support it
> > > > > before that, Intel will provide engineering samples 6 months or so before release).
> > > >
> > > > High end consumer stuff?
> > >
> > > Yes.
> > >
> > > > High bandwidth on-chip RAM
> > >
> > > Doesn't need specific support beyond what's mostly already there. Some
> > > tuning perhaps, which obviously is not the same from Phi to a PC CPU.
> >
> > Some minor tuning; like completely redesigning memory management to make the most
> > effective use of a new type of limited resource that hasn't existed before.
>
> No, not at all. If the memory comes in the form of transparent fast caches, or faster main memory, no real
> work required. Very fine tuning unlikely but maybe, but not of the type that it would be useful to put effort
> into tuning on Xeon Phi because that won't behave anything like some hypothetical future laptop CPU you don't
> even know about yet. If it's software managed, then there is already precedent for that. If that doesn't fit
> with existing GPU/GPGPU interfaces, then it would be pretty unlikely for a company to "completely redesign
> memory management" because they are making this kind of wild speculations that you are. But if they wanted
> to redesign it to allow for such thing, there is no need to have any real hardware for it, you can just allocate
> some limited memory and pretend it's faster for the purpose of initial design. No need for Phi.

The problem is not that it's faster; the problem is that different types of memory in the same system have different characteristics. You cannot just use something crippled like "malloc()" (which assumes all RAM is the same) and expect to end up with the best use of different types of memory with different characteristics.

Xeon Phi does have a "use all the MCDRAM as cache" mode, but this is only useful for software that has crippled memory management.
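
A minimal sketch of what "memory management that knows about memory types" could look like, in C++. Everything here is an assumption made for illustration: `MemKind`, `FastArena`, `TieredAllocator` and `Block` are hypothetical names, the "fast" pool is just an ordinary buffer standing in for a limited high-bandwidth region, and none of this is a real MCDRAM interface. It only shows the shape of the decision that a plain `malloc()` call cannot express: the caller states which kind of memory it would prefer, and the allocator falls back to normal RAM when the limited resource runs out.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <memory>
#include <vector>

// Hypothetical memory kinds; a real system would map these to actual regions.
enum class MemKind { HighBandwidth, Normal };

// Fixed-size bump arena standing in for the limited fast memory.
class FastArena {
public:
    explicit FastArena(std::size_t capacity) : buf_(capacity), used_(0) {}

    // Returns nullptr when the request does not fit in what is left.
    void* allocate(std::size_t bytes,
                   std::size_t align = alignof(std::max_align_t)) {
        assert(align != 0 && (align & (align - 1)) == 0);  // power of two
        void* p = buf_.data() + used_;
        std::size_t space = buf_.size() - used_;
        if (std::align(align, bytes, p, space) == nullptr) return nullptr;
        used_ = static_cast<std::size_t>(
                    static_cast<unsigned char*>(p) - buf_.data()) + bytes;
        return p;
    }

    std::size_t used() const { return used_; }

private:
    std::vector<unsigned char> buf_;
    std::size_t used_;
};

// Records where an allocation actually landed.
struct Block {
    void* ptr;
    MemKind kind;
};

class TieredAllocator {
public:
    explicit TieredAllocator(std::size_t fast_capacity) : fast_(fast_capacity) {}

    // Try the preferred kind first; spill to normal RAM if fast memory is full.
    Block allocate(std::size_t bytes, MemKind preferred) {
        if (preferred == MemKind::HighBandwidth) {
            if (void* p = fast_.allocate(bytes)) {
                return {p, MemKind::HighBandwidth};
            }
        }
        return {std::malloc(bytes), MemKind::Normal};
    }

    // Arena blocks are reclaimed all at once when the allocator is destroyed.
    void release(Block b) {
        if (b.kind == MemKind::Normal) std::free(b.ptr);
    }

    std::size_t fast_bytes_used() const { return fast_.used(); }

private:
    FastArena fast_;
};
```

With a 1024-byte fast pool, a 512-byte request that prefers `MemKind::HighBandwidth` lands in the arena, while a 4096-byte request with the same preference spills to normal memory. Deciding which objects should ask for which kind is exactly the design work that a one-size-fits-all allocator hides.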


> > > > and AVX512 will probably
> > > > both be in entry level notebooks by the end of the year. It takes years to
> > > > produce a game engine - "6 months sooner" is about 2 years too late.
> > >
> > > It does not take years to port already vectorized codepaths to new vector instructions,
> > > or to use updated libraries or compilers. Also games all come out with patches now.
> >
> > I'm talking about fundamental design decisions that affect everything, not some low performance Lua script
> > changes. If I have some C++ code with 123 classes (and various numbers of objects per class); which classes
> > should/shouldn't use high bandwidth RAM? Should some classes be split into "frequently accessed" and "less
> > frequently accessed"? If I'm designing a file format for
> > something (levels, meshes, whatever) how much padding
> > for alignment should I use? Is it better to use 32-byte alignment and let scatter-gather deal with it for
> > AVX512, or should I have unnecessary padding for the AVX2 case? When I create a patch; will the compiler's
> > "auto-vectorization" change the file format used by files that I already shipped on read-only DVDs?
>
> That's just not how development really works. And certainly not how game development
> works. If it's not in today's high end CPU, game development won't bother. Really.
> Certainly not "redesigning the whole thing fundamentally".

I'm not too sure how stupid the average game developer is. For my work there's a "10 years until release" target, and a lot of effort goes into trying to minimise the risk of "obsolete at time of release".

> > > And if they want to use the instructions they can likely already get Xeon engineering samples which would
> > > be much better to work with. I can't remember the lead time that I got engineering samples from Intel back
> > > a few years ago when I was working on an open source project they took an interest in. Probably 3-6 months,
> > > but I was absolute bottom tier. Serious partners and ISVs could get first tapeout samples I'm sure.
> >
> > How would a person get on the list for next generation Xeon Phi (I've heard they'll
> > support virtualisation, and that Intel has early engineering samples already)?
>
> I'm not really involved with them now, but if you were in a position to, you would probably know. Game
> development companies have relationships with CPU and GPU vendors. If you wanted to do some specific
> work on upcoming Intel feature with some significant game or popular application, they would likely
> accommodate. Probably there are formal channels. When I got some engineering samples (of few generations
> of Xeons), I was just working on open source project which some Intel devs also did some work on and
> I think one just asked if I wanted some new CPUs. Or maybe I asked them. Can't remember. Then they
> sent some NDAs and kept sending over Xeons every few years until I asked them to stop.
>
> I'm sure there are some more formal channels to go through as well.

I won't be in that position for many, many years, and by the time I am in that position it's going to be too late to matter.

> > > > > Not software developers that want to use Xeon Phi as "single server compile farm".
> > > >
> > > > Many separate processes (with near zero scalability problems between them) with compilers that can
> > > > be memory bandwidth sensitive? Throw in some compile-time calculation for some large arrays and a few
> > > > "floating point heavy" unit tests? I wouldn't assume Xeon Phi couldn't be beneficial for some.
> > >
> > > I doubt it really, compared with a bunch of cheap low end Xeons. And if you had serious
> > > FP tests to run, you would presumably want to run them on the same instruction set.
> >
> > Oh, so we're no longer presuming that you can just let the compiler auto-vectorise and it doesn't matter.
>
> Sigh, more sarcasm. Again, don't know what you're talking about.

Previously (for game developer optimisation) you were saying that developers have no reason to care much about AVX512 because compilers and libraries will take care of all the differences (in vectorisable code); and here you're trying to say that (for compile time calculation and unit testing) developers have a reason to care about the differences of AVX512.

> I meant exactly what I wrote. If you are doing floating point heavy unit tests in your build
> system, then you would certainly want to run them on the same instruction set as your target
> environment. Same microarchitecture even would be preferable, but ISA at least.

Xeon Phi processors are standard 80x86 ISA. I doubt many people (excluding compiler developers and CPU manufacturers themselves) write unit tests designed to detect "compiler and/or CPU is buggy".

> It's possible some non-HPC but FP heavy software development will be interested in porting and
> testing AVX-512 using Phi just as an advance to getting it in mainstream hardware. I don't try
> to make an *absolute* statement there will be exactly zero. But between the people who would care,
> the ones who can't get other pre-release CPUs, and the ones who don't get the Phi seeded to encourage
> software to be ported, my guess is that market size will be a rounding error.

For "mainstream uses", my guess is that the market size for Xeon Phi will always be small because very little software takes advantage of it, because software developers won't care, because the market size for Xeon Phi will always be small. It's a self-fulfilling feedback loop caused by restricting it to a niche. By pushing Xeon Phi for mainstream uses and giving software developers a reason to care, the demand for Phi would increase over time (and help to improve demand for "8+ core Xeon and Core i3/5/7" a little too).

- Brendan