Knights Landing CPU Speculation

Article: Knights Landing CPU Speculation
By: Patrick Chase, November 25, 2013 1:22 pm
Room: Moderated Discussions
Linus Torvalds on November 25, 2013 12:29 pm wrote:
> Patrick Chase on November 23, 2013 4:33 pm wrote:
> >
> > Most real HPC systems have dedicated I/O nodes for that sort of thing, whether it be
> > the Xeon/Opteron in "Xeon/Opteron + Tesla/Phi", or dedicated cores in BG/Q systems.
> > That isn't really a reason to constrain the architecture of the *compute* nodes.
> I agree that that is true for most of the people doing accelerators.
> But the whole point of Xeon Phi is that the compute node is a "normal CPU", so that
> it's easier to write software for (and use older software with minimal changes). That's
> very much what differentiates it from the systems that try to use GPU's etc.

I understand what you're saying, but at the same time I'm familiar with "real-world" HPC development practices, and they invariably segregate most or all I/O from compute. For example:

- Almost all modern supercomputers have dedicated I/O nodes for system-level I/O such as the filesystems that you referenced in your initial post. Filesystem requirements have absolutely no bearing on compute node configuration, never mind compute core microarchitecture.

- IBM BG/Q has a uniform core microarchitecture within a node (17 PPC A2 cores), and yet reserves a dedicated core for node-level I/O such as MPI. Each BG/Q node uses 16 cores for compute and 1 for everything else.

- Even in systems that don't explicitly reserve a core for node-level housekeeping, it's VERY common to do so at the application level (i.e. create a dedicated thread for MPI and pin it to a specific dedicated core). I've done this on older IBM supercomputers that had "uniform" node architectures, and I know that people do it all the time on Opteron-based Crays. That's probably why IBM architected BG/Q the way they did...

If your use model is as I described above, then there's really no reason to constrain your compute core architecture based on I/O. If the workload dedicates cores to specific tasks then there's no reason why those cores should be fungible.
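To make the application-level pinning concrete, here is a minimal sketch of the pattern, not anybody's production code. It assumes Linux and uses Python's `os.sched_setaffinity` purely for brevity (a real MPI code would more likely do this in C or via the job launcher). The names `split_cores` and `run_pinned` are hypothetical helpers, and both workloads are placeholders: one thread stands in for the MPI/housekeeping loop and gets the last available core, the other stands in for compute and gets everything else.

```python
import os
import threading

# Assumption: Linux-style affinity calls; elsewhere the sketch only records intent.
HAVE_AFFINITY = hasattr(os, "sched_setaffinity")


def split_cores(available):
    """Reserve the highest-numbered core for housekeeping, the rest for compute."""
    cores = sorted(available)
    if len(cores) < 2:
        # With a single core there is nothing to segregate.
        return set(cores), set(cores)
    return {cores[-1]}, set(cores[:-1])


def run_pinned(cores, fn, results, key):
    """Start a thread restricted to `cores`; store (allowed cores, fn result)."""
    def body():
        if HAVE_AFFINITY:
            # pid 0 targets the calling thread, so only this thread is pinned.
            os.sched_setaffinity(0, cores)
            allowed = os.sched_getaffinity(0)
        else:
            allowed = set(cores)
        results[key] = (allowed, fn())

    thread = threading.Thread(target=body, name=key)
    thread.start()
    return thread


if HAVE_AFFINITY:
    available = os.sched_getaffinity(0)
else:
    available = set(range(os.cpu_count() or 1))

io_cores, compute_cores = split_cores(available)
results = {}
threads = [
    # Hypothetical stand-in for an MPI progress loop.
    run_pinned(io_cores, lambda: "housekeeping done", results, "housekeeping"),
    # Hypothetical stand-in for the numerical kernel.
    run_pinned(compute_cores, lambda: sum(i * i for i in range(1000)), results, "compute"),
]
for thread in threads:
    thread.join()

for key, (allowed, value) in sorted(results.items()):
    print(key, sorted(allowed), value)
```

Once the threads are pinned like this, the scheduler never migrates work between the housekeeping core and the compute cores, which is the property the argument relies on. A real code would typically run one compute thread per compute core rather than one thread spanning all of them.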

> So you're pretty much expected to run a real OS on those nodes. You don't have to
> do so, of course, but it does seem to be one of the main usage models.

Yes, but...

If you're going to pin housekeeping and compute to specific, dedicated cores anyway, then the OS will never be called upon to make scheduling decisions that cross core boundaries. In that case there is no harm from having cores with different microarchitectures within a single OS instance so long as they're cache-coherent and implement a common ISA, correct? If so then there would be nothing that would prevent IBM from replacing the 17th PPC A2 in BG/Q with, say, a POWER8 core if they needed to do so to get acceptable I/O performance.

> And I think that's the argument Intel makes for
> it - not only are the compute units regular full CPU's, they are x86 CPU's, so
> people are expected to have an easy time migrating from some previous cluster-of-pc
> setup.

That is indeed Intel's argument as far as I can tell. I just don't buy it based on what I know about their target market. I think that they want to move off the P54C for obvious reasons (it's a VERY old design), and the more modern cores that are available happen to have better single-thread performance, so they're going to market that for all it's worth.

> Sure, you'll want to recompile and do some extra work to really take advantage of the
> wider vectors, but it's still a much smaller and more incremental step than moving to
> OpenCL and special nodes for feeding the compute units.

This is a straw man IMO. There's a pretty large solution space between "all cores identical and capable of hosting I/O" and "you have to use OpenCL".

> And Intel really does seem to be pushing this angle, talking about how KNL
> is a standalone CPU, not some add-in accelerator card. See for example
> or straight from Intel PR:

It's no different from BG/Q in that respect. See remarks above.

> and if you go this approach (which would seem to have real advantages), you definitely
> want to have "good enough" performance on single thread loads, because you're
> not having something else feed the data to you by hand any more.

IBM seems to think otherwise. PPC A2 is basically the "Atom of the PowerPC world" (dual-issue, in-order) and yet it does perfectly fine for node-level I/O in BG/Q. As noted above the "heavy" system-level I/O is handled by dedicated nodes so that's out of the discussion.

> And the old Atom really was pretty bad at some general-purpose stuff. That VR-zone link
> says KNL is 72 modified Silvermont cores, so it should be much better in that regard.

Yes, but I just don't think that's particularly beneficial for HPC.