Coherency for discretes

Article: Computational Efficiency for CPUs and GPUs in 2012
By: Rohit (a.delete@this.b.c), July 29, 2012 9:24 am
Room: Moderated Discussions
aaron spink (aaronspink.delete@this.notearthlink.net) on July 29, 2012 5:39 am wrote:
> Depends. If you can convince PCI SIG to
> implement a coherent protocol the difficulty shouldn't be that high. Efficiency
> wouldn't be the greatest since the basic protocols for PCI-E are designed around
> large block transfers but it would be doable esp with the move to integrating
> PCI-E on die.
>
> Probably the easiest solution for something like MIC would be
> to integrate a QPI interface in addition to the PCI-E interface. At that point
> it primarily becomes an exercise in setting up the memory map reasonably/sanely.
> The MIC would need both a caching agent and a home coherency agent, but it
> should be possible to do some cut and paste.
>
> What would be more difficult in
> the QPI + PCI-E space is getting maximum advantage out of it. You would ideally
> like to use both the QPI agent and the PCI-E agent for bulk DMA traffic while
> only using the QPI agent for coherent traffic. Using the QPI link for bulk DMA
> would likely take some work with the various DMA engines. For the MIC local
> coherent memory you would likely only make a subset of it available for coherent
> access from the CPU in order to simplify the performance requirements (likely a
> variable window size that is programmable) so that all memory accesses from the
> MIC to its local memory don't have to remote snoop, though if you have the area
> available, you might be able to get away with an SRAM based directory (basically
> limited capacity coherency from the MIC to CPU aka evict to make space) as well.
>
>
> For the CPU memory, you would likely be unrestricted assuming that the CPU
> side used some form of directory.
>
> Total DMA bandwidth should be at least
> equal to 32x PCI-E. And for coherent access you are looking at a minimum of 16x
> PCI-E bandwidth.
>
> And from a practical standpoint you are going to want 2xQPI
> or QPI+PCI-E since it is unlikely that the market requirement will be there for
> the CPUs to have 3x 16x PCI-E. Though if your network interface chip runs over
> a single QPI link it might be viable, but I kinda see the ideal setup for a top
> end super as 1 QPI + 16x PCI-E to both the MIC/GPU and to the network interface.
> So you would be looking at ~32+GB/s (at current speeds, likely 64+ GB/s in the
> 2015 timeframe based on PCI-E 4.0 announced goals) in and out of the CPU to both
> network and MIC for a total of 128 GB/s which means that memory bandwidth likely
> becomes your main bottleneck.
>
> Also let's not forget that by the time this
> happens we are likely going to see some form of stacked memory in reasonably
> wide use, which means that the CPUs will likely have 1-4 GB of ultra high
> bandwidth "cache". Which if used right would provide enough bandwidth buffer to
> have the I/Os plus having the option to direct route to/from network/MIC would
> make the 102.4 GB/s CPU memory subsystem reasonable.
>
> The next big problem is
> going to be feeding the network bandwidth. With that type of IO capability you
> would need 9+ 4x FDR IB connections per node. And you're probably going to want
> a switchless topology, because that would be a lot of switches.

Why bother including PCIe in the mix? The way I see it, Nvidia would just make a 500 mm² chip with a few ARM cores and put four (8?) of those on a motherboard, coupled with their own coherency protocol. Blow a few fuses and sell it to PC gamers anyway. AMD has dropped its plans for extending coherency to discretes (it wasn't in the 2012 AFDS roadmaps, having been removed after about 2 years of being presented to the public). Which leaves Intel doing its own thing.

They can just put one Xeon on a socket and put 3 MICs on other 3 sockets, intra-corporate politics permitting.

The notion of CPU vs GPU is on its last legs anyway. Pretty soon, you will be hard pressed to find something (meant for the HPC market, that is; I'm not talking about datacenters and stuff like that) that doesn't have a bit of both. AFAICS, the chart in this article already includes extrapolated data from IVB, which is almost an SoC.
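
For anyone wanting to sanity-check the bandwidth figures in the quoted post, here is a rough back-of-envelope sketch. The link rates are my assumptions, not numbers from the post: PCIe 3.0 at 8 GT/s per lane with 128b/130b encoding, QPI at 8 GT/s moving 2 bytes per transfer per direction, and 4x FDR InfiniBand at 56 Gb/s signalling with 64b/66b encoding. The function names are made up for illustration.

```python
# Back-of-envelope link bandwidth in GB/s, per direction.
# All link rates below are assumptions, not figures from the quoted post.

def pcie3_gbs(lanes):
    """PCIe 3.0: 8 GT/s per lane, 128b/130b encoding, 8 bits per byte."""
    return lanes * 8.0 * (128 / 130) / 8

def qpi_gbs(gt_per_s=8.0):
    """QPI: assumed 2 bytes of payload per transfer per direction."""
    return gt_per_s * 2.0

def fdr_ib_4x_gbs():
    """4x FDR InfiniBand: 56 Gb/s signalling, 64b/66b encoding."""
    return 56.0 * (64 / 66) / 8

# One QPI link plus one x16 PCIe link to a single device, one direction.
per_device = qpi_gbs() + pcie3_gbs(16)

# In and out (2 directions), to both the MIC and the network interface.
total = per_device * 2 * 2

# FDR links needed to carry the projected 64 GB/s in one direction.
fdr_links = 64.0 / fdr_ib_4x_gbs()

print(f"per device, one direction: {per_device:.1f} GB/s")
print(f"total in and out, two devices: {total:.1f} GB/s")
print(f"4x FDR links for 64 GB/s: {fdr_links:.2f}")
```

Under these assumptions the per-device figure comes out a little under 32 GB/s and the total a little under 128 GB/s, which is in line with the quoted numbers. The FDR link count lands just above 9 if you take the 64 GB/s projection as the target, but that is only my reading of where the "9+" comes from.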