SPEC Memory traffic & bandwidth

By: Andrei F (andrei.delete@this.anandtech.com), September 21, 2020 7:36 am
Room: Moderated Discussions
Andrei F (andrei.delete@this.anandtech.com) on September 21, 2020 8:35 am wrote:
> Andrei F (andrei.delete@this.anandtech.com) on September 21, 2020 6:50 am wrote:
> > Michael S (already5chosen.delete@this.yahoo.com) on September 21, 2020 1:38 am wrote:
> > > Travis Downs (travis.downs.delete@this.gmail.com) on September 20, 2020 5:34 pm wrote:
> > > > Michael S (already5chosen.delete@this.yahoo.com) on September 20, 2020 10:02 am wrote:
> > > > > Travis Downs (travis.downs.delete@this.gmail.com) on September 19, 2020 8:26 pm wrote:
> > > > > > Andrei F (andrei.delete@this.anandtech.com) on September 18, 2020 1:04 am wrote:
> > > > > > > anon (anon.delete@this.anon.com) on September 17, 2020 7:10 pm wrote:
> > > > > > > > AnandTech's (SPEC ST performance) review is here: anandtech.com/show/16084/intel-tiger-lake-review-deep-dive-core-11th-gen/8
> > > > > > > > However not all is good: TigerLake
> > > > > > > > experiences a noticeable IPC regression compared to IceLake. The memory subsystem is unable
> > > > > > > > to keep up with the higher clocks, and the reworked cache is not enough.
> > > > > > > >
> > > > > > >
> > > > > > > I just want to add to that sentence, as that's not what I wrote
> > > > > > > in the piece: I don't think the memory subsystem is to blame.
> > > > > > >
> > > > > > > It's significantly stronger than ICL and showcases *much* better DRAM latency and significant
> > > > > > > single core bandwidth uplift. 429.mcf showcases great scaling well beyond clocks, showing
> > > > > > > that latency for example is not to blame. In my opinion it's a regression *because* of the
> > > > > > > reworked cache, as essentially the L3 is now 20% slower per clock versus ICL.
> > > > > >
> > > > > > You mean L3 latency, right? It might be a part of it, but the regressions in libquantum
> > > > > > and lbm are too large to be explained by this few-cycle change, I think. You'd pretty much
> > > > > > have to write a dedicated L3 latency test to get that big of a drop and IIRC neither of
> > > > > > those are known to be very dependent on L3 latency (they are more bandwidth heavy).
> > > > > >
> > > > > > So I think there's something else more interesting going on there.
> > > > > >
> > > > > >
> > > > >
> > > > > TGL uncore appears to be inspired by SKX, except, hopefully, with better latency of LLC misses under light load.
> > > > > So maybe it suffers from similarly low single-core bandwidth?
> > > > >
> > > >
> > > > Well Andrei has some detailed bandwidth benchmarks on this page and performance looks
> > > > better across the board: there's actually a significant bump in L3 and RAM regions.
> > > >
> > >
> > > Yes, it's better than ICL.
> > > But probably quite a lot worse than desktop SKL. Out of
> > > memory ( :-) ), my E-2176G achieves 33-35 GB/s on long
> > > sequential reads, supposedly similar to Andrei's Vec128 LD test.
> > > If I am not mistaken, even the i7-6920HQ with DDR4-2133
> > > that I was playing with a couple of years ago was capable of
> > > 30 GB/s. From a raw bandwidth perspective, LPDDR4X-4266
> > > in the TGL rig should be equal to DDR4-2133, right? But the end result is somehow 1.5x lower.
> > > I have no idea what "flip" tests do, so can't compare.
> > >
> > > > So I feel like it has to be something more complicated than just worse peak BW: maybe a different
> > > > way of splitting power between core, uncore and memory? Paul Alcorn from Tomshardware suggested
> > > > that memory frequency itself can be varied on this part, not sure if that's correct. I don't
> > > > think any previous Intel part had frequency scaling for the memory bus?
> > > >
> > >
> > >
> >
> > The flip test is a memory copy test that sits inside a fixed memory region, moving cachelines from one
> > end to the other end, essentially flipping the memory region around on a cacheline block basis.
> >
> > It's basically the same bandwidth as a traditional memory copy just different locality in virtual memory.
> >
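As a rough illustration of the access pattern described above, here is a minimal sketch of a "flip" over a fixed region. It assumes a 64-byte cacheline and interprets the flip as reversing the order of line-sized blocks in place; the name `flip_region` and that interpretation are assumptions for illustration, not the actual test code. Python will not measure bandwidth meaningfully, so this only shows which blocks move where.

```python
CACHELINE = 64  # bytes; assumed cacheline size


def flip_region(buf: bytearray, line: int = CACHELINE) -> None:
    """Reverse the order of line-sized blocks in buf, in place.

    Block 0 swaps with the last block, block 1 with the second-to-last,
    and so on. Bytes inside each block keep their order.
    """
    if len(buf) % line != 0:
        raise ValueError("buffer length must be a multiple of the line size")
    view = memoryview(buf)
    lo, hi = 0, len(buf) - line
    while lo < hi:
        # Copy the low block out, move the high block down, then
        # write the saved low block into the high position.
        tmp = bytes(view[lo:lo + line])
        view[lo:lo + line] = view[hi:hi + line]
        view[hi:hi + line] = tmp
        lo += line
        hi -= line


# Usage: four blocks tagged 0..3 end up in the order 3, 2, 1, 0.
region = bytearray(b"".join(bytes([k]) * CACHELINE for k in range(4)))
flip_region(region)
print([region[i] for i in range(0, len(region), CACHELINE)])
```

Every byte in the region is read once and written once, which is why the traffic matches a plain copy of the same size; only the source and destination addresses differ.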
> > ---
> >
> > I did some more characterisations via counters on a 9900K to see where the stress-points
> > are. Essentially the Willow Cove improvements and regressions follow this formula:
> >
> > - If the workload has a high HPKI of loads and stores in
> > the L3, but a low MPKI, then the workload sees a large
> > performance improvement from the much bigger L2 cache, because it previously had a very high L2 miss %.
> >
> > xalanc and astar follow this behaviour, with high L3 hits but very high L2 misses.
> >
> > - If the workload has both a high HPKI and MPKI for L3 loads and stores and there's a large %
> > of misses versus hits, then these workloads correspond to the biggest losers for Willow Cove.
> >
> > https://pbs.twimg.com/media/EiIBUUHWsAMH5Dl?format=png&name=orig
> >
> > This is essentially all the red workloads.
> >
> > - The only exception to the above seems to be workloads that are primarily DRAM latency
> > limited and have extremely high memory stall cycles. MCF and omnetpp correspond to
> > this characterisation and on my 9900K have 55.3% and 61.1% stall cycles.
> >
> > These workloads seem to have very low MLP and are more pointer-chaser like, and here Tiger
> > Lake's much better DRAM latency is counteracting any slowdowns on the part of the L3.
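For anyone wanting to reproduce the HPKI/MPKI figures from their own counters, a minimal sketch follows. It takes HPKI and MPKI to mean hits and misses per thousand retired instructions; the function name `l3_profile` and the counter values in the usage lines are made-up placeholders, not measurements from the 9900K runs, and no classification thresholds are implied.

```python
from dataclasses import dataclass


@dataclass
class L3Profile:
    hpki: float        # L3 hits per 1000 instructions
    mpki: float        # L3 misses per 1000 instructions
    miss_ratio: float  # misses / (hits + misses)


def l3_profile(l3_hits: int, l3_misses: int, instructions: int) -> L3Profile:
    """Derive per-kilo-instruction rates from raw counter totals."""
    if instructions <= 0:
        raise ValueError("instructions must be positive")
    accesses = l3_hits + l3_misses
    return L3Profile(
        hpki=1000.0 * l3_hits / instructions,
        mpki=1000.0 * l3_misses / instructions,
        miss_ratio=(l3_misses / accesses) if accesses else 0.0,
    )


# Usage with placeholder totals (hypothetical numbers, not real data).
p = l3_profile(l3_hits=30_000, l3_misses=10_000, instructions=2_000_000)
print(p.hpki, p.mpki, p.miss_ratio)
```

As with the totals above, rates computed over a whole run are averages and will hide bursts.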
>
> If anyone's interested:
>
> https://i.imgur.com/nS0pyLZ.png
>
> https://i.imgur.com/nS0pyLZ.png
>
> Of course these are just totals and averages and do not showcase periodic bursts or bottlenecks.

Duplicate image, sorry:
