lmbench is horribly broken

By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), March 17, 2017 5:56 pm
Room: Moderated Discussions
Exophase (exophase.delete@this.gmail.com) on March 17, 2017 2:31 pm wrote:
>
> You're using a unit stride and it's clearly engaging the prefetcher.

No.

The workload is actually entirely cached - there is no prefetching anywhere (except possibly the very first iteration).

The point of the workload is to see how well the TLB walker interacts with caching.

The thing is, caching works. We know that. Anybody who dismisses locality of references is so far out to lunch that it's not worth talking to that person.

Caching works particularly well for dense data structures, which is exactly what a page table walker is walking. Again, anybody who dismisses that is just crazy and/or incompetent.

When it comes to TLB walking, you really have two very different main cases:

(a) "dense" in the TLB: traditional streaming loads that take a lot of cache misses.

(b) "sparse" in the TLB: the workload might even fit in the D$ (at least at some level), but it's so spread out that you take a lot of TLB misses, and the TLB activity is really noticeable.
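A rough sketch of the two cases, assuming 4 KiB pages and 64-byte cache lines (common, but not universal). It counts only footprints, not timing. A unit-stride stream touches 64 lines per page, so it takes on the order of 64 D$ misses per TLB miss. A page-stride pattern with the same number of accesses touches a new page on every access, even though its data footprint is the same 1 MiB of lines.

```python
# Toy footprint count: distinct cache lines vs distinct pages touched.
# Assumes 4 KiB pages and 64-byte cache lines.
PAGE = 4096
LINE = 64

def footprint(addresses):
    """Return (distinct cache lines, distinct pages) touched by addresses."""
    lines = {a // LINE for a in addresses}
    pages = {a // PAGE for a in addresses}
    return len(lines), len(pages)

# (a) "dense": stream through 1 MiB, one access per 64-byte line.
dense = range(0, 1 << 20, LINE)
# (b) "sparse": the same number of accesses, but one per page.
sparse = range(0, len(dense) * PAGE, PAGE)

for name, addrs in (("dense", dense), ("sparse", sparse)):
    n_lines, n_pages = footprint(addrs)
    print(f"{name}: {n_lines} lines over {n_pages} pages "
          f"({n_lines / n_pages:.0f} per page)")
```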

The thing is, (a) isn't even a worry. If you have a streaming load, you'll take TLB misses, but you'll take a lot more actual data cache misses unless your CPU core caches are seriously unbalanced.

So (a) just isn't all that interesting a load for the TLB. You have a high enough hit rate that the TLB miss won't show up compared to normal misses, if your TLB is just reasonable enough (and yes, that generally does mean that you have an L2 TLB - and pretty much everybody does these days).

For (a), you want your TLB to not be ridiculously small, and you want the TLB fill to not suck too badly. But you really don't need to be all that clever, because the D$ misses outnumber the TLB misses by a huge margin.

But (b) is interesting. And it's not actually all that hard to trigger on some loads. If you do a lot of pointer-chasing, you may well be in the situation that the workload fits in the cache to a fairly large degree, but it's "fragmented" enough in the address space that you have a high TLB pressure.
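One way to build such a pointer-chasing load, sketched here under assumptions (4 KiB pages, 64-byte lines, one node per page, 4096 nodes; this is not taken from lmbench's source): link the nodes into a single random cycle with Sattolo's algorithm, so every load depends on the previous one and every node is visited once per lap. The data actually loaded is only 256 KiB of cache lines, which fits in the L2 of many cores, yet it is spread over 4096 pages.

```python
import random

PAGE = 4096  # assumed page size
LINE = 64    # assumed cache-line size

def page_spread_chain(n_nodes, seed=0):
    """Return next_of: one random cycle over n_nodes slots (Sattolo's algorithm)."""
    rng = random.Random(seed)
    next_of = list(range(n_nodes))
    for i in range(n_nodes - 1, 0, -1):
        j = rng.randrange(i)  # j in [0, i), not [0, i]: this yields a single cycle
        next_of[i], next_of[j] = next_of[j], next_of[i]
    return next_of

def chase(next_of, steps, start=0):
    """Follow the chain from start; slot k is assumed to live at address k * PAGE."""
    addrs = []
    cur = start
    for _ in range(steps):
        addrs.append(cur * PAGE)
        cur = next_of[cur]
    return addrs

N = 4096
addrs = chase(page_spread_chain(N), N)
data_bytes = len({a // LINE for a in addrs}) * LINE  # bytes of cache lines loaded
pages = len({a // PAGE for a in addrs})              # distinct pages (TLB entries)
print(f"{data_bytes // 1024} KiB of cache lines spread over {pages} pages")
```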

And (b) is when the TLB walker really matters. The TLB costs aren't hidden by the "normal" data access costs. You'll see potentially huge TLB walking costs despite the fact that page tables are actually data structures that cache really well.

In fact, multi-level page tables (which is the common - and sanest - page table format) are really almost optimal for caching. They retain all the locality that the access pattern has, and improve it further by essentially compressing it by several bits. The top levels cache so well that caching even just a single entry at each level tends to capture almost all of the activity, and the last level is dense too, and works very well with caches.
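To make the "compressing it by several bits" point concrete, here is a sketch assuming x86-64 4-level paging (4 KiB pages, 512 eight-byte entries per table, 9 index bits per level) and 64-byte lines; the base address is hypothetical. For a sparse load touching one line in each of 4096 consecutive pages (16 MiB of address space), the upper levels collapse to one entry or one line each, and the leaf entries fit in 512 lines (32 KiB), one eighth of the 256 KiB of data lines. With 8-byte entries, one line of leaf entries covers eight pages, so here the compression is 3 bits.

```python
# Assumes x86-64 4-level paging: 4 KiB pages, 512 eight-byte entries per
# table (9 index bits per level) and 64-byte cache lines.
PAGE_SHIFT = 12
BITS_PER_LEVEL = 9
LEVELS = ("PML4", "PDPT", "PD", "PT")  # top level first
ENTRY_SHIFT = 3   # 8-byte entries
LINE_SHIFT = 6    # 64-byte cache lines

def level_shift(level):
    """Lowest virtual-address bit indexed by the given level (0 = top)."""
    return PAGE_SHIFT + BITS_PER_LEVEL * (len(LEVELS) - 1 - level)

def walk_indices(vaddr):
    """Table index used at each level when walking vaddr, top level first."""
    mask = (1 << BITS_PER_LEVEL) - 1
    return [(vaddr >> level_shift(l)) & mask for l in range(len(LEVELS))]

def distinct_per_level(addrs):
    """Map level name -> (distinct entries, distinct cache lines of entries).

    An entry at one level is identified by the address bits above that
    level's shift. A 64-byte line holds 8 neighboring entries of the same
    table, so dropping 3 more bits identifies the line.
    """
    out = {}
    for l, name in enumerate(LEVELS):
        s = level_shift(l)
        entries = {a >> s for a in addrs}
        lines = {a >> (s + LINE_SHIFT - ENTRY_SHIFT) for a in addrs}
        out[name] = (len(entries), len(lines))
    return out

BASE = 0x7F0000000000  # hypothetical, 1 GiB-aligned user address
sparse = [BASE + k * (1 << PAGE_SHIFT) for k in range(4096)]  # one load per page
for name, (entries, lines) in distinct_per_level(sparse).items():
    print(f"{name}: {entries} entries in {lines} cache lines")
```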

So a TLB walker that doesn't use the normal D$ for entry walking is basically useless crap.

And if you do use the normal D$ for TLB lookup (perhaps limit it to just L2, to avoid L1 perturbations), you really can do very very well at TLB fills, and you basically have an almost infinitely-sized L3 TLB.
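A toy model of why a cached walker behaves like a huge extra TLB level. All sizes are arbitrary assumptions: a 64-entry fully associative LRU TLB, and an LRU cache of 1024 lines standing in for the slice of L2 the walker gets. Only the leaf PTE fetch is modeled. The workload is 4096 pages visited in the same random order twice. Every access misses the TLB, but the 512 lines of leaf PTEs stay cached, so almost every walk hits. A walker that bypasses the cache (capacity 0) gets no hits at all.

```python
import random
from collections import OrderedDict

class LRU:
    """Fully associative set of keys with LRU replacement (toy model)."""
    def __init__(self, capacity):
        self.capacity = capacity
        self.entries = OrderedDict()

    def access(self, key):
        """Touch key; return True on a hit, else insert it and return False."""
        if key in self.entries:
            self.entries.move_to_end(key)
            return True
        self.entries[key] = None
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict least recently used
        return False

PTES_PER_LINE_SHIFT = 3  # 8-byte PTEs in 64-byte lines (assumption)

def simulate(vpns, tlb_entries, pte_cache_lines, passes=2):
    """Return (TLB misses, walks whose leaf-PTE line hit in the cache).

    Only the leaf PTE fetch is modeled; upper levels are assumed cached.
    """
    tlb = LRU(tlb_entries)
    pte_cache = LRU(pte_cache_lines)
    tlb_misses = walk_hits = 0
    for _ in range(passes):
        for vpn in vpns:
            if tlb.access(vpn):
                continue
            tlb_misses += 1
            if pte_cache.access(vpn >> PTES_PER_LINE_SHIFT):
                walk_hits += 1
    return tlb_misses, walk_hits

# Hypothetical workload: 4096 virtual page numbers in a fixed random order.
vpns = list(range(0x7F0000000, 0x7F0000000 + 4096))
random.Random(1).shuffle(vpns)

print("walker using cache:", simulate(vpns, tlb_entries=64, pte_cache_lines=1024))
print("walker bypassing it:", simulate(vpns, tlb_entries=64, pte_cache_lines=0))
```

Of the 8192 walks in the cached case, only the 512 cold PTE lines on the first pass miss.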

And if you do TLB fills badly and don't take advantage of the nice caching behavior of a multi-level tree, your core is crap, and you shouldn't blame the benchmark for it.

Linus