lmbench is horribly broken

By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), March 17, 2017 1:52 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on March 17, 2017 1:10 pm wrote:
>
> What happens when I make the stride be 4k, to get maximal TLB use, [..]

Note that 4kB strides are by no means the worst possible case. They are bad, yes, because they maximize the TLB use per access, but you can do worse, and explore how that affects the TLB fill costs.

For example, using a 32kB stride means that now you won't stride to the "next" entry in the page tables, but jump ahead 8 entries (this is obviously on x86-64 with a 4kB page size), which then means that the last-level page table entry is no longer in the same cacheline as the previous one we loaded. But that can cause less dense TLB use in the mapping, and to fix that, use a 36kB stride instead, so that you skip an odd number of entries and end up using more TLB entries for that big area you are walking through.
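To make the stride arithmetic above concrete, here is a small sketch. It assumes 8-byte page table entries and 64-byte cachelines (so 8 entries per line), a page-aligned 128MB buffer, and a walk that simply wraps modulo the buffer size. The helper name `describe_stride` is made up for illustration and is not part of any benchmark.

```python
from math import gcd

PAGE_SIZE = 4096                         # x86-64 base page size, bytes
PTE_SIZE = 8                             # bytes per x86-64 page table entry
CACHELINE = 64                           # assumed cacheline size, bytes
PTES_PER_LINE = CACHELINE // PTE_SIZE    # 8 entries share one cacheline


def describe_stride(stride_bytes, buffer_bytes):
    """Summarize a page-multiple stride walked circularly through a buffer."""
    if stride_bytes % PAGE_SIZE != 0:
        raise ValueError("this sketch only handles page-multiple strides")
    pte_step = stride_bytes // PAGE_SIZE          # last-level entries skipped per access
    total_pages = buffer_bytes // PAGE_SIZE
    # Stepping by pte_step modulo total_pages visits total/gcd distinct pages.
    pages_touched = total_pages // gcd(pte_step, total_pages)
    return {
        "pte_step": pte_step,
        # A step of 8 or more entries always lands in a different PTE cacheline.
        "new_pte_line_every_access": pte_step >= PTES_PER_LINE,
        "pages_touched": pages_touched,
    }


BUFFER = 128 * 1024 * 1024
for kb in (4, 32, 36):
    print(f"{kb}kB stride:", describe_stride(kb * 1024, BUFFER))
```

Under these assumptions the 32kB stride only ever touches every eighth page of the buffer (4096 of 32768), while the 36kB stride steps by 9 entries, which shares no factor with the page count, so it eventually touches every page while still changing PTE cacheline on every access.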

Of course, associativity of the TLB can also end up showing issues, as can just other interactions with the D$ that contains the page table entries.

This is why it's interesting to play around with both stride and size, but also why it's hard to give a single number. The worst case I found so far had that "7 cycles in the optimal case" take 45 cycles. That was still in that same 128MB circular buffer, but using a stride of 116kB+60B. So you get 29 (prime number, should spread them out in the circular thing) pages in between accesses with an additional little update to use slightly more L1 D$ entries too and less regular accesses that might bother a prefetcher.
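The 116kB+60B stride can be picked apart the same way. The sketch below assumes a 4kB page, a 64-byte cacheline, and a walk that starts at offset 0 and wraps modulo the 128MB buffer size. The function names are hypothetical, and this only models addresses, it does not measure anything.

```python
from math import gcd

PAGE_SIZE = 4096                 # bytes
CACHELINE = 64                   # assumed cacheline size, bytes
BUFFER = 128 * 1024 * 1024       # the 128MB circular buffer
STRIDE = 116 * 1024 + 60         # 116kB + 60B


def split_stride(stride):
    """Return (whole pages skipped, leftover bytes) for one stride."""
    return divmod(stride, PAGE_SIZE)


def cycle_length(stride, buffer_bytes):
    """Accesses before a walk that wraps modulo the buffer returns to its start."""
    return buffer_bytes // gcd(stride, buffer_bytes)


def first_accesses(stride, buffer_bytes, count):
    """(page number, cacheline index within that page) for the first accesses."""
    out = []
    addr = 0
    for _ in range(count):
        out.append((addr // PAGE_SIZE, (addr % PAGE_SIZE) // CACHELINE))
        addr = (addr + stride) % buffer_bytes
    return out


print(split_stride(STRIDE))               # (29, 60)
print(cycle_length(STRIDE, BUFFER))       # 33554432
print(first_accesses(STRIDE, BUFFER, 4))  # [(0, 0), (29, 0), (58, 1), (87, 2)]
```

The extra 60 bytes mean the offset within the page creeps forward on each access, so the loads drift across different cachelines inside their pages rather than always hitting the same page offset.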

So my "worst case" TLB walker is certainly not some kind of theoretical worst case. But it's a good way to show TLB behavior that is still relevant (and can happen in real life depending on data layout), when it's not swamped by everything else. And it very clearly shows the effects of good TLB walking and caching at multiple levels, not just L2 TLBs.

You can see exactly how it matters that the page tables themselves can be cached.

Of course, if the TLB walker is stupid, it won't show any of this. If the TLB walker doesn't use cached accesses (and I've seen that - the rationale was to avoid cache pollution, and the rationale is garbage), you'll see uniformly bad behavior. The uniformly bad behavior gets even worse if the TLB walker doesn't cache internal page table tree lookups, at which point a TLB walk really kills you since it causes multiple actual memory accesses.
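A deliberately crude model of why those two kinds of caching matter. It assumes a four-level x86-64 page table, fully serialized fetches, and one fixed latency per fetch. The latency constants are illustrative assumptions, not measurements of any core, and `walk_cost` is a made-up helper.

```python
X86_64_LEVELS = 4   # four-level page table with 4kB pages


def walk_cost(levels_cached_in_walker, fetch_latency, levels=X86_64_LEVELS):
    """Cycles for one TLB miss under a very simple serial model.

    levels_cached_in_walker: upper levels the walker can skip because it
        caches internal page table tree lookups (0 means it caches nothing).
    fetch_latency: cycles for each page table fetch that is actually issued
        (small if it hits the D$, large if it goes out to memory).
    """
    if not 0 <= levels_cached_in_walker < levels:
        raise ValueError("the last-level entry always has to be fetched")
    fetches = levels - levels_cached_in_walker
    return fetches * fetch_latency


DCACHE_HIT = 10    # assumed cycles for a fetch that hits the D$ (illustrative)
MEMORY = 200       # assumed cycles for a fetch from memory (illustrative)

print(walk_cost(3, DCACHE_HIT))  # upper levels cached, cached PTE fetch: 10
print(walk_cost(3, MEMORY))      # walker bypasses the caches: 200
print(walk_cost(0, MEMORY))      # no internal caching either: 800
```

Real walkers overlap and cache far more cleverly than this, so the absolute numbers mean nothing. The point is only the multiplication: uncached fetches times the number of levels that have to be walked.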

I'd certainly hope that no core is that terminally stupid. But I wouldn't be in the least bit surprised, and it would make each TLB miss take hundreds of cycles even in the "good" cases.

Linus