lmbench is horribly broken

By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), March 16, 2017 6:49 pm
Room: Moderated Discussions
Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on March 16, 2017 4:57 pm wrote:
>
> These are all good reasons why lmbench is such a horrible broken benchmark as it
> doesn't allow any parallelism - it uses a single chain of memory references which
> are guaranteed to miss the TLB as well as the cache.

They aren't guaranteed to miss the TLB at all, actually. And the patterns when they do miss can be quite interesting. I made my own version that used the same physical page as a backing store exactly because I was looking at those kinds of things and wanted to see what was the D$ effect, and what was the TLB effect.
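One way to get the "same physical page as a backing store" effect on a POSIX system is to map a single file page at many consecutive virtual addresses. The sketch below is an illustration of that idea, not the test program the post refers to; the helper name `map_aliased` and the use of `tmpfile()` are assumptions made for the example.

```c
#define _DEFAULT_SOURCE
#include <assert.h>
#include <stdio.h>
#include <sys/mman.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: map ONE file-backed page at `npages` consecutive
 * virtual pages. Every virtual page needs its own TLB entry, but all of
 * them are backed by the same physical page, so the data cache footprint
 * stays at a single page. Returns NULL on failure. */
static char *map_aliased(size_t npages, size_t *page_size_out)
{
    assert(npages > 0 && page_size_out != NULL);

    size_t ps = (size_t)sysconf(_SC_PAGESIZE);
    FILE *f = tmpfile();
    if (!f)
        return NULL;
    int fd = fileno(f);
    if (ftruncate(fd, (off_t)ps) != 0) {
        fclose(f);
        return NULL;
    }

    /* Reserve a contiguous virtual range to place the aliases in. */
    char *base = mmap(NULL, npages * ps, PROT_NONE,
                      MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) {
        fclose(f);
        return NULL;
    }

    /* Overlay each virtual page with a shared mapping of file offset 0. */
    for (size_t i = 0; i < npages; i++) {
        void *p = mmap(base + i * ps, ps, PROT_READ | PROT_WRITE,
                       MAP_SHARED | MAP_FIXED, fd, 0);
        if (p == MAP_FAILED) {
            munmap(base, npages * ps);
            fclose(f);
            return NULL;
        }
    }

    fclose(f); /* the mappings keep the underlying file alive */
    *page_size_out = ps;
    return base;
}
```

A write through one alias, such as `base[0] = 42`, is then visible through every other alias, for example `base[7 * page_size]`. Each alias is a separate mapping, so on Linux the number of pages you can alias this way is bounded by the per-process mapping limit.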

Have you ever looked at the lmbench lat_mem_rd *graphs*? Have you looked at different strides?

Ok, it's been years since I regularly ran lmbench, so I forget the exact details, but they are quite informative. And yes, you sometimes see the L1 -> L2 -> TLB -> L3 stages (or whatever - it will depend on your cache details, of course) fairly clearly.
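For readers who have not seen this kind of test, the core of a strided latency measurement is a chain of dependent loads. The following is a minimal sketch in the spirit of lat_mem_rd, not its actual source; the function names `build_chain` and `chase` are made up for the example, and the timing code is left out.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Build a circular chain in buf: each slot, `stride` bytes apart, stores
 * the address of the next slot, and the last slot points back to the
 * first. Returns the number of slots in the chain. */
static size_t build_chain(char *buf, size_t size, size_t stride)
{
    assert(stride >= sizeof(char *) && stride % sizeof(char *) == 0);
    assert(size >= stride);

    size_t n = size / stride;
    for (size_t i = 0; i < n; i++) {
        char **slot = (char **)(buf + i * stride);
        *slot = buf + ((i + 1) % n) * stride;
    }
    return n;
}

/* Follow the chain for `steps` loads. Each load address depends on the
 * previous load's result, so the loads cannot overlap. */
static char *chase(char *start, size_t steps)
{
    char **p = (char **)start;
    while (steps--)
        p = (char **)*p;
    return (char *)p;
}
```

Timing a large number of `chase` steps and dividing by the step count gives a per-load latency. Repeating that over a range of buffer sizes and strides produces the kind of curves the graphs show, with each plateau corresponding to a level of the cache or TLB hierarchy.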

It's usually not entirely black-and-white, since there are lots of fairly complex interactions. For example, TLB fills themselves have lots of caching going on in a good TLB.

The fact that you are so butt-hurt about the TLB overhead makes me suspect that you are unhappy with how crap it is on some of your favorite CPUs. Which doesn't surprise me at all, because doing TLB fills well takes effort and usually several generations of cores that have been used for loads where it matters (people say "server", but it's not actually all that clear-cut, and sometimes you see it most easily in HPC).

For example, if you have a good core, what that core will do is:

- cache mid-level page table lookups in a special TLB-specific cache (which might be the regular TLB itself, or might be a special part of the TLB)

- cache the last level in the regular D$ or at least a line cache (and have the usual prefetch engine notice it - you might get that part for free depending on at what level the TLB access is done).

which actually means that you can do TLB misses in a very low number of cycles for the most common case of miss (ie the fairly good locality kind).

That happens to be the case that lmbench mostly tests once it gets going.

According to my "no D$ footprint - reuse the same physical page" tests, Intel was actually able to do TLB misses in single-digit cycles. Yes, that depended on the page tables themselves being cached, but that's actually a really important and relevant case.

I was very very impressed. Because it does matter.

And when you have a good TLB subsystem like that, your TLB misses simply don't suck so bad.

I suspect that you are unhappy because your favorite CPU is horribly bad at TLB misses.

For example, if you always go to memory for TLB walking, and you do it at every level, you are going to suck. You might suck even worse than a software TLB fill, in fact: alpha may have done it in software, but alpha did the smart thing in software, and ended up caching both upper levels and the actual last level. So on TLB heavy loads, on alpha you might have been in the situation that a TLB miss took a hundred cycles, but it could still be a lot less than three actual memory fetches.
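The arithmetic behind that comparison can be written down as a deliberately crude model. The sketch below assumes a fully serial walk where every level either hits some cache or goes all the way to memory; the function name and all cycle counts in the usage note are illustrative assumptions, not measurements.

```c
#include <assert.h>

/* Crude serial model of one page-table walk: `cached_levels` of the
 * `levels` lookups hit a cache at `hit_cycles` each, and the remaining
 * lookups go to memory at `mem_cycles` each. */
static unsigned walk_cycles(unsigned levels, unsigned cached_levels,
                            unsigned hit_cycles, unsigned mem_cycles)
{
    assert(cached_levels <= levels);
    return cached_levels * hit_cycles
         + (levels - cached_levels) * mem_cycles;
}
```

With a hypothetical 200-cycle memory access, a three-level walk that misses at every level costs `walk_cycles(3, 0, 4, 200)`, which is 600 cycles, while the same walk with every level cached costs `walk_cycles(3, 3, 4, 200)`, which is 12. A hundred-cycle software fill that keeps its page table entries cached sits much closer to the second case than the first.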

And yes, lmbench will show what a piece of shit silicon you are running on. And showing that is a good thing. A benchmark that shows the bad cases is a good benchmark.

You seem to continually think that benchmarks that only test the easy case are somehow "better" benchmarks. That's exactly the wrong thing. A good benchmark will show where the weaknesses are.

So I repeat: I'd much rather see a memory latency test that actually shows realistic effects of a TLB, than one that has been expressly designed to not have a very big TLB footprint.

And if you miss on every access on the lmbench memory latency test, and your TLB walker is so bad that it does another (or several) memory accesses for every miss, then dammit, don't blame lmbench for the numbers. Blame your shit hardware!

The real problem with the lmbench memory latency number was not that lmbench did a bad thing, but that trying to summarize the number as a single number is really really hard, and lmbench did fairly badly at that. And most of the lmbench numbers most people ever looked at were just the summary numbers.

Linus