lmbench is horribly broken

By: Exophase (exophase.delete@this.gmail.com), March 18, 2017 4:26 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on March 17, 2017 6:43 pm wrote:
> It has everything to do with Wilco's criticism.
> The thing is, with a good TLB fill, lmbench's access pattern is completely invisible
> noise, because the TLB fill is basically free since it caches so well.
> The only time lmbench will give bigger numbers is when the TLB fill is crap - but then those bigger numbers
> are actually more accurate than something that tries to avoid TLB overhead. Because those bigger numbers are
> the much more true measure of the latency to a new memory location, and give you much better information.
> A notion of "memory latency" that is practically unattainable in practice is completely pointless.
> And if your TLB is weak, and your TLB fill is noticeable on lmbench, then your TLB fill will be
> noticeable on real programs too - for the exact same reason that they show up in lmbench.
> See?
> What Wilco is asking for is a completely idiotic number.
> For example, most CPU's will not actually start the memory access until they have probed
> all levels of cache for the data. Do you think that the "memory latency" number should be
> the latency without that cache lookup? Do you think it makes sense to try to subtract
> out the cost of looking in the L3 cache, and give a more "real" number that actually is
> about the time it takes for the memory chips to react to the pins wiggling on the CPU?
> When a CPU cache hierarchy grows a level (say, it gets a L3 cache, or it gets an external
> L4 cache, or whatever), the memory latency tends to invariably go up, exactly because there
> is now more cache lookup going on before the memory access is really started.
> Do you really want to make "memory latency" benchmarks try to hide that, and say that "no, the real
> memory latency is only X cycles after we've dismissed the time it took to probe all those caches"?
> I think everybody agrees that that would be completely insane. The fact is, memory
> latency includes the time to probe the caches, because that time is something you
> have to pay, and it's one of the costs of having caches in the first place.
> And for exactly the same reason, memory latency should include an approximation of the time it
> takes to fill the TLB - it's just a fundamental part of the real cost of accessing memory.
> So if your TLB's are slow to fill, that damn well should show up in that number, instead
> of trying to hide it by forcing the accesses to be dense in the TLB, and making your benchmark
> do some unrealistic code that nobody ever does just to hide the TLB costs.
> Now, of course, a great benchmark will actually do both, and give you more information for what is actually
> taking all that time. So I think that a "minimize TLB footprint" benchmark would be a really really good thing
> if it gave both the "this is the cost without TLB" and "this is the cost you'll actually pay" numbers.
> So having benchmarks that do odd things (like my benchmark that maps the same page
> over and over is really odd and nobody should do except possibly for the special
> case of a zero page) can be useful in order to figure out where the costs are.
> But you shouldn't fool yourself. The special "figure out where the costs
> are" code is not the better code, and it doesn't reflect on reality.
> In contrast, the lmbench numbers actually to a fairly large degree do reflect on reality.
> They reflect on the number that a real application that follows a lot of pointers would see.
> Much more so than some "specifically avoid TLB misses" code that isn't actually real.
> Linus

Sorry, but I'm still having a hard time with this, and I might still be missing something. As far as I understand it, with lmbench's thrash_initialize the order in which memory accesses and page table accesses are made is random. The stated intention of the test is to thrash the caching of both the data and the page table. It says so in the comments:

* Access a different page each time. This will eventually
* cause a tlb miss each page. It will also cause maximal
* thrashing in the cache between the user data stream and
* the page table entries.
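
To make the access pattern in that comment concrete, here is a minimal Python sketch of a pointer chain that lands on a different page at every step, in random order. It is not lmbench's actual code: the names build_page_chain and chase are hypothetical, and the sketch chains whole pages only, ignoring how lines within a page are chosen.

```python
import random


def build_page_chain(num_pages, seed=0):
    """Return next_page, where next_page[i] is the page visited after page i.

    The pages are linked into a single cycle in shuffled order, so a chase
    touches every page exactly once before it comes back to the start.
    """
    order = list(range(num_pages))
    random.Random(seed).shuffle(order)
    next_page = [0] * num_pages
    for pos, page in enumerate(order):
        next_page[page] = order[(pos + 1) % num_pages]
    return next_page


def chase(next_page, start=0):
    """Follow the chain from start until it wraps around; return the visit order."""
    visited = []
    page = start
    while True:
        visited.append(page)
        page = next_page[page]  # each step depends on the previous load
        if page == start:
            break
    return visited
```

Because every step needs the result of the previous one, the accesses cannot overlap, and because consecutive steps land on unrelated pages, each one needs a different translation.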

Now, if your LLC is large enough to cache the process's entire page table, eventually you'll always hit regardless of how random the accesses are, but that should only be true if you're not also filling the cache with other stuff. And that's only really the case if the data you're loading is hitting the same cache lines from different virtual addresses, like in your example. That isn't how lmbench is set up.
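
A rough back-of-the-envelope version of that argument, as a Python sketch. The function name footprint is hypothetical, and the defaults (4 KiB pages, 8-byte leaf page table entries, 64-byte cache lines) are assumptions that happen to match common x86-64 and ARMv8 setups, not values taken from lmbench. Only the leaf level of the page table is counted, and the data side assumes the chain eventually touches every line of the buffer.

```python
def footprint(working_set_bytes, page_size=4096, pte_size=8, line_size=64):
    """Estimate how much cache the leaf PTEs and the data would each need.

    All sizes are in bytes. The defaults are assumed, not measured.
    """
    pages = working_set_bytes // page_size
    pte_bytes = pages * pte_size
    pte_lines = -(-pte_bytes // line_size)  # ceiling division
    data_lines = working_set_bytes // line_size
    return {
        "pages": pages,
        "pte_bytes": pte_bytes,
        "pte_lines": pte_lines,
        "data_lines": data_lines,
    }


one_gib = footprint(1 << 30)
# Leaf PTEs for a 1 GiB buffer: 2 MiB, which a large LLC could hold by itself.
# The data competing for the same LLC is 512 times larger.
ratio = (1 << 30) // one_gib["pte_bytes"]
```

Under these assumptions the page table entries would fit in a big LLC only if nothing else were evicting them, which is the situation the data stream prevents.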

I really don't think anyone's arguing that memory latency tests should try to strip out the cost of cache misses along the way, because that's unavoidable. I do actually think that having to go to L3 to fill the TLB could be a significant extra cost, far more than the 10 cycles you gave earlier, but that's still not nearly as bad as adding an entire second dependent memory access.
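
The difference between those two cases can be sketched with a very simple serial model in Python. The function name and the cycle counts below are placeholders chosen for illustration, not measurements of any CPU. The model also assumes a single leaf page table load per TLB miss, with the upper levels of the walk already cached.

```python
def dependent_load_latency(data_latency, pte_latency):
    """Latency of one load whose translation misses the TLB.

    The data load cannot issue until the page table entry has been fetched,
    so the two latencies add up instead of overlapping.
    """
    return pte_latency + data_latency


# Hypothetical cycle counts, for illustration only.
L3_HIT = 40
DRAM = 200

pte_in_l3 = dependent_load_latency(DRAM, L3_HIT)   # walk served from the L3
pte_in_dram = dependent_load_latency(DRAM, DRAM)   # walk goes to memory too
```

With these placeholder numbers the first case adds an L3 hit on top of the memory access, while the second doubles it, which is the "entire second dependent memory access" case.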