Pre-populating anonymous pages

By: Brendan (btrotter.delete@this.gmail.com), June 8, 2019 2:55 am
Room: Moderated Discussions
Hi,

Travis Downs (travis.downs.delete@this.gmail.com) on June 7, 2019 2:16 pm wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on June 6, 2019 11:01 am wrote:
> > Travis Downs (travis.downs.delete@this.gmail.com) on June 5, 2019 4:48 pm wrote:
> > >
> > > One interesting observation is that in this respect anonymous pages are slower than file-backed pages:
> > > file-backed pages on any kernel in the last 5-10 years use "fault-around" which reads in additional nearby
> > > pages (16 by default on x86) when a page faults. This behavior doesn't apply to anonymous pages.
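A minimal way to observe the fault-around difference described above, sketched under a few assumptions: Linux, a writable /tmp, and `getrusage()`'s `ru_minflt` as the fault counter. The helper names are made up for illustration. If fault-around is active with its default 64 KiB window (the kernel's `fault_around_bytes`), the file-backed count should come out near one fault per 16 pages. The anonymous mapping should take roughly one fault per page read. Transparent huge pages (which can map a huge zero page on a read fault) or an unusual kernel can change both numbers, so no specific output is claimed here.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no-I/O) page faults taken by this process so far. */
static long minor_faults(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

static volatile unsigned char sink;  /* keeps the reads below from being optimized out */

/* Read one byte per page of [p, p + len) and return the minor faults it caused. */
static long faults_reading(const volatile unsigned char *p, size_t len, size_t pagesz) {
    long before = minor_faults();
    for (size_t off = 0; off < len; off += pagesz)
        sink ^= p[off];
    return minor_faults() - before;
}

/* Anonymous private mapping: no fault-around, so roughly one fault per page. */
long anon_read_faults(size_t npages) {
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    size_t len = npages * (size_t)ps;
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED) return -1;
    long n = faults_reading(p, len, (size_t)ps);
    munmap(p, len);
    return n;
}

/* File-backed mapping of a temporary file whose pages were just written,
 * so they are already in the page cache and eligible for fault-around. */
long file_read_faults(size_t npages) {
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0);
    size_t pagesz = (size_t)ps, len = npages * pagesz;
    char path[] = "/tmp/faultaround-XXXXXX";
    int fd = mkstemp(path);
    if (fd < 0) return -1;
    unlink(path);                 /* file goes away once it is no longer referenced */
    unsigned char *buf = malloc(pagesz);
    if (buf == NULL) { close(fd); return -1; }
    memset(buf, 'x', pagesz);
    for (size_t i = 0; i < npages; i++) {
        if (write(fd, buf, pagesz) != (ssize_t)pagesz) { free(buf); close(fd); return -1; }
    }
    free(buf);
    unsigned char *p = mmap(NULL, len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd);                    /* the mapping keeps the file alive */
    if (p == MAP_FAILED) return -1;
    long n = faults_reading(p, len, pagesz);
    munmap(p, len);
    return n;
}
```

Calling both with the same `npages` (a few hundred is plenty) and comparing the two counts is the whole experiment.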
> >
> > If you have a real load you can share where this is noticeable, we could probably add
> > it somewhat easily. But quite often it's not clear how many pages end up needing new
> > allocations after a fork, for example. The file mmap case tends to be much more predictable
> > (it's almost universally just for reading, executables being a big deal).
> >
> > (And while tmpfs may look like an anonymous mapping in many ways, it doesn't tend to have the
> > COW and zero page issues that a real anonymous mapping does. Of course, I'm not 100% convinced
> > the zero-page thing makes much sense any more at all. It was noticeable for some benchmarks
> > and some users that have sparse data mappings, but I wouldn't be surprised if it's a net loss
> > in reality these days with just extra page faults due to read->write transitions)
>
> I think the read->write transition thing is pretty rare in real code. Usually you get memory
> from malloc, which is uninitialized, so the first thing you do is write to it anyways.
>
> The cases where it happens are probably something like:
>
> 1) When calloc() is used (since good allocators will skip the memset when they
> use freshly mmapped memory to satisfy it), including when a malloc/memset(0) pair
> is used with compilers that can optimize that to a calloc (just gcc?).
>
> 2) People who mmap their own memory and use the fact that it comes zeroed. This probably applies more in
> JIT'd languages where the language safety requirements mean you need guaranteed zeroed memory in general,
> and thread-local allocators and JIT are integrated into the runtime and aware of the semantics and hence
> field zeroing can be elided (not sure how common this is, modern Java doesn't do it for example).
>
> In those scenarios it's possible to get the double-fault scenario.
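A sketch of the double-fault case from 1) and 2), assuming Linux and using `ru_minflt` as the fault counter (the function and struct names are hypothetical). Reading a fresh private anonymous page first maps the shared zero page. The later write then faults again to replace it with a private page. So with `read_first` set, the two passes should each take roughly one fault per page, while without it only the write pass faults. Transparent huge pages can collapse these counts, so treat them as indicative only.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor (no-I/O) page faults taken by this process so far. */
static long minor_faults(void) {
    struct rusage ru;
    getrusage(RUSAGE_SELF, &ru);
    return ru.ru_minflt;
}

static volatile unsigned char sink;  /* keeps the reads from being optimized out */

struct fault_split {
    long read_pass;   /* faults while reading every page */
    long write_pass;  /* faults while then writing every page */
};

/* Map npages of fresh anonymous memory. If read_first is nonzero, read every
 * page first (like code that relies on calloc'd memory being zero and reads it
 * before filling it), then write every page. Returns 0, or -1 if mmap fails. */
int read_then_write(size_t npages, int read_first, struct fault_split *out) {
    long ps = sysconf(_SC_PAGESIZE);
    assert(ps > 0 && out != NULL);
    size_t pagesz = (size_t)ps, len = npages * pagesz;
    unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE,
                               MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (base == MAP_FAILED) return -1;
    volatile unsigned char *p = base;

    long t0 = minor_faults();
    if (read_first)
        for (size_t off = 0; off < len; off += pagesz)
            sink ^= p[off];   /* read fault: maps the shared zero page */
    long t1 = minor_faults();
    for (size_t off = 0; off < len; off += pagesz)
        p[off] = 1;           /* write fault: installs a private zeroed page */
    long t2 = minor_faults();

    munmap(base, len);
    out->read_pass = t1 - t0;
    out->write_pass = t2 - t1;
    return 0;
}
```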
>
> Even if the kernel changed the default behavior to not use the zero page, I guess people could
> still get the old behavior by mapping /dev/zero explicitly, right? The main use case where the
> existing behavior seems nice is easy implementation of very sparse r/w data structures.
>
> > So it might be very interesting if you have some true macrobenchmark with real loads etc.
>
> Nothing I can share at the moment, but it's not a low-level benchmark. It's not even one specific
> case either (I've seen it a few times), but it's easy to see in a benchmark of MAP_POPULATE
> vs faulting pages one-by-one. Spectre and Meltdown made the fault cost grow enough that it
> crossed the pain threshold for me: faulting is about 2x slower than MAP_POPULATE for 4k pages. It's mostly
> painful for short-lived processes that allocate a lot of memory and use it only once.
>
> Here's a totally artificial benchmark, but I get 0.33s for THP, 1.0s
> for MAP_POPULATE 4k pages and 1.9s w/o MAP_POPULATE 4k pages.
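The benchmark behind those numbers isn't reproduced here. Below is a minimal sketch of the same three-way comparison, assuming Linux and a THP `enabled` setting of `madvise`. With `always`, the two "4k" cases may silently get huge pages too. With `never`, the THP case falls back to 4k pages. The `time_touch()` function and `touch_mode` enum are made-up names, and no timings are claimed for it.

```c
#define _GNU_SOURCE
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>

enum touch_mode {
    TOUCH_FAULT_4K,     /* plain mmap: one page fault per 4 KiB page on first write */
    TOUCH_POPULATE_4K,  /* MAP_POPULATE: mmap() prefaults the whole range up front */
    TOUCH_THP           /* plain mmap plus an MADV_HUGEPAGE request before first touch */
};

static double now_sec(void) {
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec / 1e9;
}

/* Map len bytes of anonymous memory, write one byte per 4 KiB, and return the
 * elapsed seconds for mmap() plus the touch loop (-1.0 if mmap fails). */
double time_touch(size_t len, enum touch_mode mode) {
    assert(len > 0);
    int flags = MAP_PRIVATE | MAP_ANONYMOUS;
    if (mode == TOUCH_POPULATE_4K)
        flags |= MAP_POPULATE;

    double start = now_sec();
    unsigned char *base = mmap(NULL, len, PROT_READ | PROT_WRITE, flags, -1, 0);
    if (base == MAP_FAILED)
        return -1.0;
    if (mode == TOUCH_THP)
        (void)madvise(base, len, MADV_HUGEPAGE);  /* a request; the kernel may ignore it */

    volatile unsigned char *p = base;
    for (size_t off = 0; off < len; off += 4096)
        p[off] = 1;
    double elapsed = now_sec() - start;

    munmap(base, len);
    return elapsed;
}
```

Calling it on a large region (say 1 GiB) once per mode and comparing the returned times gives the shape of the comparison. Absolute numbers will depend on the CPU, the kernel, and which Spectre/Meltdown mitigations are enabled.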

Let me see if I understand this correctly. My assumptions are:

a) When a process calls "mmap()" the kernel looks at various things (primarily how much physical RAM is currently free, but also things like whether Meltdown mitigations are present, how fast/slow swap space is, etc.) to try to estimate the optimum number of pages to pre-populate.

b) MAP_POPULATE is merely a hint that influences the kernel's "optimum number of pages to pre-populate" estimation, partly because it's unreasonable to expect user-space to take everything into account itself, and partly because you can't shove "how many percent" into a 1-bit flag anyway.

c) The kernel may opportunistically use large/huge pages when pre-populating however many pages it estimated as the optimum amount.

d) The kernel may "de-populate" later (e.g. in an attempt to avoid using swap space when there's a large increase in physical memory usage after an area was already pre-populated).

e) When a "not populated yet" page is modified (causing a page fault), the page fault handler will populate that page, but may also pre-populate other pages in an attempt to avoid likely future page faults. It could track history (e.g. some kind of "consecutive write detector") and use it, in conjunction with other information such as how much physical RAM is free now and whether Meltdown mitigations are active, to determine how much extra to pre-populate during that page fault.

f) The results you've shown ("0.33s for THP, 1.0s for MAP_POPULATE 4k pages and 1.9s w/o MAP_POPULATE 4k pages") are a strong indicator that Linux failed to implement some or all of the above correctly, because the performance difference between these cases should be significantly smaller.

- Brendan