Pre-populating anonymous pages

By: Travis Downs (travis.downs.delete@this.gmail.com), June 7, 2019 1:16 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on June 6, 2019 11:01 am wrote:
> Travis Downs (travis.downs.delete@this.gmail.com) on June 5, 2019 4:48 pm wrote:
> >
> > One interesting observation is that in this respect anonymous pages are slower than file-backed pages:
> > file-backed pages on any kernel in the last 5-10 years use "fault-around" which reads in additional nearby
> > pages (16 by default on x86) when a page faults. This behavior doesn't apply to anonymous pages.
>
> If you have a real load you can share where this is noticeable, we could probably add
> it somewhat easily. But quite often it's not clear how many pages end up needing new
> allocations after a fork, for example. The file mmap case tends to be much more predictable
> (it's almost universally just for reading, executables being a big deal).
>
> (And while tmpfs may look like an anonymous mapping in many ways, it doesn't tend to have the
> COW and zero page issues that a real anonymous mapping does. Of course, I'm not 100% convinced
> the zero-page thing makes much sense any more at all. It was noticeable for some benchmarks
> and some users that have sparse data mappings, but I wouldn't be surprised if it's a net loss
> in reality these days with just extra page faults due to read->write transitions)

I think the read->write transition thing is pretty rare in real code. Usually you get memory from malloc, which is uninitialized, so the first thing you do is write to it anyways.

The cases where it happens are probably something like:

1) When calloc() is used (since good allocators will skip the memset when they use freshly mmapped memory to satisfy it), including when a malloc/memset(0) pair is used on compilers that can optimize that to a calloc (just gcc?).

2) People who mmap their own memory and use the fact that it comes zeroed. This probably applies more in JIT'd languages, where the language safety requirements mean you need guaranteed zeroed memory in general. There, thread-local allocators and the JIT are integrated into the runtime and aware of the semantics, so field zeroing can be elided (not sure how common this is; modern Java doesn't do it, for example).

In those scenarios it's possible to get the double-fault scenario.
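To make the double-fault scenario concrete, here is a minimal sketch (my own illustration, not something from the thread) that maps anonymous memory, reads every page first and then writes every page, counting minor faults around each pass with getrusage(). The function names are hypothetical, and the MADV_NOHUGEPAGE call is an assumption added to keep THP out of the picture. On a typical Linux setup I'd expect both counts to be near the page count, but that depends on the kernel and configuration, so treat the numbers as something to measure rather than a given.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* MAP_ANONYMOUS, MADV_NOHUGEPAGE under strict -std modes */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <sys/resource.h>
#include <unistd.h>

/* Minor page faults taken by this process so far. */
static long minor_faults(void)
{
    struct rusage ru;
    int rc = getrusage(RUSAGE_SELF, &ru);
    assert(rc == 0);
    (void)rc;
    return ru.ru_minflt;
}

/*
 * Hypothetical helper: map `pages` anonymous pages, read one byte from each,
 * then write one byte to each. The minor-fault deltas of the two passes are
 * stored in *read_faults and *write_faults.
 * Returns 0 on success, -1 if mmap fails or the fresh memory was not zero.
 */
int read_then_write_faults(size_t pages, long *read_faults, long *write_faults)
{
    size_t page = (size_t)sysconf(_SC_PAGESIZE);
    size_t len = pages * page;
    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return -1;

    /* Assumption: ask for small pages only; an error here is ignored. */
    madvise((void *)p, len, MADV_NOHUGEPAGE);

    long f0 = minor_faults();
    unsigned sum = 0;
    for (size_t i = 0; i < pages; i++)
        sum += p[i * page];        /* read pass: may map the shared zero page */
    long f1 = minor_faults();
    for (size_t i = 0; i < pages; i++)
        p[i * page] = 1;           /* write pass: needs a private page */
    long f2 = minor_faults();

    *read_faults = f1 - f0;
    *write_faults = f2 - f1;
    munmap((void *)p, len);
    return sum == 0 ? 0 : -1;
}
```

Calling read_then_write_faults(1024, &r, &w) from a small main() and printing r and w shows whether the read pass and the write pass each pay for their own set of faults on a given machine.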

Even if the kernel changed the default behavior to not use the zero page, I guess people could still get the old behavior by mapping /dev/zero explicitly, right? The main use case where the existing behavior seems nice is easy implementation of very sparse r/w data structures.
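For reference, mapping /dev/zero explicitly looks roughly like the sketch below. The helper name map_dev_zero is made up for illustration. Whether such a mapping would keep the zero-page-on-read behavior if the anonymous default ever changed is exactly the open question above, so this only shows the mechanics: a private, zero-filled, writable mapping where untouched pages read as zero.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE
#endif
#include <assert.h>
#include <fcntl.h>
#include <stddef.h>
#include <sys/mman.h>
#include <unistd.h>

/*
 * Hypothetical helper: return a private read/write mapping of /dev/zero
 * of `len` bytes, or NULL on failure. The caller releases it with munmap().
 */
void *map_dev_zero(size_t len)
{
    assert(len > 0);
    int fd = open("/dev/zero", O_RDWR);
    if (fd < 0)
        return NULL;
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    return p == MAP_FAILED ? NULL : p;
}
```

A sparse r/w structure would map a large range this way (or with MAP_ANONYMOUS) and only ever write the few pages it actually uses.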

> So it might be very interesting if you have some true macrobenchmark with real loads etc.

Nothing I can share at the moment, but it's not a low-level benchmark. It's not even one specific case either, I've seen it a few times. It's easy to see in a benchmark of MAP_POPULATE vs faulting pages one-by-one, though. Spectre and Meltdown made the fault cost grow enough that it crossed the pain threshold for me: faulting one-by-one is about 2x slower than populating for 4k pages. Mostly painful for short-lived processes that allocate a lot of memory and use it only once.

Here's a totally artificial benchmark, but I get 0.33s for THP, 1.0s for MAP_POPULATE 4k pages and 1.9s w/o MAP_POPULATE 4k pages.
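The comparison has roughly the following shape. This is a sketch I'm reconstructing for illustration, not the benchmark behind the numbers above; the function names are hypothetical. It times mmap plus one write per page, with MAP_POPULATE either passed in or not.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* MAP_ANONYMOUS, MAP_POPULATE, clock_gettime */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>
#include <time.h>
#include <unistd.h>

/* Monotonic clock reading in seconds. */
static double seconds_now(void)
{
    struct timespec ts;
    clock_gettime(CLOCK_MONOTONIC, &ts);
    return (double)ts.tv_sec + (double)ts.tv_nsec * 1e-9;
}

/*
 * Hypothetical helper: map `len` bytes of anonymous memory with
 * `extra_flags` (0 or MAP_POPULATE), write one byte per page, and return
 * the elapsed seconds for mmap + touching. Returns -1.0 if mmap fails.
 */
double time_map_and_touch(size_t len, int extra_flags)
{
    assert(len > 0);
    size_t page = (size_t)sysconf(_SC_PAGESIZE);

    double t0 = seconds_now();
    volatile unsigned char *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                                     MAP_PRIVATE | MAP_ANONYMOUS | extra_flags,
                                     -1, 0);
    if (p == MAP_FAILED)
        return -1.0;
    for (size_t off = 0; off < len; off += page)
        p[off] = 1; /* faults here unless the page was already populated */
    double t1 = seconds_now();

    munmap((void *)p, len);
    return t1 - t0;
}
```

Comparing time_map_and_touch(len, 0) against time_map_and_touch(len, MAP_POPULATE) for a large len (say 1 GiB) gives the one-by-one vs populated numbers; the THP case additionally needs an madvise(MADV_HUGEPAGE) between the mmap and the touching loop.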

> But I suspect you are doing very specific and low-level CPU micro-benchmarks, and then
> just re-mmap'ing the memory with MAP_POPULATE is almost certainly the right thing to
> do. Even that won't work with really old kernels which just did a read to populate,
> but I don't think you can even find kernels that old any more in the wild.

THP is another solution, but one problem is that mmap(MAP_POPULATE) and THP are kind of at odds during allocation. To get hugepages efficiently you need to mmap without MAP_POPULATE (since otherwise you'll already populate everything as 4k pages), then do the madvise call. However, if the madvise isn't actually going to get hugepages (disabled or too fragmented), you'd much rather have used MAP_POPULATE, but even if you do manage to figure that out, it's too late (although I guess you can do a re-map with MAP_POPULATE now).

The root of the problem seems to be that you can't pass down the MADV_HUGEPAGE advice to mmap, but maybe there's a good reason for that.
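The allocation dance described above could be sketched like this. The helper name alloc_prefer_thp is hypothetical, and the fallback policy is my assumption: it only falls back when madvise() itself returns an error (for example on a kernel without THP support). A successful madvise() is just a hint and does not guarantee hugepages, so the "too fragmented" case is not detected here.

```c
#ifndef _GNU_SOURCE
#define _GNU_SOURCE /* MAP_ANONYMOUS, MAP_POPULATE, MADV_HUGEPAGE */
#endif
#include <assert.h>
#include <stddef.h>
#include <sys/mman.h>

/*
 * Hypothetical helper: map `len` bytes, preferring transparent hugepages.
 * Sets *hinted to 1 if the MADV_HUGEPAGE hint was accepted, or to 0 if the
 * region was re-mapped with MAP_POPULATE instead. Returns NULL on failure.
 */
void *alloc_prefer_thp(size_t len, int *hinted)
{
    assert(hinted != NULL);

    /* Step 1: map without MAP_POPULATE so nothing is populated as 4k pages. */
    void *p = mmap(NULL, len, PROT_READ | PROT_WRITE,
                   MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (p == MAP_FAILED)
        return NULL;

    /* Step 2: ask for hugepages; pages are then populated on first touch. */
    if (madvise(p, len, MADV_HUGEPAGE) == 0) {
        *hinted = 1;
        return p;
    }

    /* Step 3: the hint was rejected, so re-map and pre-populate 4k pages. */
    munmap(p, len);
    p = mmap(NULL, len, PROT_READ | PROT_WRITE,
             MAP_PRIVATE | MAP_ANONYMOUS | MAP_POPULATE, -1, 0);
    if (p == MAP_FAILED)
        return NULL;
    *hinted = 0;
    return p;
}
```

The extra munmap/mmap round trip in step 3 is the cost of not being able to express both preferences in a single mmap call.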