4K pages probably used to be too large

By: anon2 (anon.delete@this.anon.com), May 3, 2021 6:39 pm
Room: Moderated Discussions
Ben LaHaise (bcrl.delete@this.kvack.org) on May 3, 2021 5:36 pm wrote:
> anon2 (anon.delete@this.anon.com) on May 3, 2021 1:17 am wrote:
> > Ben LaHaise (bcrl.delete@this.kvack.org) on May 2, 2021 10:45 am wrote:
> > > Yuhong Bao (yuhongbao_386.delete@this.hotmail.com) on May 1, 2021 1:01 pm wrote:
> > > > The fun thing is that 4K pages probably used to be too large. On an 80386, just 8 tasks would consume
> > > > at least 64k and probably 128k just for the page tables alone. (80386 page tables were two levels)
> > >
> > > The National Semiconductor 32016 had 512 byte page sizes. The problem is that overhead of small
> > > page sizes becomes excessive as soon as you have more than a couple of megabytes of memory.
> > > With 16MB of RAM and 512 byte pages that works out to 65536 pages for which data structures
> > > to track all the individual pages are needed.
> >
> > That's what Linux does, but it is not necessarily the best size/speed tradeoff for very
> > small memory systems. Tracking per page data with 4k pages and 64 bytes per page is
> > still 25% the overhead of your "unviable" solution, which doesn't sound great when you
> > put it that way. That being said I don't think it's necessarily bad at all.
> Ehh? 4KB pages with 64 bytes per struct page in Linux is 1.56% of memory used for overhead. That's
> an order of magnitude less than the 12-25% memory overhead as would be the case with 512 byte pages.

I was responding to your miscalculated numbers, and using 32 bytes per page because that's what you would use on a tiny memory constrained system.
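For reference, the per-page tracking percentages being argued over can be reproduced with a quick calculation. This is only a sketch: the helper name `metadata_overhead` is made up for illustration, and the 32 and 64 byte figures are the per-page structure sizes mentioned in this thread, not measurements.

```python
def metadata_overhead(page_size, bytes_per_page):
    """Fraction of memory spent on per-page tracking structures."""
    return bytes_per_page / page_size

# Cases discussed in the thread (sizes are the thread's assumptions).
cases = {
    "4K pages, 64 bytes/page": (4096, 64),
    "512 byte pages, 32 bytes/page": (512, 32),
    "512 byte pages, 64 bytes/page": (512, 64),
}
for label, (page_size, per_page) in cases.items():
    print(f"{label}: {metadata_overhead(page_size, per_page):.2%}")

# Ratio behind the "25% the overhead" remark: 4K/64B versus 512/32B.
ratio = metadata_overhead(4096, 64) / metadata_overhead(512, 32)
print(f"ratio: {ratio:.0%}")
```

Both sides' numbers come out of the same division; they differ only in which per-page structure size is assumed for the 512 byte case.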

> Then there's the overhead of the page tables themselves.

I acknowledged that.

> Using 512 byte pages you'd need 5 levels of page
> tables to cover a 32 bit address space.

Wrong numbers again. 4 levels will cover it. 3 levels (28 bits) would be fine for a tiny memory system.
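The level count can be checked with a rough sketch, assuming each page table occupies exactly one page and each entry is 4 bytes (the entry size is my assumption for illustration, and the function names are hypothetical). Under that assumption a 512 byte table holds 128 entries, so each level resolves 7 address bits on top of the 9 bit page offset.

```python
def levels_needed(va_bits, page_size, pte_size=4):
    """Levels of page-sized tables needed to cover va_bits of address space."""
    offset_bits = page_size.bit_length() - 1               # bits inside a page
    index_bits = (page_size // pte_size).bit_length() - 1  # bits resolved per level
    return -(-(va_bits - offset_bits) // index_bits)       # ceiling division

def bits_covered(levels, page_size, pte_size=4):
    """Address bits reachable with the given number of full levels."""
    offset_bits = page_size.bit_length() - 1
    index_bits = (page_size // pte_size).bit_length() - 1
    return offset_bits + levels * index_bits

print(levels_needed(32, 512))    # 512 byte pages, 32 bit address space
print(levels_needed(32, 4096))   # 4K pages, 32 bit address space
print(bits_covered(3, 512))      # reach of three levels with 512 byte pages
```

With 4 byte entries, four levels cover 32 bits and three levels reach 30 bits, which is more than the 28 bits mentioned above. A different entry size would shift these figures.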

> 2 levels of page tables are sufficient to completely cover a 32
> bit address space. That's more than another order of magnitude of memory usage for 512 byte pages.

Hah, no it's not! It's actually likely to use less memory than a 4K 2 level scheme quite often, because you can fit eight 512 byte page table pages in a 4K one, so even with quite a number of very sparse mappings you can easily afford the additional higher level entries. For the typical small exec/heap/stack/libraries layout that has just a few clusters of addresses, 512 would quite likely use less memory. There's that fragmentation again.

For dense mappings it hardly matters: each higher level needs under 1% as many entries as the level below it.
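To make the sparse case concrete, here is a small sketch that counts page table memory for one made-up process layout. Everything in it is an assumption for illustration: the region addresses and sizes are hypothetical, tables are assumed to be exactly one page each with 4 byte entries, and `page_table_bytes` is not taken from any real kernel.

```python
def page_table_bytes(regions, page_size, pte_size=4, va_bits=32):
    """Bytes of page-sized tables needed to map the given (start, length) regions."""
    entries = page_size // pte_size
    offset_bits = page_size.bit_length() - 1
    index_bits = entries.bit_length() - 1
    levels = -(-(va_bits - offset_bits) // index_bits)  # ceiling division
    tables = 0
    span = page_size
    for _ in range(levels):
        span *= entries  # address space covered by one table at this level
        touched = set()
        for start, length in regions:
            touched.update(range(start // span, (start + length - 1) // span + 1))
        tables += len(touched)
    return tables * page_size

# Hypothetical small process: executable, heap, one library, stack.
regions = [
    (0x08048000, 100 * 1024),
    (0x08100000, 200 * 1024),
    (0x40000000, 256 * 1024),
    (0xBFFF0000, 64 * 1024),
]
print(page_table_bytes(regions, 4096))  # 2 level scheme with 4K pages
print(page_table_bytes(regions, 512))   # 4 level scheme with 512 byte pages
```

For this particular layout the 4K scheme needs 4 tables of 4096 bytes, while the 512 byte scheme needs 19 tables of 512 bytes. Larger or denser regions change the balance, so treat this as one data point rather than a general result.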

> I used Linux with 4KB pages on the 386 back in the 1990s, and it was never a problem. Millions
> of 512 byte files was not a use-case that mattered back then as hard disks just weren't that
> big (a million 512 byte files just didn't fit on 240MB HDD so nobody did it), so the overhead
> of a 4KB page was a non-issue for that case.

In fact it certainly was, which is part of the reason filesystems supported 1K or even 512 byte blocks for a long time. For things like mail servers and web servers, files were typically very small.

> Even today there's no pressure to optimize for
> files smaller than 4KB as pretty much every HDD and SSD is tuned towards 4KB pages today.
> ...
> > Caching the files in today's Linux with 512 byte pages takes 1149MB, with 4K pages 1316MB, about 15%
> > more. You need to use 87 bytes per page in overhead before 512 overtakes 4K in memory usage there.
> That use-case is not common at any layer in the hardware stack.

You said 512 byte pages use more memory; I'm saying they don't necessarily. And the exact test is not worth getting too upset about: the same issue will appear through fragmentation in application memory allocation, page tables, etc. Anything that is allocated in page sized units.
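The 87 byte break-even figure quoted above can be reproduced from the two totals, assuming (my reading, not stated explicitly in the quote) that 1149MB and 1316MB are the file data rounded up to whole pages with no per-page metadata included. The function name is hypothetical.

```python
MB = 2**20

def break_even_overhead(total_small, small_page, total_large, large_page):
    """Per-page overhead (bytes) at which both page sizes use equal memory.

    Solves pages_small * (small_page + x) == pages_large * (large_page + x).
    """
    pages_small = total_small // small_page
    pages_large = total_large // large_page
    return (total_large - total_small) / (pages_small - pages_large)

x = break_even_overhead(1149 * MB, 512, 1316 * MB, 4096)
extra = 1316 / 1149 - 1  # how much more the 4K case uses before overhead

print(f"4K uses {extra:.1%} more; break-even at about {round(x)} bytes per page")
```

Below the break-even value the 512 byte case stays smaller for this data set; above it the extra page count dominates.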

> Try running a few performance tests throwing
> 512 byte I/Os at common SSDs, and you'll quickly throw out the idea of using 512 byte I/Os in production.

This is shifting the goal posts. I never said it would be great to use today; I disputed that it just became unviable at the end of the 80s, or that it was due to the increase in memory size. Also, block IOs aren't really tied to the MMU page size. And for true 512 byte block size devices, on workloads that are highly transactional, I have certainly seen 512 being preferred as recently as a few years ago.

> The only place that use-case really matters is with Maildir, and Maildir is so horrible in almost every
> other regard (the complete lack of indexing headers for large mailboxes makes it unusable for that use-case)
> that it's not worth tuning for (any sane mail client uses a database for caching messages, not individual
> files). Show me a use-case that is common where this matters, and keep in mind that virtually every HDD
> and SSD sold is working on 4KB pages and emulates 512 byte writes as horrifically slow read-modify-write
> operations that on some devices run more than *10 to 100 times* slower.

I disagree with your assertion that Maildir is the only use case that matters.

> ...
> > It's funny the hand wringing over pages still goes on. People (not saying you, but some in CPU
> > / OS space) are terrified of the huge numbers they come up with by dividing things -- oh no, servers
> > have a *billion* pages to manage, how will we ever cope?! (Answer: exactly the same way we coped
> > when we had a million pages to deal with, and exactly the same way we cope with the *20 billion*
> > cache lines that have to be managed, or the X million objects of application data the CPU has
> > to deal with. Locality of reference. Works nicely for TLBs just as it does for data.
> Uh, there are all kinds of optimizations going on to make kernels deal efficiently with millions
> and billions of pages.


> > The real reason 4K is a good number and remains a good number is not because of any
> > absolute numbers (millions of pages!!) but just because fragmentation doesn't blow memory
> > usage out too far for structures you want to allocate as page size, and that has not
> > changed much since 1980. 8K would probably be okay, 2K would probably be okay.
> Yes 4KB is a good number, but no, 2KB is not.

2KB would be fine.

> A 32 bit system with 4KB pages uses 4KB pages for data,
> 4KB pages for PTEs and 4KB pages for PGD/PMDs. 2KB pages results in oddball 8 entries in the top level,
> so you're no longer using pages for the top level and need another allocator for that pool.

This really is your biggest concern with it? That's a complete non-issue. I'll (foolishly) take your word for the numbers, but 32 bytes extra per process (a nice cache line size) doesn't matter at all. Really.
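For what it's worth, the quoted 8 entry top level does fall out of the arithmetic if you assume 4 byte entries and page-sized lower-level tables (both assumptions on my part, and `top_level_entries` is just an illustrative name): 2KB pages leave 21 bits to translate at 9 bits per level, so the top level only has 3 bits left.

```python
def top_level_entries(va_bits, page_size, pte_size=4):
    """Entries in the top-level table when lower levels are full page-sized tables."""
    offset_bits = page_size.bit_length() - 1
    index_bits = (page_size // pte_size).bit_length() - 1
    remaining = va_bits - offset_bits
    top_bits = remaining % index_bits or index_bits  # leftover bits go to the top level
    return 1 << top_bits

entries_2k = top_level_entries(32, 2048)
print(entries_2k, entries_2k * 4)     # entries and bytes for 2KB pages
print(top_level_entries(32, 4096))    # 4K pages fill a whole page at the top
```

So under these assumptions the odd-sized structure is 8 entries of 4 bytes, i.e. the 32 bytes per process mentioned above.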

> 8KB makes
> a lot of sense if you have a physical 64 bit address space, but it also means that you're unable to emulate
> any system with 4KB pages. Not an issue if you're embedded, but a significant concern if you want to provide
> backward compatibility (like Apple in its current software migration to the M1 series).
> > 512 is not unviable because it uses more memory (it might use less), it just doesn't use that much
> > less that it makes much sense to be so small. But I don't think that's a new thing -- I would say
> > depending on size/speed tradeoffs, it could easily have been a worse choice back in 1980 when TLBs
> > were either tiny or used a lot of area on chip, and misses took huge expensive interrupts.
> But I can't agree with this. 512 byte pages made sense on a 16 bit CPU that only had 24 physical
> address lines on the pin limited packages of the 1970s and early 1980s. Once we went to 32 bit systems,
> the memory overhead of 512 byte pages became too much, and 4KB pages were a natural fit.

Your numbers and assertions about memory overhead have been consistently bad so I don't think you're in a position to just declare it is bad.

And again it's not the memory overhead being too much, it's that the memory saving isn't large enough to really justify it (yes of course a larger page size helps with TLB coverage and some management, so I'm not saying 512 bytes would feel no pain of that).

> I never actually did much programming on the NS32016 system with 512 byte pages (was just exposed to it
> in passing). I recall the first MMU Motorola built for the 68k series had support for several variable
> page sizes, but later CPUs dropped support for all the oddball page sizes and ended up being used predominantly
> with 4KB pages for Unix / Linux. Clearly plenty of smart people think 4KB pages are close to optimal if
> they're as widely supported by hardware as they are today while 512 byte or 2KB pages are not.
> -ben
