4K pages probably used to be too large

By: Ben LaHaise (bcrl.delete@this.kvack.org), May 3, 2021 5:36 pm
Room: Moderated Discussions
anon2 (anon.delete@this.anon.com) on May 3, 2021 1:17 am wrote:
> Ben LaHaise (bcrl.delete@this.kvack.org) on May 2, 2021 10:45 am wrote:
> > Yuhong Bao (yuhongbao_386.delete@this.hotmail.com) on May 1, 2021 1:01 pm wrote:
> > > The fun thing is that 4K pages probably used to be too large. On a 80386, just 8 tasks would consume
> > > at least 64k and probably 128k just for the page tables alone. (80386 page tables were two levels)
> >
> > The National Semiconductor 32016 had 512 byte page sizes. The problem is that overhead of small
> > page sizes becomes excessive as soon as you have more than a couple of megabytes of memory.
> > With 16MB of RAM and 512 byte pages that works out to 65536 pages for which data structures
> > to track all the individual pages are needed.
>
> That's what Linux does, but it is not necessarily the best size/speed tradeoff for very
> small memory systems. Tracking per page data with 4k pages and 64 bytes per page is
> still 25% the overhead of your "unviable" solution, which doesn't sound great when you
> put it that way. That being said I don't think it's necessarily bad at all.

Ehh? 4KB pages with 64 bytes per struct page in Linux is 1.56% of memory used for overhead. That's an order of magnitude less than the 12-25% memory overhead you'd get with 512 byte pages.
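To make the arithmetic concrete, here is a minimal sketch of the per-page tracking overhead. The function name is made up for illustration, and the 128 byte case is only an assumption about where the upper end of the range would come from (a per-page structure twice the size of the 64 byte one discussed here).

```python
def struct_page_overhead(page_size: int, per_page_bytes: int = 64) -> float:
    """Fraction of RAM consumed by per-page tracking structures.

    per_page_bytes defaults to the 64 byte struct page figure from the
    discussion; any other value is a hypothetical for comparison.
    """
    return per_page_bytes / page_size


if __name__ == "__main__":
    for size in (4096, 512):
        print(f"{size:>5} byte pages, 64 bytes/page:  {struct_page_overhead(size):.2%}")
    # Hypothetical larger per-page structure (assumption, not a Linux figure).
    print(f"  512 byte pages, 128 bytes/page: {struct_page_overhead(512, 128):.2%}")
```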

Then there's the overhead of the page tables themselves. Using 512 byte pages you'd need 5 levels of page tables to cover a 32 bit address space. With 4KB pages, 2 levels of page tables are sufficient to completely cover a 32 bit address space. That's more than another order of magnitude of memory usage for 512 byte pages.

I used Linux with 4KB pages on the 386 back in the 1990s, and it was never a problem. Millions of 512 byte files were not a use-case that mattered back then as hard disks just weren't that big (a million 512 byte files just didn't fit on a 240MB HDD so nobody did it), so the overhead of a 4KB page was a non-issue for that case. Even today there's no pressure to optimize for files smaller than 4KB as pretty much every HDD and SSD is tuned towards 4KB pages.

...
> Caching the files in today's Linux with 512 byte pages takes 1149MB, with 4K pages 1316MB, about 15%
> more. You need to use 87 bytes per page in overhead before 512 overtakes 4K in memory usage there.

That use-case is not common at any layer in the hardware stack. Try running a few performance tests throwing 512 byte I/Os at common SSDs, and you'll quickly throw out the idea of using 512 byte I/Os in production. The only place that use-case really matters is with Maildir, and Maildir is so horrible in almost every other regard (the complete lack of indexing headers for large mailboxes makes it unusable for that use-case) that it's not worth tuning for (any sane mail client uses a database for caching messages, not individual files). Show me a use-case that is common where this matters, and keep in mind that virtually every HDD and SSD sold is working on 4KB pages and emulates 512 byte writes as horrifically slow read-modify-write operations that on some devices run more than *10 to 100 times* slower.
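As a rough illustration of why emulated 512 byte writes are expensive, here is a toy model of a device with 4KB physical sectors that services a 512 byte logical write as read-modify-write. The class and its counters are hypothetical and only count physical operations; real firmware behaviour and timings vary by device.

```python
PHYS = 4096     # assumed physical sector size in bytes
LOGICAL = 512   # emulated logical sector size in bytes


class Emulated512Device:
    """Toy model: 512 byte logical sectors on top of 4KB physical sectors."""

    def __init__(self, num_phys_sectors: int) -> None:
        self.media = bytearray(num_phys_sectors * PHYS)
        self.phys_reads = 0
        self.phys_writes = 0

    def _read_phys(self, idx: int) -> bytearray:
        self.phys_reads += 1
        return bytearray(self.media[idx * PHYS:(idx + 1) * PHYS])

    def _write_phys(self, idx: int, data: bytearray) -> None:
        assert len(data) == PHYS
        self.phys_writes += 1
        self.media[idx * PHYS:(idx + 1) * PHYS] = data

    def write_logical(self, lba: int, data: bytes) -> None:
        """Write one 512 byte logical sector via read-modify-write."""
        assert len(data) == LOGICAL
        phys_idx, inner = divmod(lba * LOGICAL, PHYS)
        sector = self._read_phys(phys_idx)        # read the whole 4KB sector
        sector[inner:inner + LOGICAL] = data      # modify 512 bytes of it
        self._write_phys(phys_idx, sector)        # write the whole 4KB back


if __name__ == "__main__":
    dev = Emulated512Device(num_phys_sectors=2)
    dev.write_logical(9, b"\xab" * LOGICAL)
    print("physical reads:", dev.phys_reads, "physical writes:", dev.phys_writes)
```

Every 512 byte write in this model costs a full 4KB read plus a full 4KB write, whereas an aligned 4KB write would need only the write.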

...
> It's funny the hand wringing over pages still goes on. People (not saying you, but some in CPU
> / OS space) are terrified of the huge numbers they come up with by dividing things -- oh no, servers
> have a *billion* pages to manage, how will we ever cope?! (Answer: exactly the same way we coped
> when we had a million pages to deal with, and exactly the same way we cope with the *20 billion*
> cache lines that have to be managed, or the X million objects of application data the CPU has
> to deal with. Locality of reference. Works nicely for TLBs just as it does for data.

Uh, there are all kinds of optimizations going on to make kernels deal efficiently with millions and billions of pages. Just look at all the work on transparent huge pages, and on deferring and parallelizing the initialization of memory tracking data structures in early boot. The work is not done in Linux yet, and it is still an active area of development. OTOH, *maybe* if 512 byte pages were common, the pressure would have gotten all the transparent huge page optimizations implemented a decade earlier. Addressing the overhead of pages is still a major concern for kernel developers tuning workloads.

> The real reason 4K is a good number and remains a good number is not because of any
> absolute numbers (millions of pages!!) but just because fragmentation doesn't blow memory
> usage out too far for structures you want to allocate as page size, and that has not
> changed much since 1980. 8K would probably be okay, 2K would probably be okay.

Yes, 4KB is a good number, but no, 2KB is not. A 32 bit system with 4KB pages uses 4KB pages for data, 4KB pages for PTEs and 4KB pages for PGD/PMDs. 2KB pages result in an oddball 8 entries in the top level, so you're no longer using pages for the top level and need another allocator for that pool. 8KB makes a lot of sense if you have a physical 64 bit address space, but it also means that you're unable to emulate any system with 4KB pages. Not an issue if you're embedded, but a significant concern if you want to provide backward compatibility (like Apple in its current software migration to the M1 series).
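The 8 entry top level falls out of how the virtual address bits divide up. A minimal sketch, assuming 4 byte page table entries, page-sized tables at every level below the top, and a 32 bit virtual address (the function name and parameters are made up for illustration):

```python
def index_bits(page_size: int, va_bits: int = 32, entry_size: int = 4) -> list[int]:
    """Index bits per page table level, top level first.

    Assumes each lower-level table fills exactly one page of
    entry_size-byte entries; leftover bits land in the top level.
    """
    offset_bits = page_size.bit_length() - 1                  # log2(page_size)
    bits_per_table = (page_size // entry_size).bit_length() - 1
    remaining = va_bits - offset_bits
    levels = []
    while remaining > 0:
        take = min(bits_per_table, remaining)
        levels.append(take)                                   # filled bottom-up
        remaining -= take
    levels.reverse()                                          # report top level first
    return levels


if __name__ == "__main__":
    for size in (4096, 2048):
        bits = index_bits(size)
        print(f"{size} byte pages: index bits {bits}, "
              f"top level has {2 ** bits[0]} entries")
```

With 4KB pages both levels index a full page of entries; with 2KB pages the top level is left with only 3 bits, which is the oddball 8 entry table that no longer fits the page allocator.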

> 512 is not unviable because it uses more memory (it might use less), it just doesn't use that much
> less that it makes much sense to be so small. But I don't think that's a new thing -- I would say
> depending on size/speed tradeoffs, it could easily have been a worse choice back in 1980 when TLBs
> were either tiny or used a lot of area on chip, and misses took huge expensive interrupts.

But I can't agree with this. 512 byte pages made sense on a 16 bit CPU that only had 24 physical address lines on the pin limited packages of the 1970s and early 1980s. Once we went to 32 bit systems, the memory overhead of 512 byte pages became too much, and 4KB pages were a natural fit.

I never actually did much programming on the NS32016 system with 512 byte pages (I was just exposed to it in passing). I recall the first MMU Motorola built for the 68k series had support for several variable page sizes, but later CPUs dropped support for all the oddball page sizes and ended up being used predominantly with 4KB pages for Unix / Linux. Clearly plenty of smart people think 4KB pages are close to optimal, given that they're as widely supported by hardware as they are today while 512 byte and 2KB pages are not.

-ben