By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), January 31, 2017 2:01 pm
Room: Moderated Discussions
rwessel (robertwessel.delete@this.yahoo.com) on January 31, 2017 12:29 am wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on January 30, 2017 10:01 am wrote:
>
> > Yeah, I'm aware of the high-level details, the oddities are in the low-level issues
> > (very strange TLB flushing, iirc, due to some odd dirty bit handling and other rules,
> > I only see the patches flow by, I've never used it or looked all that closely).
>
>
> Well, given Linux's heritage, the dirty and referenced bits on S/360 are certainly odd
> (as they're attached to the physical page and not the TLB entry). Having some of the protection
> bits attached to the physical page probably also annoys the Linux kernel...
So the reason the dirty bit was somewhat annoying is that we actually do maintain it on a physical page basis as well in the kernel (because in the end, when you do IO, you don't ask yourself "is this virtual address dirty?" - you are writing out a particular physical copy, of course).
So I see where the hardware designer comes from: he's trying to make it easy for the OS. And it probably did help MVS, which presumably was the driving factor for that design.
But the fact that we do end up maintaining a physical dirty bit doesn't make the need for the per-virtual-mapping dirty bit go away: you still end up having several places that want to know if a write has been done through a particular mapping. It's not all that common (so the s390 people actually got away without a proper dirty bit per page table entry for a long while), but there really are cases where it is more than just useful.
In fact, a per-virtual-mapping dirty bit is so useful that we don't just track the usual dirty state (which most CPUs give us in hardware): we also have a second sw-only dirty state that we call the "soft dirty" state, for tracking things like "has this page been changed since we last looked at it", which is useful for things like checkpointing.
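To make the "soft dirty" state concrete, here is a minimal userspace sketch of how it can be observed on Linux. It assumes a kernel built with CONFIG_MEM_SOFT_DIRTY and relies on the documented procfs interface: writing "4" to /proc/PID/clear_refs clears the soft-dirty bits, and bit 55 of each 64-bit /proc/PID/pagemap entry reports soft-dirty. The helper names (`is_soft_dirty`, `read_pagemap_entry`, `clear_soft_dirty`, `demo`) are made up for illustration.

```python
import ctypes
import mmap
import os
import struct

PAGE_SIZE = os.sysconf("SC_PAGE_SIZE")
SOFT_DIRTY_BIT = 1 << 55  # soft-dirty flag in a pagemap entry


def is_soft_dirty(entry: int) -> bool:
    """Decode the soft-dirty flag from one 64-bit pagemap entry."""
    return bool(entry & SOFT_DIRTY_BIT)


def read_pagemap_entry(vaddr: int) -> int:
    """Read the pagemap entry covering virtual address vaddr in this process."""
    with open("/proc/self/pagemap", "rb", buffering=0) as f:
        f.seek((vaddr // PAGE_SIZE) * 8)  # one 8-byte entry per virtual page
        data = f.read(8)
    return struct.unpack("=Q", data)[0]   # entries are native-endian u64


def clear_soft_dirty() -> None:
    """Ask the kernel to clear soft-dirty bits for the whole process."""
    with open("/proc/self/clear_refs", "w") as f:
        f.write("4")


def demo():
    """Return (soft_dirty_before_write, soft_dirty_after_write) for one page."""
    buf = mmap.mmap(-1, PAGE_SIZE)        # anonymous private mapping
    buf[0] = 1                            # fault the page in
    addr = ctypes.addressof(ctypes.c_char.from_buffer(buf))
    clear_soft_dirty()                    # "we last looked at it" now
    before = is_soft_dirty(read_pagemap_entry(addr))
    buf[0] = 2                            # write through this mapping
    after = is_soft_dirty(read_pagemap_entry(addr))
    return before, after
```

On a kernel with soft-dirty support one would expect `demo()` to report the page as clean before the write and soft-dirty after it. This is the "changed since we last looked" question that a checkpointing tool asks.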
But yes, the biggest annoyance was just that we share the VM
> > No disagreement. Although I personally find that to be more of a SW design issue: people
> > who design their calling conventions to be about cross-process boundaries are crazy.
>
> It has its moments, though. As an example, if you've done a DB2 SELECT, each of the subsequent
> FETCHes is just a wrapper around a PC (call-gate to DB2's address space). DB2 can then deposit
> the row right into your address space (there being pretty good cross address space support in
> zArch). Other solutions can reduce the number of cross address space calls (for example, buffering
> multiple rows), but at the expense of complexity and copies of the data.
As an OS person, I've often wanted the ability to just have the hardware write to different address spaces. Screw segments, I'd like to just have access to multiple address spaces. Sparc kind of had it (user vs kernel vs IO address space or whatever), but that doesn't handle the multi-process case.
But since almost nobody has that hardware support, it's useless to a portable OS, and we end up doing the "copy from one address space to another" by walking the page tables by hand.
Even on s390.
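For a userspace view of "copy from one address space to another", Linux exposes the process_vm_readv(2) system call. The sketch below calls it through ctypes. It assumes a Linux libc that exports `process_vm_readv` and a caller allowed to access the target process. `IoVec` and `read_remote` are illustrative names, and this shows only the user-visible interface, not the kernel's internal copy path.

```python
import ctypes
import os


class IoVec(ctypes.Structure):
    """Mirror of struct iovec from <sys/uio.h>."""
    _fields_ = [("iov_base", ctypes.c_void_p),
                ("iov_len", ctypes.c_size_t)]


def read_remote(pid: int, remote_addr: int, length: int) -> bytes:
    """Copy length bytes from remote_addr in process pid into our own memory."""
    libc = ctypes.CDLL(None, use_errno=True)
    fn = libc.process_vm_readv
    fn.restype = ctypes.c_ssize_t
    fn.argtypes = [ctypes.c_int,
                   ctypes.POINTER(IoVec), ctypes.c_ulong,
                   ctypes.POINTER(IoVec), ctypes.c_ulong,
                   ctypes.c_ulong]

    local_buf = ctypes.create_string_buffer(length)
    local = IoVec(ctypes.cast(local_buf, ctypes.c_void_p), length)
    remote = IoVec(ctypes.c_void_p(remote_addr), length)

    # One local and one remote segment; the flags argument must be 0.
    n = fn(pid, ctypes.byref(local), 1, ctypes.byref(remote), 1, 0)
    if n < 0:
        err = ctypes.get_errno()
        raise OSError(err, os.strerror(err))
    return local_buf.raw[:n]
```

For example, `read_remote(os.getpid(), ctypes.addressof(buf), 5)` on a ctypes buffer `buf` in the same process should return its first five bytes. Reading another process works the same way, subject to ptrace-style permission checks.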
It turns out that specialized architecture features are almost never useful. Which makes for much less interesting processor architecture design ("you have to look like everybody else"), but much more useful actual processors.
Linus