By: Brendan (btrotter.delete@this.gmail.com), July 19, 2022 8:29 pm
Room: Moderated Discussions
Hi,
Adrian (a.delete@this.acm.org) on July 18, 2022 11:02 pm wrote:
> Brendan (btrotter.delete@this.gmail.com) on July 18, 2022 3:11 pm wrote:
> >
> > Page tables/MMU allow some powerful tricks (swap space, copy on write/allocate on demand, etc); and
> > a kernel can and should exploit these same powerful tricks for its own benefit (e.g. making data that
> > kernel shouldn't need to access unable to be accessed accidentally
> > to improve kernel bug and/or vulnerability
> > detection, allowing parts of itself (and user-space page tables) to be sent to swap space to reduce
> > kernel's memory consumption, using a deliberately randomized physical address for each individual
> > page to improve kernel's security, NUMA optimizations where different copies of read-only kernel code
> > and data are used by different CPUs to improve kernel's performance, etc).
> >
> > In this case, if there are no page faults in privileged mode, then kernel
> > must be a steaming pile of garbage (unable to benefit from MMU).
>
> The usefulness of those tricks inside a kernel is controversial and I happen to be among those who
> believe that the kind of code that can benefit from such tricks should not belong to the kernel.
I happen to be among those who have implemented most of these tricks (everything except sending parts of the kernel to swap space, which doesn't make any sense for a micro-kernel), plus a few other tricks.
Even if you ignore all viability/performance concerns, there's an absolute limit to how minimal a kernel can possibly be - e.g. for 64-bit 80x86, a kernel must have a few tables (GDT, IDT, TSS), must have all interrupt and SYSCALL entry points (even if they're stubs that just reflect everything to something in user-space), must have some low-level task switching code, and must at least provide support for privileged instructions (reading/writing MSRs and control registers, etc). I also doubt it'd be possible to design a CPU that is able to provide security without requiring anything privileged.
In other words, you literally cannot avoid (e.g.) the "kernel is slower for CPUs that aren't in the blessed NUMA domain" problem by shifting code out of the kernel, because you can never shift 100% of the kernel's code out of the kernel.
> Also, my definition of a kernel which is a steaming pile of garbage is very different.
> For example, a kernel where a buggy device driver can make the computer unresponsive,
> or where it is not possible to kill a process at any time, is much closer to being a
> steaming pile of garbage than any kernel that is unable to benefit from MMU.
"Pile of garbage" is a relative term, just like "good/bad" - it requires a point of reference to compare against; and something can be simultaneously good (e.g. compared to MS-DOS) and bad (e.g. compared to modern Linux). People expect improvement, and the most useful point of reference is what exists today. Anything that isn't better than existing kernels is a pile of garbage (where the size of the pile depends on how much better it isn't - e.g. as good as existing kernels is a very tiny pile of garbage, and significantly worse than existing kernels is a huge pile of garbage). This includes existing kernels themselves - e.g. if the next version of the Linux kernel isn't better than the previous version, then the next version of the Linux kernel would be a pile of garbage compared to the previous version.
> The difference between the 2 definitions is that the quality of a kernel must be
> judged based on user-visible behavior. Whether benefiting from the MMU is an advantage
> for a kernel must be proven based on some observable quality of the kernel.
Nobody says "Whether shoving a glowing hot lump of rusty iron into your eyes is good or bad cannot be determined until someone does it and the outcome can be observed". Even primates have enough intelligence to predict outcomes without having observed or measured the end result.
For example, it's trivial to predict (without having observed or measured) that a kernel that doesn't use the MMU will make system calls that pass pointers more expensive, because the kernel would have to convert the caller's virtual address into a physical address, and would then have to deal with the fact that data which is contiguous in user-space may be split across page boundaries and not be contiguous in the physical address space.
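To make that extra work concrete, here's a minimal sketch in Python (a toy model, not real kernel code). It assumes 4 KiB pages and a hypothetical single-level page table represented as a dict from virtual page number to physical frame number; the name split_user_buffer and the example mapping are made up for illustration. It shows what a kernel without its own virtual mapping of the caller's buffer would have to do for every pointer argument: translate page by page, and cope with a buffer that comes back as several physical pieces.

```python
PAGE_SIZE = 4096  # assumed page size (4 KiB)


def split_user_buffer(page_table, vaddr, length):
    """Translate the virtual range [vaddr, vaddr + length) into a list of
    (physical_address, size) chunks.

    page_table is a hypothetical dict: virtual page number -> physical frame number.
    """
    chunks = []
    remaining = length
    while remaining > 0:
        vpn, offset = divmod(vaddr, PAGE_SIZE)
        if vpn not in page_table:
            raise LookupError(f"unmapped virtual page {vpn:#x}")
        # Never read past the end of the current page in one step.
        size = min(PAGE_SIZE - offset, remaining)
        paddr = page_table[vpn] * PAGE_SIZE + offset
        if chunks and chunks[-1][0] + chunks[-1][1] == paddr:
            # Physically contiguous with the previous chunk: extend it.
            chunks[-1] = (chunks[-1][0], chunks[-1][1] + size)
        else:
            chunks.append((paddr, size))
        vaddr += size
        remaining -= size
    return chunks


# Hypothetical mapping: virtual pages 0x10 and 0x11 are adjacent,
# but the physical frames backing them (0x80 and 0x23) are not.
page_table = {0x10: 0x80, 0x11: 0x23}

# A 200-byte buffer that starts 96 bytes before the end of virtual page 0x10.
chunks = split_user_buffer(page_table, 0x10 * PAGE_SIZE + 4000, 200)
for paddr, size in chunks:
    print(f"{size} bytes at physical {paddr:#x}")
```

With this mapping, the 200 bytes that look like one contiguous buffer to the caller come back as two separate physical pieces, and every consumer of the buffer inside the kernel has to handle that. A kernel that runs with the caller's page tables active just dereferences the pointer and lets the MMU do the same job in hardware.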
> I agree that inside a very complex kernel, e.g. Linux, which includes a huge number of device
> drivers, the use of the MMU can be unavoidable, in order to provide memory protection and to
> allow dynamic allocation of memory pages. However, I believe that most of those device drivers
> should not work in privileged mode, and then they could use the MMU like any user programs.
Agreed.
> If on CPU architectures like x86-64, their performance would become lower in non-privileged
> mode, that just shows that the hardware is not designed well, with excessive overheads
> on context switching and IPC, not that the software architecture is good as it is.
I suspect you intended "If on CPU architectures like x86-64, their performance would become excessively lower". Taken literally, I doubt it's possible for a CPU to provide zero-cost control transfers between protection domains, and so I don't think it's possible for any hardware to ever meet your definition of "designed well" (unless there's no protection between protection domains).
Most people accept that moving things like drivers out of kernel and into user-space requires sacrificing a little performance for other benefits (often security). Micro-kernel and monolithic kernel advocates just disagree on whether that sacrifice is/isn't worthwhile.
- Brendan