By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), July 18, 2022 7:49 am
Room: Moderated Discussions
Kester L (nobody.delete@this.nothing.com) on June 29, 2022 1:49 pm wrote:
> https://queue.acm.org/detail.cfm?id=3534854
[snip]
> Your thoughts on this article? I was under the impression that a lot of the 80s attempts
> at capability machines (or really, anything that wasn't trying to be a glorified PDP-11)
> floundered because of performance and cost issues (i.e. the Intel i432).
[As usual, late to the party.☺]
I am a little surprised that there has not been more comment on the statement:
> each successive generation in an architecture has been inflicted with yet another "extension," "accelerator," "cache," "look-aside buffer," or some other kind of "marchitecture," to the point where the once-nice and orthogonal architecture is almost obscured by the "improvements" that followed. It seems almost like a law of nature: Any successful computer architecture, under immense pressure to "improve" while "remaining 100 percent compatible," will become a complicated mess.
While I very much enjoy abstraction layers that map precisely to implementation (low accidental leakage) and are elegant, I also recognize that one of the goals of layered abstraction is to facilitate change and diversity. Diversity and change allow a closer fit to specific uses and implementation methods and increase understanding of implementations and interfaces (exposing bugs, misfeatures, and weaknesses in both). Since diversity and change can also dilute effort and experience, increase defects, and weaken the abstraction (by increasing the diversity and significance of side-channel information/abstraction leaks and by incompatibly extending the abstraction), not every locally useful change will provide a net benefit for the system (and a change can be locally attractive without being locally useful — note: I do not view play as evil).
(Elegance is more than simplicity, orthogonality, or good organization; a variation in an elegant design is likely to generate a response of "Hmm ... oh ... OH!" as one initially sees an aspect that seems to deviate from the design aesthetic, then sees that it is a reasonable irregularity for the purpose achieved, then sees that it ties in with multiple other aspects of the system in a way that reinforces the individual characteristics of those aspects, empowers them, and connects them.)
I would not consider the increasing leakiness and inelegance of a successful stable abstraction a "law of nature" (physics) but a law of information theory (mathematics). A successful abstraction will be applied in ways not initially intended: either the original abstraction will have been underdesigned (too flexible, more general than the original application required) or overdesigned (so closely fitted to the original application that better fitting new applications will introduce inelegance).
The quoted statement from the article seems more like a rant than a considered expression of the weaknesses of successful stable abstractions.
> Having a single linear map would be prohibitively expensive in terms of memory for the map itself, so translations use a truncated tree structure, but that adds a whole slew of new possible exceptions: What if the page entry for the page directory entry for the page entry for the exception handler for missing page entries is itself empty?
For a single system-wide exception handler, this problem can be avoided by having hardwired mappings. 32-bit MIPS provided kseg0 (hardwired translation, cacheable memory) and kseg1 (hardwired translation, uncacheable memory), each 0.5 GiB. Fairchild's CLIPPER hardwired eight 4KiB pages in the kernel address space. (CLIPPER completely separated supervisor and user address spaces and had separate cache/MMU chips for instructions and for data, so there were technically four possible address spaces. "This permanent mapping provides several benefits: it makes the Boot ROM immediately available on reset; it also makes some I/O available during initialization; finally, it insures that the lowest 3 pages of the supervisor's address space (which are in constant use, since they contain the exception vector table) are always translated rapidly." [Introduction to the CLIPPER Architecture])
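As a concrete illustration, here is a minimal C sketch of the kseg0/kseg1 address arithmetic (the segment bases and the 512 MiB window are the architected MIPS32 values; the helper names are mine):

#include <stdint.h>

/* 32-bit MIPS fixed (non-TLB) kernel segments: kseg0 and kseg1 each
   map the low 512 MiB of physical memory at a hardwired offset, so a
   handler placed there can never itself take a TLB miss. */
#define KSEG0_BASE 0x80000000u  /* hardwired translation, cacheable   */
#define KSEG1_BASE 0xA0000000u  /* hardwired translation, uncacheable */

/* Translation is pure bit manipulation: no page-table walk, no TLB. */
static inline uint32_t kseg_to_phys(uint32_t vaddr)
{
    return vaddr & 0x1FFFFFFFu;     /* strip the segment bits */
}

static inline uint32_t phys_to_kseg0(uint32_t paddr)
{
    return paddr | KSEG0_BASE;      /* cacheable kernel view */
}

static inline uint32_t phys_to_kseg1(uint32_t paddr)
{
    return paddr | KSEG1_BASE;      /* uncacheable view, e.g. for I/O */
}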
Another option is to have lockable translation entries. This does not keep software from improperly initializing system state; even hardware initialization of preconfigured and locked translations is not foolproof: software could unlock a translation entry or simply place critical memory in areas not mapped by such locked translations.
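A sketch of the lockable-entry approach, modeled loosely on the MIPS CP0 Wired register (TLB entries with index below Wired are exempt from random replacement); the wrapper functions here are hypothetical stubs standing in for the mtc0/tlbwi instruction sequences a real kernel would use:

#include <stdint.h>

static void set_wired(uint32_t count)
{
    (void)count;            /* stub for: mtc0 count, CP0_Wired */
}

static void tlb_write_indexed(uint32_t index, uint32_t vpn,
                              uint32_t pfn, uint32_t flags)
{
    (void)index; (void)vpn; (void)pfn; (void)flags;
    /* stub for: set EntryHi/EntryLo/Index, then tlbwi */
}

/* Pin the exception handler's translation into TLB entry 0. */
static void lock_handler_mapping(uint32_t vpn, uint32_t pfn, uint32_t flags)
{
    tlb_write_indexed(0, vpn, pfn, flags);
    set_wired(1);           /* entry 0 now survives random replacement */
}

/* The caveat above, in code form: nothing stops later kernel code from
   calling set_wired(0) to unlock the entry, or from placing a critical
   structure in a page that no wired entry covers. */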
> The instruction set is Ada primitives, it operates on bit fields of any alignment, the data bus is 128 bits wide: 64-bit for the data and 64-bit for data's type.
While I like the idea of metadata (inline or not), I doubt many would be enthused about 100% memory overhead. Even the 12.5% for ECC (8 check bits per 64 data bits) has adoption problems (not helped by market segmentation tactics). (If address translation were completely eliminated, the relative overhead would be reduced, but it would still be over 50%.)
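For concreteness, the layout the article describes amounts to something like the following (a sketch, not the R1000's actual encoding), which makes the 100% figure obvious:

#include <stdint.h>
#include <stdio.h>

/* One 64-bit metadata word per 64-bit data word, as on the R1000's
   128-bit bus: half of all memory capacity and traffic is metadata. */
struct tagged_word {
    uint64_t data;
    uint64_t type;   /* type/metadata: 8 bytes per 8 bytes of data */
};

int main(void)
{
    printf("%zu bytes stored per 8 bytes of data => %.0f%% overhead\n",
           sizeof(struct tagged_word),
           100.0 * (sizeof(struct tagged_word) - 8) / 8);
    return 0;
}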
I do not think having a word of metadata for every word of data was being presented as a reasonable alternative to typical systems but rather as an existence proof of a working alternative, though the later statement "even better would be to get rid of linear address spaces entirely and go back to the future, as successfully implemented in the Rational R1000 computer 30-plus years ago" makes this uncertain.
I think this is an example of poor expository writing. The use of "linear address space" to mean unconstrained pointers is another example of less than ideal clarity of communication. I do not think the author was trying to argue against contiguous flat address spaces; i.e., the author was not claiming that it is problematic to make the addresses within a destroyed/deallocated object available to another object that does not map identically to the former object, or that application software should handle diverse memory geometries and dead blocks rather than use a contiguous address space abstraction.
> According to Microsoft Research, CHERI would have deterministically detected and prevented a full 43 percent of the security problems reported to the company in 2019. To put that number in perspective: The National Highway Traffic Safety Administration reports that 47 percent of the people killed in traffic accidents were not wearing seat belts.
This is a weak use of statistics. CHERI removing 43 percent of reported security problems does not indicate how effective CHERI would be relative to other techniques or even how significantly security would be improved. I suspect at least some security problems actually manifest themselves more often as reliability problems. If software were converted to CHERI (or a safe language) with minimal effort, flaws would be caught (and possibly recognized as bugs and fixed), but reliability might decrease when permission violations generate fatal exceptions rather than more complex handling.
There is also the question of the difference between properly using CHERI and properly using a type-safe language in terms of programming effort: if one cannot trust a programmer to properly use a type-safe language (where CHERI-like checks are added by the compiler), can one trust a programmer to properly use CHERI? (Others have noted that implementing checks in software is not effectively different from implementing them in hardware, at least when the software can be trusted to do the checks.) CHERI provides a reasonably positioned root of trust — one is generally required to trust the hardware — facilitating permission delegation among diverse software. Distributing software in a format that made install-time correctness checks possible would seem to provide a similar effect. Software signing with enforceable warranties would also seem to provide significant improvement (not "WITHOUT WARRANTY OF ANY KIND, EITHER EXPRESSED OR IMPLIED, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE" or legal proceedings pitting an individual against a multibillion-dollar corporation in the jurisdiction of the corporation's choosing).
For many memory accesses in current software, CHERI seems to have excessive overhead relative to the protection provided. One can trust a compiler to get structure offsets correct even in an unsafe language. Languages (or programming disciplines) that are more type-safe further increase the coverage of such compiler-guaranteed accesses.
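A minimal C illustration of the first point (the struct and field names are invented for the example): the field offset below is a compile-time constant, so a per-access capability bounds check would re-verify something the compiler already guarantees.

#include <stddef.h>

struct packet {
    int  length;
    char payload[56];
    int  checksum;          /* offset 60, fixed at compile time */
};

int read_checksum(const struct packet *p)
{
    /* Compiles to a load at p + offsetof(struct packet, checksum);
       the offset is a constant, not a value an attacker controls.
       Given a valid in-bounds p, a runtime check adds nothing here. */
    return p->checksum;
}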
CHERI does have a notable advantage: trusted software can communicate with untrusted software without having to use message passing. Forcing all cross-privilege communication to be pass-by-value has obvious performance issues (though hardware acceleration of copying would benefit other uses, so this would not have to be as expensive as it is on current systems; copying is also the normal communication mechanism for cache-coherent sharing between processing nodes, to say nothing of networked systems).
"It has been recommended that we substitute a generalized generation count-based model for an information flow model. This would be functionally identical in the local capability case, used to protect per-stack data. However, it would also allow us to implement protection of thread-local state, as well as garbage collection, if desired. The current ISA does not yet reflect this planned change." ["Capability Hardware Enhanced RISC Instructions: CHERI Instruction-Set Architecture", Robert N. M. Watson et al., 2015] This hints at a synergy with versioned memory (possibly Cache-Only-Memory), which has other uses such as more dataflow-like thread synchronization (including speculative multithreading) and might have benefits for ordinary use of cache coherence. Versioned memory might allow store queue contents to be exposed to other threads (via cache) by providing a means to mark as "not necessarily the current version/correct value" with threads using such values not fully committing until "version consistency" is confirmed.
(If one wants to move even further into strange memory models, a "page fault" or cache miss could regenerate data rather than retrieving it from backing store or an alternative holder, or it could provide Not-a-Thing data; the consumer could then process the non-data speculatively, locally generate possibly speculative data, request the data from another source, or end the task dependent on that data.)