By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 9, 2015 12:14 pm
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on July 9, 2015 10:12 am wrote:
>
> In a talk by Leslie Lamport at Stanford a few months ago, he claimed that formal verification of
> an implementation of the POWER memory model had caught a bug that testing would not have caught.
> You can read this as "the POWER memory model sucks" or you can read this as "this
> sort of thing should be handled by formal verification rather than testing"...
I think formal verification is absolutely the way to go when it comes to memory ordering. Because it is too damn hard any other way.
However, I happen to believe in formal verification as a solution for things like cache protocol definitions (where it is already used) and hardware (where I think it's used at some levels). Make the rules clear, understandable and unambiguous (and very much not surprising), and show that the hardware actually follows the rules.
But formal verification is not necessarily nearly as useful for software.
I used to think that it was completely unrealistic for software, but it turns out that it's at least somewhat possible to maybe not really "formally" verify these things, but at least automate a lot of memory access verification, and there are groups doing some very impressive things.
The software verification often involves adding annotations or simplifying the problem space in other ways, though - so the "verification" is not necessarily verifying the code in its totality, but a simplified representation or sub-case. In most cases the verification is run over traces of code (running on different CPUs), checking that those traces follow the rules. So it may not be formally verifying the whole thing, but at least it's verifying something, preferably against the actual hardware rules that have also been verified to be honored by the hardware.
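To make the "traces following the rules" idea concrete, here is a minimal sketch of the classic message-passing litmus test written with C11 atomics. None of this code comes from any particular verification group or tool - the names and the brute-force loop are purely illustrative. A model-based checker enumerates every outcome the formal memory model allows; the loop below can only sample real executions on whatever CPU you happen to run it on.

```c
/* Message-passing litmus test: one thread publishes data then a flag,
 * the other waits for the flag and reads the data.  The interesting
 * question is whether "flag seen as 1, data still seen as 0" is an
 * outcome the memory model allows. */
#include <stdatomic.h>
#include <stdio.h>
#include <pthread.h>

static atomic_int data, flag;
static atomic_int r_data;          /* what the reader observed */

static void *writer(void *arg)
{
    (void)arg;
    atomic_store_explicit(&data, 1, memory_order_relaxed);
    /* Relaxed on purpose: a weak memory model (POWER, ARM) may let the
     * flag store become visible before the data store. */
    atomic_store_explicit(&flag, 1, memory_order_relaxed);
    return NULL;
}

static void *reader(void *arg)
{
    (void)arg;
    while (atomic_load_explicit(&flag, memory_order_relaxed) == 0)
        ;                           /* wait for the flag */
    atomic_store_explicit(&r_data,
        atomic_load_explicit(&data, memory_order_relaxed),
        memory_order_relaxed);
    return NULL;
}

int main(void)
{
    for (int i = 0; i < 100000; i++) {
        pthread_t w, r;
        atomic_store(&data, 0);
        atomic_store(&flag, 0);
        atomic_store(&r_data, -1);
        pthread_create(&w, NULL, writer, NULL);
        pthread_create(&r, NULL, reader, NULL);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        if (atomic_load(&r_data) == 0)
            printf("weak outcome observed on iteration %d\n", i);
    }
    return 0;
}
```

A formal checker can say definitively whether that "weak outcome" is permitted by the architecture's rules; running the loop only tells you whether your particular chip happened to produce it today.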
But that kind of software verification will not be run on most software. There are groups using Linux for their work (and presumably other systems too, I just wouldn't know about it), and there are groups (often the same group) looking at things like core system libraries. So hopefully there will be coverage of the really core system code, and hopefully at least the core locking primitives have been verified to work, and some really basic libraries are using them correctly.
But verifying software (even with the caveats mentioned above) formally in general? Yeah, not going to happen. People will forget to take locks, and people will create their own ad-hoc synchronization models, and they will get them wrong. And they will "work". Most of the time.
And the subtler and the less intuitive the hardware rules are, the more likely it will be that they will work fine on one piece of hardware (inevitably the one that the developer has, because that's what he tested his fancy new idea on), but then result in very odd and occasional problems on other implementations.
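Here is an intentionally broken sketch of that ad-hoc style, again purely illustrative: a handoff through plain variables, with no lock and no barriers. It's a data race, so both the compiler and a weakly ordered CPU are free to reorder it, yet it will appear to work in most runs on a strongly ordered machine.

```c
/* Ad-hoc "synchronization" through plain variables.  This is the kind
 * of code that passes testing on the developer's machine and then
 * fails rarely and unreproducibly somewhere else. */
#include <pthread.h>
#include <stdio.h>

static int payload;        /* plain int: no atomicity, no ordering */
static int ready;          /* ditto - this is the bug */

static void *producer(void *arg)
{
    (void)arg;
    payload = 42;
    ready = 1;             /* nothing stops this store being seen first */
    return NULL;
}

static void *consumer(void *arg)
{
    (void)arg;
    while (!ready)         /* the compiler may even hoist this load
                            * out of the loop and spin forever */
        ;
    printf("payload = %d\n", payload);  /* may print 0 on weak hardware */
    return NULL;
}

int main(void)
{
    pthread_t p, c;
    pthread_create(&c, NULL, consumer, NULL);
    pthread_create(&p, NULL, producer, NULL);
    pthread_join(p, NULL);
    pthread_join(c, NULL);
    return 0;
}
```

The fix is either a real lock or a release store paired with an acquire load on the flag - which is exactly the kind of rule the verification described above is checking for.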
Tightening down the hardware rules really does help enormously with problems like that. The code may be buggy as hell, but at least what you tested on one machine tends to behave reasonably close to what you run three generations later, and your binary still works rather than mysteriously crashing in ways you can't reproduce.
And btw, this is not at all a new issue. The whole "one CPU does something subtly different from another CPU because it wasn't well-defined in the architecture" is age-old, and it has been a problem before.
Think "undefined flags after operations" kind of things (x86 had several cases of that), or unused high bits in pointer uses (m68k), or incomplete instruction decoding (6502) or silent odd behavior with unaligned addresses (ARM) or any number of those under-specified hardware issues that people ended up using even though they often weren't documented (and then people had to document them).
HW designers have largely learnt not to do that any more. Weak memory ordering isn't really all that different. It's simply worth making the definition stricter (and using extra hardware to enforce the stricter rules) to avoid problems down the line in the next microarchitecture, when you've changed something fundamental internally but don't want that change to break existing applications.
Linus