By: dmcq (dmcq.delete@this.fano.co.uk), July 9, 2015 4:24 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 9, 2015 12:14 pm wrote:
> Maynard Handley (name99.delete@this.name99.org) on July 9, 2015 10:12 am wrote:
> >
> > In a talk by Leslie Lamport at Stanford a few months ago, he claimed that formal verification of
> > an implementation of the POWER memory model had caught a bug that testing would not have caught.
> > You can read this as "the POWER memory model sucks" or you can read this as "this
> > sort of thing should be handled by formal verification rather than testing"...
>
> I think formal verification is absolutely the way to go when it comes
> to memory ordering. Because it is too damn hard any other way.
>
> However, I happen to believe in formal verification as a solution for things like cache
> protocol definitions (where it is already used) and hardware (where I think it's used
> at some levels). Make the rules clear, understandable and unambiguous (and very much
> not surprising), and show that the hardware actually follows the rules.
>
> But formal verification is not necessarily nearly as useful for software.
>
> I used to think that it was completely unrealistic for software, but it turns out that it's at
> least somewhat possible to maybe not really "formally" verify these things, but at least automate
> a lot of memory access verification, and there are groups doing some very impressive things.
>
> The software verification is often with adding annotations or simplifying the problem space other
> ways, though - so the "verification" is not necessarily verifying the actual totality of the code, but
> a simplified representation or sub-case. In most cases the verification is run over traces of code
> (running on different CPUs) and verifying that those traces follow the rules. So it may not be
> formally verifying the whole thing, but at least it's verifying something, preferably against
> the actual hardware rules that have also been verified to be honored by the hardware.
>
> But that kind of software verification will not be run on most software. There are groups using
> Linux for their work (and presumably other systems too, I just wouldn't know about it), and there
> are groups (often the same group) looking at things like core system libraries. So hopefully there
> will be coverage of the really core system code, and hopefully at least the core locking primitives
> have been verified to work, and some really basic libraries are using them correctly.
>
> But verifying software (even with the caveats mentioned above) formally in general? Yeah, not
> going to happen. People will forget to take locks, and people will create their own ad-hoc synchronization
> models, and they will get them wrong. And they will "work". Most of the time.
>
> And the subtler and the less intuitive the hardware rules are, the more likely it will be that they will
> work fine on one piece of hardware (inevitably the one that the developer has, because that's what he tested
> his fancy new idea on), but then result in very odd and occasional problems on other implementations.
>
> Tightening down the hardware rules really does help enormously with problems like that. The code may be buggy
> as hell, but at least what you tested on one machine tends to be reasonably close to what you run three generations
> later and your binary still works, rather than mysteriously crashing in ways you can't reproduce.
>
> And btw, this is not at all a new issue. The whole "one CPU does something subtly different from another
> CPU because it wasn't well-defined in the architecture" is age-old, and it has been a problem before.
>
> Think "undefined flags after operations" kind of things (x86 had several cases of that), or unused
> high bits in pointer uses (m68k), or incomplete instruction decoding (6502) or silent odd behavior
> with unaligned addresses (ARM) or any number of those under-specified hardware issues that people
> ended up using even though they often weren't documented (and then people had to document them).
>
> HW designers have largely learnt not to do that any more. Weak memory ordering isn't really all that
> different. It's simply worth making the definition stricter (and using extra hardware to enforce
> the stricter rules) to avoid problems down the line in the next microarchitecture when you changed
> something fundamentally internally but don't want that change to break existing applications.
>
> Linus
If every system supported sequential consistency, and was clearly defined to do so, it would exacerbate the problems rather than solve them. It would not remove most of the bugs, and it would encourage clever programmers to devise yet more tricks. And as to your contention that Linux only relies on a weak model plus data dependency, there are Linux documents clearly saying that data dependency ordering is needed in Linux, without mentioning RCU or any particular processor.
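For concreteness, here is a minimal C11 sketch of the kind of data-dependency pattern those documents are talking about; the names (publish, consume, struct msg) are mine rather than anything from a kernel document, and memory_order_consume is in practice strengthened to acquire by current compilers:

/* Pointer publication relying on data-dependency ordering (C11 sketch). */
#include <stdatomic.h>
#include <stdlib.h>

struct msg { int payload; };

static _Atomic(struct msg *) shared = NULL;

/* Writer: initialise the message, then publish the pointer with release. */
void publish(int value)
{
    struct msg *m = malloc(sizeof *m);
    if (!m)
        return;
    m->payload = value;
    atomic_store_explicit(&shared, m, memory_order_release);
}

/* Reader: the load of m->payload carries a data dependency on the load of
 * the pointer, so on hardware that respects dependency ordering no extra
 * barrier is needed between the two loads. */
int consume(void)
{
    struct msg *m = atomic_load_explicit(&shared, memory_order_consume);
    return m ? m->payload : -1;
}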
There are going to be more problems in the future, not fewer; it is not a question of fixing the existing bugs but of stopping new ones. Instead of all the processors running at the same speed, you have different speeds, as in ARM systems. Far more processors will be used, and the relative distance between them, measured against their speed, will become much greater. GPUs are acting as associated processors, and network processing wants data passed around easily. The hardware needs a simple model, and sequential consistency is not that. It is a computer-science model, not something for humans to program with. Any sequential consistency should be confined to marked-out blocks of code that users can reason about; anything more is just trouble.
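As a sketch of what "marked-out blocks" could mean in practice (my example, using C11 atomics rather than any particular hardware feature), sequential consistency can be requested only for the handful of operations that actually need it - here a Dekker-style flag handshake - while everything else stays unordered:

/* SC requested only for explicitly marked operations (C11 seq_cst). */
#include <stdatomic.h>
#include <stdbool.h>

static atomic_bool flag0, flag1;

bool thread0_try_enter(void)
{
    atomic_store_explicit(&flag0, true, memory_order_seq_cst);
    /* Only with seq_cst on both the store above and the load below is it
     * impossible for both threads to see the other's flag as false. */
    return !atomic_load_explicit(&flag1, memory_order_seq_cst);
}

bool thread1_try_enter(void)
{
    atomic_store_explicit(&flag1, true, memory_order_seq_cst);
    return !atomic_load_explicit(&flag0, memory_order_seq_cst);
}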
Yes, loads of unaligned data are the way everyone does it now. That came about mainly because of Linux too, via the x86 implementation. It was also found to be useful for implementing string operations like memcpy. But we would have been better off if misaligned accesses had simply faulted and people had had to use operations that were explicitly for unaligned loads. It would mean fewer bugs in the code.
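For illustration, this is roughly what an explicit unaligned load looks like in portable C today - the memcpy idiom versus the cast that only works because the hardware tolerates it; read_u32_unaligned is just my name for it:

#include <stdint.h>
#include <string.h>

/* Explicitly unaligned read: copy from a possibly misaligned buffer into a
 * properly aligned local. Compilers turn this into a single load where the
 * target allows it. */
uint32_t read_u32_unaligned(const unsigned char *p)
{
    uint32_t v;
    memcpy(&v, p, sizeof v);
    return v;
}

/* Undefined behaviour if p is misaligned; faults on strict hardware. */
uint32_t read_u32_cast(const unsigned char *p)
{
    return *(const uint32_t *)p;
}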
Anyway, sequential consistency or anything like it is not going to be implemented, as it puts too much of a constraint on the hardware for no particular advantage. You can call that not caring about bugs if you like, but sequential consistency in hardware for all memory operations is just a very bad idea for both hardware and software. Think of it, if you like, as the simplicity of RISC, which is why we don't have interrupts on overflow or automatic bounds checks on array accesses - now there's an area where, if we really cared about bugs rather than performance, we would have done things differently.
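As a rough illustration of the cost of that choice, here is what checked addition has to look like in source code when the hardware never traps on overflow; __builtin_add_overflow is a GCC/Clang builtin and the surrounding example is mine:

#include <limits.h>
#include <stdio.h>

/* Every addition that matters must be checked explicitly in software. */
int checked_add(int a, int b, int *out)
{
    if (__builtin_add_overflow(a, b, out))
        return -1;              /* overflow: caller must handle it */
    return 0;
}

int main(void)
{
    int r;
    if (checked_add(INT_MAX, 1, &r) != 0)
        printf("overflow detected\n");
    return 0;
}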