By: dmcq (dmcq.delete@this.fano.co.uk), July 9, 2015 10:09 am
Room: Moderated Discussions
Mark Roulo (nothanks.delete@this.xxx.com) on July 9, 2015 8:24 am wrote:
> dmcq (dmcq.delete@this.fano.co.uk) on July 9, 2015 4:37 am wrote:
> > Mark Roulo (nothanks.delete@this.xxx.com) on July 8, 2015 4:54 pm wrote:
> > > A real-world example of this is double-checked locking. It was an obvious
> > > performance improvement for singletons in Java and it also worked fine most
> > > of the time.
> > >
> > > Where most of the time meant:
> > > a) Almost all the time on all hardware/OS/JVM/JIT combinations.
> > > b) ALL the time on some combinations ... including the popular ones (I can't reproduce your bug ...)
> > >
> > > I'll note that folks with CS backgrounds thought double-checked locking was a
> > > good idea (for a while). One example is here:
> > >
> > > http://www.cs.wustl.edu/~schmidt/PDF/DC-Locking.pdf
> > >
> > > Yes, the Douglas Schmidt in this paper is *the* Doug Schmidt.
> > >
> > > A nice explanation of why this doesn't work so well is here:
> > >
> > > http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
> > >
> > > The point here being that *legal* behavior that is subtle, counter-intuitive and/or rare is bad.
> > > Yes, the code is buggy. And some very talented folks made this mistake.
> > >
> > > But in this case the combination of human expectations and hardware design
> > > choices make bugs like this more likely. Especially for average programmers.
> > >
> > > That is a bad combination, even if we can put the blame on someone besides
> > > the CPU architects.
> > >
> > > Designing systems to be less likely to lead to bugs is a good thing. Not the *only*
> > > good thing, and we won't be willing to give up an unlimited amount of performance for
> > > it. But a good thing none the less.
> >
> > I fully agree that
> > Designing systems to be less likely to lead to bugs is a good thing.
> > Human expectation and hardware design can make bugs more likely.
> >
> > Where I totally differ from Linus is the lesson to be gained from examples like this
> >
> > His solution is to try and make the things people have already done less likely to produce errors.
> > My preference is to make things people in the future do less likely to produce errors.
> >
> > These are quite different things. Making what people have already done less likely to produce
> > errors means they have raised expectations that code like the example above will work.
>
> One of Linus' points (or part of one) was that in practice errors tend to
> get stomped out via testing. For this to work, you really do want/need
> the system to be very well defined (e.g. all unused op-codes trigger a trap
> to halt the program). The combination of undefined/unspecified (so it
> might work!) and counter-intuitive is deadly.
>
> If the weakly defined memory ordering caused buggy code to fail consistently,
> then it would be okay. Painful, maybe, as people learned to deal with it,
> but okay. But if the code actually works (for a while ... or consistently
> on the test system), then we will have a problem. The folks won't be less
> likely to produce errors ... they just will produce errors that don't show
> up for a while.
It is far better to stomp errors out in the design and coding phase than in the testing phase. And marking critical sections explicitly is a real step towards being able to run tests where those sections are exercised in different orders, rather than assuming that each individual instruction has funny shared-memory semantics. It works at the level of the user's code rather than at the machine level.

You are assuming people default to thinking that multiple parallel programs have strong ordering semantics. It is quite easy, and it works, to tell them they don't: that they have to acquire and release resources, for instance, rather than depending on low-level machine semantics and on the compiler not reordering the code as it optimizes. And at the end of the day you get code that is much more likely to keep working when it is moved to a different machine. Something like the sketch below is what I mean.

This love of tricks, of being a guru with them, and the fear of just ditching the whole business reminds me of nothing so much as people who were raised on Microsoft Word, have learnt a load of its ghastly tricks, fear moving away from it, and push the burden of all its disparate and buggy ways of doing the same thing onto new people as if it were a wonderful thing.
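To make that concrete, here is a minimal Java sketch (class and field names are made up for illustration) of the double-checked locking idiom from the links above, first in its broken form and then with the post-Java-5 volatile fix, which is exactly the kind of explicit acquire/release marking in the user's code that I am talking about:

    // Broken version: nothing tells the compiler or the hardware that the
    // write to 'instance' must not become visible before the constructor's
    // writes, so another thread can see a non-null but half-built object.
    class BrokenSingleton {
        private static BrokenSingleton instance;

        static BrokenSingleton getInstance() {
            if (instance == null) {                      // first check, no lock
                synchronized (BrokenSingleton.class) {
                    if (instance == null) {              // second check, under the lock
                        instance = new BrokenSingleton();
                    }
                }
            }
            return instance;
        }
    }

    // Fixed version: since Java 5, 'volatile' gives the write release semantics
    // and the read acquire semantics, so the ordering is stated in the source
    // and holds on any conforming JVM and machine.
    class FixedSingleton {
        private static volatile FixedSingleton instance;

        static FixedSingleton getInstance() {
            if (instance == null) {
                synchronized (FixedSingleton.class) {
                    if (instance == null) {
                        instance = new FixedSingleton();
                    }
                }
            }
            return instance;
        }
    }

The one-word difference is the point: the ordering requirement is marked where the programmer can see it, the compiler and the hardware both have to honour it, and the code stays correct when it moves to a machine with a weaker memory model.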