By: Mark Roulo (nothanks.delete@this.xxx.com), July 9, 2015 8:24 am
Room: Moderated Discussions
dmcq (dmcq.delete@this.fano.co.uk) on July 9, 2015 4:37 am wrote:
> Mark Roulo (nothanks.delete@this.xxx.com) on July 8, 2015 4:54 pm wrote:
> > A real-world example of this is double-checked locking. It was an obvious
> > performance improvement for singletons in Java and it also worked fine most
> > of the time.
> >
> > Where most of the time meant:
> > a) Almost all the time on all hardware/OS/JVM/JIT combinations.
> > b) ALL the time on some combinations ... including the popular ones (I can't reproduce your bug ...)
> >
> > I'll note that folks with CS backgrounds thought double-checked locking was a
> > good idea (for a while). One example is here:
> >
> > http://www.cs.wustl.edu/~schmidt/PDF/DC-Locking.pdf
> >
> > Yes, the Douglas Schmidt in this paper is *the* Doug Schmidt.
> >
> > A nice explanation of why this doesn't work so well is here:
> >
> > http://www.cs.umd.edu/~pugh/java/memoryModel/DoubleCheckedLocking.html
> >
> > The point here being that *legal* behavior that is subtle, counter-intuitive and/or rare is bad.
> > Yes, the code is buggy. And some very talented folks made this mistake.
> >
> > But in this case the combination of human expectations and hardware design
> > choices make bugs like this more likely. Especially for average programmers.
> >
> > That is a bad combination, even if we can put the blame on someone besides
> > the CPU architects.
> >
> > Designing systems to be less likely to lead to bugs is a good thing. Not the *only*
> > good thing, and we won't be willing to give up an unlimited amount of performance for
> > it. But a good thing none the less.
>
> I fully agree that:
> Designing systems to be less likely to lead to bugs is a good thing.
> Human expectation and hardware design can make bugs more likely.
>
> Where I totally differ from Linus is the lesson to be gained from examples like this.
>
> His solution is to try and make the things people have already done less likely to produce errors.
> My preference is to make things people in the future do less likely to produce errors.
>
> These are quite different things. Making what people have already done less likely to produce
> errors raises the expectation that code like the example above will work.
One of Linus's points (or at least part of one) was that in practice errors tend to
get stomped out via testing. For this to work, you really do want/need
the system to be very well defined (e.g. all unused op-codes trigger a trap
that halts the program). The combination of undefined/unspecified behavior (so it
might work!) and counter-intuitive behavior is deadly.

If weakly defined memory ordering caused buggy code to fail consistently,
then it would be okay. Painful, maybe, as people learned to deal with it,
but okay. But if the code actually works (for a while ... or consistently
on the test system), then we have a problem. Folks won't be less
likely to produce errors ... they will just produce errors that don't show
up for a while.
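
For concreteness, here is a minimal sketch of the idiom in question (class and field names are mine, not from either link above). On hardware/JVM combinations with strong enough ordering, both versions appear to work under test, which is exactly the trap described above:

class BrokenSingleton {
    private static BrokenSingleton instance;  // note: NOT volatile

    private int answer;                       // some state set by the constructor

    private BrokenSingleton() { answer = 42; }

    static BrokenSingleton getInstance() {
        if (instance == null) {                   // first check, no lock
            synchronized (BrokenSingleton.class) {
                if (instance == null) {           // second check, under the lock
                    // Bug: without volatile, the write publishing 'instance'
                    // can become visible before the constructor's writes, so
                    // another thread can pass the unsynchronized first check
                    // and read a partially constructed object (answer == 0).
                    instance = new BrokenSingleton();
                }
            }
        }
        return instance;
    }
}

class FixedSingleton {
    // Since the Java 5 memory model (JSR-133), volatile supplies the
    // ordering guarantee the broken version was missing.
    private static volatile FixedSingleton instance;

    private int answer;

    private FixedSingleton() { answer = 42; }

    static FixedSingleton getInstance() {
        if (instance == null) {
            synchronized (FixedSingleton.class) {
                if (instance == null) {
                    instance = new FixedSingleton();
                }
            }
        }
        return instance;
    }
}

Note that when the Schmidt paper was written there was no portable fix in Java at all; the volatile version only became correct with JSR-133. The Pugh page ends up recommending the static holder-class idiom instead, which gets the same laziness with no explicit synchronization.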