By: anon (anon.delete@this.anon.com), July 13, 2015 8:51 pm
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on July 13, 2015 2:10 pm wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 13, 2015 1:46 pm wrote:
> > dmcq (dmcq.delete@this.fano.co.uk) on July 13, 2015 12:20 pm wrote:
> > >
> > > You simply don't seem to be able to acknowledge that supporting that is causing more problems,
> > > that Linux is part of a feedback loop leading to more buggy programs being produced and your
> > > attitude is part of a problem not a solution.
> >
> > You seem to not be able to understand reality.
> >
> > First off, Linux actually _works_ on all the broken memory models. When you say that "Linux
> > is part of a feedback loop", you're simply wrong, and full of sh*t. Linux is the most
> > portable project I have ever even remotely heard of in reality. We actually do things right.
> > Yes, we've had bugs, but we're pretty much the only project out there that does things like
> > very fancy lockless algorithms across pretty much every architecture known to man.
> >
> > Put another way: I know what the f*ck I'm talking about. Weak memory models are
> > objectively inferior to stronger ones, and I've given you the reasons for it.
> >
> > And my claim is that
> >
> > (a) weak memory ordering doesn't actually buy you anything but confusion and very
> > subtle bugs despite (and sometimes due to) more complex code to handle it.
> >
> > And this is with people who actually understand memory ordering, and write scalable locking,
> > and then occasionally get it wrong because the issues are too damn subtle. Code that looks right,
> > and works fine in practice, turns out to have really really subtle issues sometimes.
> >
> > (b) weak memory ordering ends up being a big pain for software because testing is so inconclusive, and
> > developers have a super-hard time to see bugs that people with different hardware may trigger easily.
> >
> > (c) to make matters worse, some particularly crap versions of weak memory ordering also confused
> > the hell out of themselves when it came to the serialization. Power really is a complete disaster.
> > The difference between lwsync, isync, sync, eieio, and how they actually interact is insane.
> >
> > (d) weak memory models do not even perform better! It's the one thing that people claim is their
> > advantage, and I call BS on that claim. I claim that weak memory models actually perform worse, because
> > they end up adding synchronization in places where none is needed, and it would be better to just speculate
> > wildly (and in the very unusual case of a conflict, redo the operations in-order).
> >
> > What part of the above can you not acknowledge? You don't seem to really get that reality is
> > tough, and that testing is important, and that bugs happen. You keep living in some fairy land
> > where it's ok to say "tough, you shouldn't have bugs, especially in really subtle code that is
> > complicated further by the memory ordering making it impossible to test or even think about".
> >
>
> I'm not competent to discuss the technical issues, but if the matter is as cut-and-dried
> as you claim, why does it continue? There is nothing to stop ARM saying "part of ARM v8.1a
> is a new TSO memory model". Old code would still work (with the mem barriers appropriately
> NOP'd or close to), and new code would have no (or fewer and weaker) mem barriers.
IBM has *actually* done that since about the POWER6 timeframe (too lazy to look it up exactly). They have an "SAO" or "strong access ordering" bit that is set on a per-virtual-page basis and gives the memory consistency model of x86. They could literally use that for all their code (at least userspace; I'm not sure of the details of kernel mappings and low-level fault handling in POWER, and not sure about MMIO), and then stop using half of their barrier instructions.
They don't. They use it for x86 emulation.
Obviously this does nothing to prove weak ordering is superior to x86 ordering, because they have probably not put as much effort into optimizing the x86 ordering as they would if it were their only memory model. But it does show that they can and do implement it, and could go with it and no-op some of their barriers if they wanted to. They're not blind to the concept (and neither is ARM, of course, or Apple, or any other modern designer).
Yes, hardware designers have made mistakes. Mistakes that became apparent with hindsight. Mistakes that emerged only as other technology advanced. Blatant mistakes. Mistakes in failing to consider software. On memory consistency, I don't agree with Linus that it's such a clear mistake, and there must be hardware advantages even on large, deep OOOE cores like POWER8 for them to continue to use it.
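For anyone who wants to see what "stop using half of their barrier instructions" means at the source level, here is a minimal sketch of the classic message-passing pattern in C++11 atomics. All names (payload, ready, producer, consumer, run_message_passing) are made up for illustration and come from nobody's real code. As far as I know, with the usual compiler mappings the release store and acquire load below become plain stores and loads on x86, while on POWER they pick up lwsync/isync style barriers; treat that as my understanding of the common mappings, not something measured here.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
static int payload = 0;                  // ordinary, non-atomic data
static std::atomic<bool> ready{false};   // flag that publishes the payload

void producer() {
    payload = 42;                                   // 1. write the data
    // 2. publish: the write to payload may not be reordered after this store
    ready.store(true, std::memory_order_release);
}

int consumer() {
    // 3. wait: the read of payload may not be reordered before this load
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;                                 // 4. must observe 42
}

int run_message_passing() {
    // Reset shared state before any thread starts; thread creation
    // orders these writes before the threads run.
    payload = 0;
    ready.store(false, std::memory_order_relaxed);

    int seen = -1;
    std::thread c([&seen] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();   // join orders the write to seen before the read below

    assert(seen == 42);
    return seen;
}
```

If both orderings were weakened to memory_order_relaxed, this would be a data race on payload, and a weakly ordered machine would be allowed to show the consumer a stale value, whereas a TSO machine would usually hide the bug. That is roughly the testing problem described in point (b) of the quoted post.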