By: dmcq (dmcq.delete@this.fano.co.uk), July 16, 2015 2:49 am
Room: Moderated Discussions
dmcq (dmcq.delete@this.fano.co.uk) on July 16, 2015 2:46 am wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 15, 2015 6:35 pm wrote:
> > Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on July 15, 2015 2:43 pm wrote:
> > >
> > > So is this reordering causal according to you? I bet most people don't find it intuitive
> > > either as this kind of reordering is exactly what is expected on weak memory models.
> >
> > So that re-ordering is indeed causal, although it is not sequentially consistent.
> >
> > There is absolutely no question that
> >
> > (a) sequential consistency is the easiest one to understand (basically: "everything is ordered")
> >
> > (b) x86 is famously strongly ordered, but it is not that strongly ordered.
> >
> > Causal consistency involves a causal relationship: "if
> > A then B". That means you have a read of a variable
> > followed by a write to (possibly another) variable. The read is the conditional, the write is the action.
> >
> > It's called "causal" since the read of the first variable could be the cause of the write to the
> > second one. That cause can be in many ways - it could affect whether the write is done at all, it
> > could affect the actual value written, and it could affect which variable to be written.
> >
> > So a causal chain would be something like
> >
> > IF (A) THEN B=1
> >
> > and causal consistency means that if another CPU now sees "B" having the value 1, then
> > it had better see that A was true too. In memory ordering documentation you would skip
> > the "if/then" part and only talk about the actual accesses, so you have a pattern of
> >
> > read A
> > B=1
> >
> > to show that causal relationship, and then the rule would be that another CPU that does
> >
> > read B
> > read A
> >
> > would never see "B and !A" (because B having the value 1 is caused by A having a non-zero value).
> >
> > In contrast, your example is about sequential consistency,
> > not causal consistency: when you have the reverse pattern
> >
> > A=1
> > read B
> >
> > (and on another CPU, doing the same thing, except with the variables switched: "B=1" followed
> > by "read A"), clearly neither CPU's actions were "caused" by anything.
> >
> > They didn't read anything from memory that could affect the end result: they
> > did an unconditional store, and then an unconditional read. So there is no "if
> > A then B" causal relationship - they are sequential, but not causal.
> >
> > So "sequential consistency" means that everybody agrees about sequential ordering. "Causal consistency"
> > is weaker than that, and means that just everybody agrees about causal ordering.
> >
> > Again, I do think that sequential memory consistency is by far the easiest one to
> > understand, and the one that would result in the fewest bugs. If I had the choice,
> > as a software person I'd pick sequential consistency. No argument about it.
> >
> > And again, x86 is absolutely not sequentially consistent. Never has been. The write buffering quite
> > fundamentally means that stores will be delayed past later reads, and thus stores are clearly not
> > sequentially ordered with reads. That makes it very easy to see the behavior you see in that example.
> >
> > So I've never claimed that x86 was perfect. All I've claimed is
> > that the x86 memory ordering is better. It's not "best".
> >
> > Engineering side note: I also do claim that from a pure engineering standpoint, the whole "buffer
> > stores" model is at least a reasonable trade-off. Truly sequentially consistent memory ordering
> > is really really hard to make go fast. It's simply not practical. So while I as a software person
> > would find sequential consistency much easier to think about, and never have to worry about memory
> > re-ordering at all, I do see the huge advantage you get from having a store buffer.
> >
> > In contrast, the advantages of weakening the memory model further are much less obvious.
> > Stores really have to be delayed for good performance. No other memory re-ordering matters
> > nearly as much. Re-ordering loads past loads? Not nearly as big a deal. Just move the first
> > load even earlier. Re-ordering stores past stores? No biggie. They're buffered and not critical
> > anyway, so from a performance standpoint it's pretty much a no-op in the big picture.
> >
> > (And yes, that's certainly simplifying a bit. An ordered store buffer might need to be bigger
> > than an unordered one, and might not as easily catch some merging behavior. Similarly, re-ordering
> > reads past other reads might make for smaller queues. But those are second-order effects
> > at best, not at all the kind of fundamental issue that is "buffer stores").
> >
> > Linus
>
> I believe the write-release / load-acquire model is a causal one, like Linus is describing,
> rather than a sequentially consistent one, and that even the following
>
> A=0 B=0
> release A=1 release B=1
> acquire B acquire A
>
> could result in A and B both being zero. They really are designed to do
> what they say, not to be used for complicated logical reasoning.
Jeez, sorry - I didn't realize the mail system got rid of the spaces. Putting the two columns one above the other, we have instead:

CPU1:
A=0
release A=1
acquire B

CPU2:
B=0
release B=1
acquire A

This can return both A=0 and B=0 (both acquire loads can see the initial zeros).
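To make that concrete, here is a rough sketch of the same store-buffer test in C11 atomics (my own illustration, not from either post; it uses C11 <threads.h>, which not every toolchain ships):

#include <stdatomic.h>
#include <threads.h>
#include <stdio.h>

atomic_int A, B;   /* globals, so both start at 0 */
int r1, r2;

static int cpu1(void *arg) {
    (void)arg;
    atomic_store_explicit(&A, 1, memory_order_release);   /* release A=1 */
    r1 = atomic_load_explicit(&B, memory_order_acquire);  /* acquire B */
    return 0;
}

static int cpu2(void *arg) {
    (void)arg;
    atomic_store_explicit(&B, 1, memory_order_release);   /* release B=1 */
    r2 = atomic_load_explicit(&A, memory_order_acquire);  /* acquire A */
    return 0;
}

int main(void) {
    thrd_t t1, t2;
    thrd_create(&t1, cpu1, NULL);
    thrd_create(&t2, cpu2, NULL);
    thrd_join(t1, NULL);
    thrd_join(t2, NULL);
    /* r1 == 0 && r2 == 0 is a permitted outcome: neither acquire load
       synchronizes with the other thread's release store, so there is
       no ordering between the two threads at all. */
    printf("r1=%d r2=%d\n", r1, r2);
    return 0;
}

The both-zero result is allowed exactly because release/acquire promises nothing here; making all four accesses memory_order_seq_cst is what rules it out.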
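For contrast, the causal pattern Linus describes above comes out like this with the same primitives (again just my sketch, with made-up function names; assume some third thread has already done a release store of A=1):

#include <stdatomic.h>

atomic_int A, B;

void forwarder(void) {
    /* IF (A) THEN B=1 */
    if (atomic_load_explicit(&A, memory_order_acquire))
        atomic_store_explicit(&B, 1, memory_order_release);
}

void observer(void) {
    int b = atomic_load_explicit(&B, memory_order_acquire);  /* read B */
    int a = atomic_load_explicit(&A, memory_order_acquire);  /* read A */
    /* Seeing b == 1 && a == 0 is forbidden: B=1 was caused by A being
       non-zero, and the release/acquire pairs preserve that chain
       (happens-before is transitive through synchronizes-with). */
    (void)a; (void)b;
}

So release/acquire really does keep the causal chain intact; what it gives up, as the first sketch shows, is agreement on a single global order.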