By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 15, 2015 6:35 pm
Room: Moderated Discussions
Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on July 15, 2015 2:43 pm wrote:
>
> So is this reordering causal according to you? I bet most people don't find it intuitive
> either as this kind of reordering is exactly what is expected on weak memory models.
So that re-ordering is indeed causally consistent, even though it is not sequentially consistent.
There is absolutely no question that
(a) sequential consistency is the easiest one to understand (basically: "everything is ordered")
(b) x86 is famously strongly ordered, but it is not that strongly ordered.
Causal consistency involves a causal relationship: "if A then B". That means you have a read of a variable followed by a write to (possibly another) variable. The read is the condition, the write is the action.
It's called "causal" since the read of the first variable could be the cause of the write to the second one. That causation can take many forms - it could affect whether the write happens at all, it could affect the actual value written, and it could affect which variable gets written.
So a causal chain would be something like
IF (A) THEN B=1
and causal consistency means that if another CPU now sees "B" having the value 1, then it had better see that A was true too. In memory-ordering documentation you would skip the "if/then" part and only talk about the actual accesses, so you have a pattern of
read A
B=1
to show that causal relationship, and then the rule would be that another CPU that does
read B
read A
would never see "B and !A" (because B having the value 1 is caused by A having a non-zero value).
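To make that concrete, here's a minimal sketch of the pattern in C11 atomics with pthreads (the variable names, the acquire/release choices and the thread scaffolding are all mine, purely for illustration - with acquire/release, C11 guarantees this causal ordering):

    #include <stdatomic.h>
    #include <stdio.h>
    #include <pthread.h>

    atomic_int A = 0, B = 0;

    /* CPU 1: the causal chain "IF (A) THEN B=1". */
    static void *cpu1(void *arg)
    {
        (void)arg;
        if (atomic_load_explicit(&A, memory_order_acquire))
            atomic_store_explicit(&B, 1, memory_order_release);
        return NULL;
    }

    /* CPU 2: read B, then read A. Causal consistency forbids
       seeing B==1 together with A==0, because the only code
       that writes B=1 first saw A!=0. */
    static void *cpu2(void *arg)
    {
        (void)arg;
        int b = atomic_load_explicit(&B, memory_order_acquire);
        int a = atomic_load_explicit(&A, memory_order_acquire);
        if (b && !a)
            printf("saw B && !A - causal ordering violated\n");
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        atomic_store(&A, 1);               /* the "cause" */
        pthread_create(&t1, NULL, cpu1, NULL);
        pthread_create(&t2, NULL, cpu2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        return 0;
    }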
In contrast, your example is about sequential consistency, not causal consistency: when you have the reverse pattern
A=1
read B
(and on another CPU, doing the same thing, except with the variables switched: "B=1" followed by "read A") clearly neither CPU's actions were "caused" by anything.
They didn't read anything from memory that could affect the end result: they did an unconditional store, and then an unconditional read. So there is no "if A then B" causal relationship - they are sequential, but not causal.
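Here's a minimal sketch of that store-buffer pattern, again in C11 atomics with pthreads (names and scaffolding are mine, for illustration only - relaxed atomics compile to plain loads and stores on x86). Run it in a loop on real x86 hardware and you can eventually see r1==0 && r2==0, an outcome sequential consistency forbids:

    #include <stdatomic.h>
    #include <stdio.h>
    #include <pthread.h>

    atomic_int A = 0, B = 0;
    int r1, r2;

    /* CPU 1: "A=1; read B" - an unconditional store, then an
       unconditional load. Nothing here is caused by a read. */
    static void *cpu1(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&A, 1, memory_order_relaxed);
        r1 = atomic_load_explicit(&B, memory_order_relaxed);
        return NULL;
    }

    /* CPU 2: the mirror image, "B=1; read A". */
    static void *cpu2(void *arg)
    {
        (void)arg;
        atomic_store_explicit(&B, 1, memory_order_relaxed);
        r2 = atomic_load_explicit(&A, memory_order_relaxed);
        return NULL;
    }

    int main(void)
    {
        pthread_t t1, t2;
        pthread_create(&t1, NULL, cpu1, NULL);
        pthread_create(&t2, NULL, cpu2, NULL);
        pthread_join(t1, NULL);
        pthread_join(t2, NULL);
        /* Under sequential consistency at least one store must
           precede the other CPU's load, so r1==0 && r2==0 is
           impossible. x86 store buffering allows it. */
        printf("r1=%d r2=%d\n", r1, r2);
        return 0;
    }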
So "sequential consistency" means that everybody agrees about sequential ordering. "Causal consistency" is weaker than that, and means that just everybody agrees about causal ordering.
Again, I do think that sequential memory consistency is by far the easiest one to understand, and the one that would result in the fewest bugs. If I had the choice, as a software person I'd pick sequential consistency. No argument about it.
And again, x86 is absolutely not sequentially consistent. Never has been. The write buffering means very fundamentally that stores will be delayed past later reads, and that stores are thus clearly not sequentially ordered with reads. And that makes it very easy to see the behavior in that example.
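For completeness: the way you get sequential consistency back on x86 is a full fence (or a locked instruction) between each store and the following load. In C11 terms, a sketch of what each thread in the example above would do instead (atomic_thread_fence(memory_order_seq_cst) typically compiles to mfence on x86 with gcc/clang):

    atomic_store_explicit(&A, 1, memory_order_relaxed);
    atomic_thread_fence(memory_order_seq_cst);  /* store is globally visible before the load */
    r1 = atomic_load_explicit(&B, memory_order_relaxed);

With that fence on both CPUs, the r1==0 && r2==0 outcome is forbidden - which is exactly the cost you pay for sequentially consistent ordering.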
So I've never claimed that x86 was perfect. All I've claimed is that the x86 memory ordering is better. It's not "best".
Engineering side note: I also do claim that from a pure engineering standpoint, the whole "buffer stores" model is at least a reasonable trade-off. Truly sequentially consistent memory ordering is really really hard to make go fast. It's simply not practical. So while I as a software person would find sequential consistency much easier to think about, and never have to worry about memory re-ordering at all, I do see the huge advantage you get from having a store buffer.
In contrast, the advantages of weakening the memory model further are much less obvious. Stores really have to be delayed for good performance. No other memory re-ordering matters nearly as much. Re-ordering loads past loads? Not nearly as big a deal. Just move the first load even earlier. Re-ordering stores past stores? No biggie. They're buffered and not critical anyway, so from a performance standpoint it's pretty much a no-op in the big picture.
(And yes, that's certainly simplifying a bit. An ordered store buffer might need to be bigger than an unordered one, and might not as easily catch some merging behavior. Similarly, re-ordering reads past other reads might make for smaller queues. But those are second-order effects at best, not at all the kind of fundamental issue that "buffer stores" is).
Linus