By: Travis (travis.downs.delete@this.gmail.com), April 12, 2017 3:41 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on April 12, 2017 12:46 pm wrote:
> > To be clear, I mean they talk about causal consistency in that white paper, and there
> > is a rule explicitly covering it, which apparently stands alone from the other rules.
>
> Apparently?
To be clear, I mean that each rule implies something not implied by the other rules, i.e., each rule additionally restricts the allowable behavior.
So what I meant by "stands alone" is that one rule mentioning causality is enough for causality to be an interesting part of the memory model; it doesn't need to be mentioned in every rule.
That said, I think the current inclusion of the rule is redundant: it is already implied by the other rules (in particular, the load/load and load/store rules), so mentioning it just adds confusion. Many versions of the document back, this rule may have been added at a time when the defined model was weaker (as you discuss below), so it may have actually added something then.
>
> > I'm not sure where the disagreement is: Intel used a "causal consistency" rule in the
> > past as this paper clearly shows. For example: "memory ordering obeys causality".
>
> The point is, that for as long as Intel bothered to describe its x86 memory ordering model in
> formal documents (I think it happened for the first time in IPF docs, in a part talking about the functionality
> of x86 hardware of McKinley and Madison. Could be wrong about it) they never called it "causal
> consistency", because it is obviously much stronger than just causal consistency.
They don't seem to use a standard moniker to define their processor ordering, and I don't think I was claiming otherwise. They use the term "Processor Ordering", but as far as I know that's a term invented at Intel, and it's the description that follows that counts.
In fact, they are now fairly explicit that the ordering you get is what you'd get from a system that has a store buffer with store-to-load forwarding but otherwise issues requests in order:
> P6 family processors also use a processor-ordered memory-ordering model that can be further defined as “write ordered with store-buffer forwarding.”
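To make that concrete, here is a minimal C++ sketch (my own illustration, not Intel's) of the classic "store buffer" litmus test: the one reordering this model permits is a load passing an earlier store to a different address. A single run will rarely show it - in practice you'd loop and synchronize the threads to catch it - but x86 does allow the outcome noted at the end:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2;

void cpu0() {
    x.store(1, std::memory_order_relaxed);   // enters cpu0's store buffer
    r1 = y.load(std::memory_order_relaxed);  // may complete before the store drains
}

void cpu1() {
    y.store(1, std::memory_order_relaxed);
    r2 = x.load(std::memory_order_relaxed);
}

int main() {
    std::thread t0(cpu0), t1(cpu1);
    t0.join();
    t1.join();
    // r1 == 0 && r2 == 0 is the one reordering-induced outcome x86 allows:
    // each load can execute while the other CPU's store is still buffered.
    std::printf("r1=%d r2=%d\n", r1, r2);
    return 0;
}
```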
In any case, I'm not sure what we're arguing about. My claim is just that there was a time when people liked to use the term "... with causal consistency" when describing the Intel memory model, and that Intel themselves included causal consistency in their memory model rules (and also used terms like "transitive visibility").
> If they ever used a shortened term (I am not sure they did) then term was "Processor
> Consistency", to emphasize the fact that stores issued by a single processor
> are always observed in the program order by all other processors.
> In fact, I am not sure if special mentioning of causality is at all necessary in
> mathematical sense. It's well possible that causality follows from other rules.
> In contrast, it seems that mentioning causality is necessary for something
> like PPC or IPF, where it obviously does not follow from other rules.
Yeah they did and do, including in today's manual (per above).
> That's because it isn't written by mathematicians. And,
> more importantly, it isn't written for mathematicians
I think if you are trying to do something that depends on an understanding of the memory model of some CPU architecture (and you are a native English speaker), you had better damn well understand words like "causality" and "transitive", which in any case aren't terribly more esoteric than "dependent".
> I'd guess, the real reason is that the authors just copied a paragraph without giving
> it too much thought, from one of the earlier manuals where they either intentionally
> (most likely) or by mistake omitted the rule about "loads don't pass loads".
> It does sound possible that at some point around 2003-2005 Intel had a plan for building an x86 processor
> in which loads do pass previous loads or, at the very least, didn't know what exactly they were going to
> build. But even at that point they were sure that they were not going to abandon
> causality.
Yes, I agree in general. They left the door open on a weaker model for a while, and we only got to where we are today through successive refinements of the docs; at some point they established the existing strong model as a rule.
About the example, it seems like any type of load-load reordering would break it, so I think what they were leaving the door open on was the rule they have today:
> Any two stores are seen in a consistent order by processors other than those performing the stores
I.e., that there is a total store order. This rule isn't implied simply by defining the possible reorderings: even if stores can't be reordered with other stores, it doesn't follow that a stream of independent stores from two processors A and B would be seen in the same interleaving by two other processors C and D.
That's an often glossed-over aspect of the memory model (for example, Wikipedia doesn't even include it in their chart) and is why Java, for example, has special additional constraints when it comes to volatiles: in addition to describing the "happens-before" relationships, they have another rule, not implied by the others, that all volatile stores have a single observable total order.
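Here is a sketch of the IRIW ("independent reads of independent writes") litmus test that this rule governs (the code and thread names are mine, purely for illustration). memory_order_seq_cst is the portable C++ analogue of Java's volatile total-order rule; on x86 the quoted rule forbids the mixed outcome even for plain loads and stores, while a weaker architecture such as POWER allows it absent heavyweight barriers:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2, r3, r4;

void writerA() { x.store(1, std::memory_order_seq_cst); }
void writerB() { y.store(1, std::memory_order_seq_cst); }

void observerC() {  // reads x then y
    r1 = x.load(std::memory_order_seq_cst);
    r2 = y.load(std::memory_order_seq_cst);
}

void observerD() {  // reads y then x
    r3 = y.load(std::memory_order_seq_cst);
    r4 = x.load(std::memory_order_seq_cst);
}

int main() {
    std::thread a(writerA), b(writerB), c(observerC), d(observerD);
    a.join(); b.join(); c.join(); d.join();
    // Forbidden outcome: r1==1, r2==0, r3==1, r4==0. That would mean C saw
    // the store to x first while D saw the store to y first. seq_cst (like a
    // Java volatile) guarantees a single total order over these stores.
    std::printf("r1=%d r2=%d r3=%d r4=%d\n", r1, r2, r3, r4);
    return 0;
}
```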
Intel's earliest rules didn't have the total order requirement (except for local stores), but they threw causal consistency in as a bone to programmers. In particular, differing orders are most likely to occur in multi-node or multi-socket systems where each node has faster intra-node forwarding than inter-node forwarding - essentially the same "problem" that store-forwarding causes, but at the "node" level. In that case you don't have a total store order: the writes from any given CPU are still seen in a consistent order by all other CPUs, but the interleaving of writes from different CPUs may not be.
Causality is a natural behavior of many actual implementations of such systems, since it basically says "if agent A sees a write made by B, and B in turn saw a write by C before making its write, A will also see the write by C". Try placing A, B and C in any combination of local or remote nodes and you'll see that it falls out mostly naturally from a system that has a globally coherent memory order among nodes, except that each node can see its own writes earlier in that order (just like a CPU sees its own writes ahead of their place in the global order due to store-buffer forwarding).
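That A-sees-B-sees-C scenario is the classic "write-to-read causality" litmus test; here is a sketch of it (my code, with hypothetical names):

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0}, y{0};
int r1, r2, r3;

void cpuC() {  // "C" makes the first write
    x.store(1, std::memory_order_relaxed);
}

void cpuB() {  // "B" sees C's write, then makes its own
    r1 = x.load(std::memory_order_relaxed);
    y.store(1, std::memory_order_relaxed);
}

void cpuA() {  // "A" sees B's write; causality says it must then see C's
    r2 = y.load(std::memory_order_relaxed);
    r3 = x.load(std::memory_order_relaxed);
}

int main() {
    std::thread tc(cpuC), tb(cpuB), ta(cpuA);
    tc.join(); tb.join(); ta.join();
    // Forbidden under causal consistency (and by x86 hardware for plain
    // accesses): r1==1, r2==1, r3==0. The C++ abstract machine does allow it
    // for relaxed atomics; making the y store release and the y load acquire
    // restores the guarantee portably.
    std::printf("r1=%d r2=%d r3=%d\n", r1, r2, r3);
    return 0;
}
```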
You can see evidence of this in some rules you still find today, which (as you point out) have likely been copy/pasted across generations of documents, such as:
- Writes by a single processor are observed in the same order by all processors.
- Writes from an individual processor are NOT ordered with respect to the writes from other processors.
What! That seems like a very weak model. Basically it seems to allow arbitrary write re-ordering as long as any two writes from a single CPU are not reordered.
Later on they kind of go back on that:
> Any two stores are seen in a consistent order by processors other than those performing the stores
So reordering can only be observed when one of the involved stores came from the CPU doing the observing. OK: that's TSO, and that's the final strengthening that Intel added. This rule does not appear in the 2007 Intel white paper; instead you find this other rule:
> Total order on stores to the same location
That's a much weaker form of store ordering, which applies only to a single location. So that rule plus causality seems to have been their first stab at a formal model; later they strengthened it.
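For illustration, here is a sketch (again my own code) of what a total order on stores to the same location, i.e., per-location coherence, forbids: two observers disagreeing about the order of two racing stores to the same variable:

```cpp
#include <atomic>
#include <cstdio>
#include <thread>

std::atomic<int> x{0};
int r1, r2, r3, r4;

void writerA() { x.store(1, std::memory_order_relaxed); }
void writerB() { x.store(2, std::memory_order_relaxed); }

void readerC() {
    r1 = x.load(std::memory_order_relaxed);
    r2 = x.load(std::memory_order_relaxed);
}

void readerD() {
    r3 = x.load(std::memory_order_relaxed);
    r4 = x.load(std::memory_order_relaxed);
}

int main() {
    std::thread a(writerA), b(writerB), c(readerC), d(readerD);
    a.join(); b.join(); c.join(); d.join();
    // Forbidden: r1==1, r2==2 while r3==2, r4==1. That would mean C saw the
    // store of 1 ordered before the store of 2 while D saw the opposite;
    // per-location coherence guarantees one order even with relaxed atomics.
    std::printf("r1=%d r2=%d r3=%d r4=%d\n", r1, r2, r3, r4);
    return 0;
}
```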
So Intel was very much pushing a causally consistent model about 10 years ago.