By: anon (anon.delete@this.anon.com), July 6, 2015 9:02 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 6, 2015 5:25 pm wrote:
> Wilco (Wilco.Dijkstra.delete@this.ntlworld.com) on July 6, 2015 4:35 pm wrote:
> >
> > Wrong - of course it does matter! If there is just a control
> > dependency then the compiler may reorder the loads,
> > lift them, CSE them etc (it can't if there is a data dependency).
>
> You really don't get it, do you?
>
> We're talking about binaries being dynamically optimized by the CPU. There may well have been
> various compiler barriers etc in the source code. There won't be any in the binary, because the
> data dependency is sufficient. In fact, the compiler may well have been very aware of it (ie the
> proposed "atomic_load(..., mo_consume)" kind of ordering), and explicitly not having added any
> memory barriers exactly because the conditional move was a sufficient data dependency.
>
> The point is, dynamically turning a conditional move into a predicted move is actually
> somewhat subtle on both ARM and Power, because of the odd memory ordering semantics.
>
> But rabidly defending bad ARM designs whether you understand the problem or not - why am I not surprised?
You have no point, and you are doing exactly what you accuse Wilco of, which is rabidly bashing anything that's not x86 without facts on your side.
You bring this up so frequently on here that I can't be sure you're not simply trolling by now. But here goes for the 100th time: "A weaker architectural memory consistency model does not constrain microarchitectural implementation." Now write that out 100 times on a chalkboard.
Weaker ordering always means fewer constraints on the hardware side! [Actually not *strictly* true if the hardware has to deal with more barrier instructions in the instruction stream, but as a percentage of overall execution they will be quite small, and if they did happen to be a problem, they could easily be squashed early in the front end on a modern core that chose to implement stronger memory ordering. But practically true.]
Yes, that also goes for your nonsensical claims that memory barrier instructions are so costly to implement that weakly ordered ISAs are at a performance disadvantage to x86.
Weaker ordering puts more burden on the software side, and stronger ordering places more constraints on the hardware. Simple as that. Whether those hardware constraints prevent more optimal implementations is not something I'm asserting either way here, so don't argue against that strawman.
On the software side, I'm sure you'll likely bring up how much of a catastrophe weaker ordering is, but again that's not my point.
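To make "more burden on the software side" concrete, here is a minimal C++11 sketch of the usual message-passing pattern. The names (payload, ready, producer, consumer, run_message_passing) are made up for illustration and come from nobody's code in this thread. The source has to state the ordering explicitly either way; the difference is what the compiler emits for it. On x86 the release store and acquire load typically compile to ordinary moves, while on ARM or Power the compiler has to emit barrier or acquire/release instructions.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical names, for illustration only.
int payload = 0;                  // plain (non-atomic) data being handed over
std::atomic<bool> ready{false};   // flag that publishes the data

void producer() {
    payload = 42;                                  // 1. write the data
    ready.store(true, std::memory_order_release);  // 2. publish: the write above
                                                   //    cannot be reordered after this store
}

int consumer() {
    // 3. wait for the flag: reads below cannot be reordered before this load
    while (!ready.load(std::memory_order_acquire)) {
        std::this_thread::yield();
    }
    return payload;                                // 4. must observe the producer's write
}

// Runs one producer and one consumer and returns what the consumer saw.
int run_message_passing() {
    payload = 0;
    ready.store(false, std::memory_order_relaxed);
    int seen = -1;
    std::thread c([&] { seen = consumer(); });
    std::thread p(producer);
    p.join();
    c.join();
    return seen;
}
```

With both orderings weakened to memory_order_relaxed, the language no longer guarantees that the consumer sees 42 (and the plain access to payload becomes a data race), whatever a particular CPU happens to do in practice. That is the software-side cost; it says nothing either way about what the hardware is free to do underneath.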