By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 8, 2015 1:38 pm
Room: Moderated Discussions
dmcq (dmcq.delete@this.fano.co.uk) on July 8, 2015 10:11 am wrote:
>
> If Linux copes with weak memory ordering and just using barriers then that is good. If a person wants to
> use shared data they should signal to the hardware that they are doing that - it should not be a general
> overhead.
You completely missed my argument. Apparently the whole "testing is meaningful" point passed you by with no impact.
> The design of the hardware is becoming more automated and straightforward rules are what are needed.
.. I would say that _stricter_ rules are needed. We have a history of CPUs starting off with really badly designed and weak rules in general, and every single time that tends to be a big mistake. Undefined behavior in a CPU is bad. And weak memory ordering pretty much ends up being "undefined behavior".
Now, would I like memory ordering even stricter than what Intel gives me? Yeah, that would be lovely. I think s390 ends up actually doing that. But I'll take what I can get, and I'll point out the relative strengths and weaknesses. The x86 is a relatively strong model that helps make some ordering issues much easier to handle.
(In particular, causality is something that people don't think about, because we take it for granted. When your memory ordering breaks causality, I guarantee you that people can look at code that is "obviously correct" and have a really hard time debugging it)
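To make that concrete, here is the classic message-passing litmus test as a minimal sketch (my own hypothetical illustration in plain C with POSIX threads, not code from any real project; a compiler is also free to reorder these plain accesses, so treat it purely as a hardware-ordering illustration):

    /* Message-passing litmus test: the "obviously correct" pattern that
     * breaks when the memory model does not preserve the causal order of
     * the two stores. */
    #include <pthread.h>
    #include <stdio.h>

    static int data;   /* payload */
    static int flag;   /* "data is ready" marker */

    static void *writer(void *arg)
    {
        data = 42;     /* store 1 */
        flag = 1;      /* store 2: assumed to become visible after store 1 */
        return NULL;
    }

    static void *reader(void *arg)
    {
        if (flag == 1) {
            /* On x86-TSO the stores become visible in program order, so
             * data is 42 here.  On a weakly ordered CPU (ARM, Power) the
             * reader may legally see flag == 1 while data is still 0,
             * unless both sides add barriers or use acquire/release
             * atomics. */
            printf("data = %d\n", data);
        }
        return NULL;
    }

    int main(void)
    {
        pthread_t w, r;
        pthread_create(&w, NULL, writer, NULL);
        pthread_create(&r, NULL, reader, NULL);
        pthread_join(w, NULL);
        pthread_join(r, NULL);
        return 0;
    }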
> You're
> probably well aware of the problems that crop up because the compilers are getting more intelligent and
> follow the specs so they can produce better optimised code.
I wouldn't call that "intelligent". Not if it causes problems and depends on undefined behavior to do questionable optimizations. And yes, the C standards people are sadly giving compiler people more room to screw up (I particularly dislike the insane type-based aliasing).
For example, only a moron would decide that two accesses to the same variable cannot alias. But I have seen compilers that do exactly that, because they followed the standard and said that if you access it with two different types, the accesses cannot alias. Literally breaking obvious and traditional C code.
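Here is a minimal sketch of the kind of traditional C I mean (a hypothetical example of mine, not lifted from any real codebase; it assumes IEEE 754 floats, where 0x40000000 is the bit pattern of 2.0f):

    /* Strict-aliasing hazard: both accesses touch the same bytes, but the
     * standard lets the compiler assume an unsigned access and a float
     * access never alias, so with -fstrict-aliasing it may return the
     * stale 1.0f instead of 2.0f. */
    #include <stdio.h>

    float set_bits_then_read(float *f, unsigned *u)
    {
        *f = 1.0f;          /* access the storage as float */
        *u = 0x40000000u;   /* access the same storage as unsigned */
        return *f;          /* compiler may assume *u cannot have changed *f */
    }

    int main(void)
    {
        union { float f; unsigned u; } x;
        /* Pointing both parameters at the same storage is exactly the
         * "obvious and traditional" pattern the aliasing rule outlaws. */
        printf("%f\n", set_bits_then_read(&x.f, &x.u));
        return 0;
    }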
You apparently call things like that "intelligent". I call it stupid in the extreme. It's not a feature. It's idiocy. The fact that the C language definition allows it does not make the language definition "better". It makes it worse.
Same deal with CPU definitions.
> That is happening to processors. As they get
> more intelligent in following the specs we need proper markers on access to shared data.
No we don't. We need processors that are more robust, and don't do stupid things. We've got enough bugs to worry about that we don't need another source of really subtle ones.
And again, your definition of "intelligent" is very odd. Quite frankly, the Intel big-core CPUs are clearly more intelligent than the ARM or Power cores.
If you really doubt that, go ask a Power person to explain their serialization logic. I can pretty much guarantee that that person will fail. Because the Power barriers aren't "intelligent". They are the exact opposite of that. They are confusing even to people who understand Power, much less anybody else. I guarantee you that even compiler people are afraid of them. They are the opposite of intelligent.
ARM is better, in at least having much clearer rules. Those rules are still much more fragile for users, and there are several very odd corner cases (the whole "data dependency" vs "control dependency" really is a pretty subtle rule that most people are not at all aware of - I can almost guarantee that most people on this forum had no idea).
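To make the data-vs-control distinction concrete, here is a sketch in Linux-kernel style (my own simplified illustration; READ_ONCE and smp_rmb are approximated with userspace stand-ins so the snippet is self-contained, and it assumes GCC/Clang extensions):

    /* Userspace approximations of the kernel primitives, good enough for
     * illustration (GCC/Clang). */
    #define READ_ONCE(x)  (*(volatile __typeof__(x) *)&(x))
    #define smp_rmb()     __atomic_thread_fence(__ATOMIC_ACQUIRE)

    struct msg { int payload; };
    struct msg *head;
    int ready;
    int table[2];

    int consume_via_data_dependency(void)
    {
        /* The second load's address depends on the first load's value.
         * ARM and Power order such dependent loads without an explicit
         * barrier - this is what rcu_dereference() relies on. */
        struct msg *m = READ_ONCE(head);
        return READ_ONCE(m->payload);
    }

    int consume_via_control_dependency(void)
    {
        /* Here the second load depends on the first only through a branch.
         * A control dependency orders the prior load against later stores,
         * but NOT against later loads, so on a weakly ordered CPU table[0]
         * could be read before 'ready' - unless you add smp_rmb() or an
         * acquire.  That is the subtle corner case almost nobody knows. */
        if (READ_ONCE(ready)) {
            smp_rmb();          /* needed for load->load ordering */
            return READ_ONCE(table[0]);
        }
        return -1;
    }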
Intel ordering rules are certainly not simple either (although they have improved a lot), so I'm by no means claiming they are perfect. But they are at least much less prone to nasty hidden races that expose bugs on new microarchitectures.
Linus