By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 12, 2015 1:38 pm
Room: Moderated Discussions
Maynard Handley (name99.delete@this.name99.org) on July 12, 2015 12:32 pm wrote:
>
> Linus, this seems like a reasonable argument BUT there are at least two ISAs that have been
> designed since 20 years ago: ARMv8 and RISC-V.
So I'm going to dismiss RISC-V as "academic", and I think they ended up with the weak ordering because it's "pretty". It's a very easy trap to fall into, especially when one of the major goals of your architecture is academia ("research and education").
I think weak memory ordering is easier to just explain to people conceptually. Never mind that it may be harder to use. It's pretty simple to say "loads and stores are not ordered", and then (because that leads to problems) introduce the notion of acquire and release on top of that in terms of locking primitive rules.
In contrast, the x86 model of "all loads are acquires and all stores are releases" sounds odd. It doesn't sound symmetric or clean, even if that particular wording is probably the most symmetric and clean wording you can find for it.
As to ARMv8, I obviously think it was a mistake. But it's a very natural one to make when you come from the ARM background. I don't think ARMv8 is really a "new" model, it's just ARM with a completely different encoding and an updated baseline.
I wonder if we could make the x86 model more palatable to academic people by pointing out how broken the traditional memory barriers are (really, acquire/release is so much better than the "read/write/full" memory barrier crap, and even academics understand that), and then really describe x86 just in terms of being the most beautiful model because everything is acquire/release.
IOW, do some kind of mental judo on people who seem to like weak memory ordering just because of the symmetry and the "simplicity" of the model. ;)
> I can offer one possible example. You leave the compiler out of your discussion, but it is often the case that
> you have to indicate to the compiler (not just the HW) about memory re-ordering. So it's reasonable at that
> point to say "since this information has to be in the program, anyway, if you want correctness in the face of
> modern compilers, so why not propagate it down to the hardware, and we can perhaps usefully use it there?"
> Now how can it usefully be used even if you are performing speculative load-hoisting?
> I don't know; but then I am no expert (hardly even much of an amateur) in this area.
So I realize that people use that as an argument, and I think it's a very sad and bad argument.
Because it's basically saying "well, hardware does shit things too, so let's make compilers do even more crap things, and then use that as an argument for hardware doing crap in the first place".
A mentally damaged sloth on drugs can see that the above is a circular argument and a logical fallacy.
But apparently those mentally damaged drug-infused sloths are smarter than many people in the tech world. I see your argument much too often, and it makes me sad and angry.
I've been involved in the whole "atomic access" discussion for the C standard (happily only as a distant person, not intimately) and it's insane. The standards language people are trying to introduce for "mo_consume" (which is basically "acquire, but not on ARM and PowerPC") is a mess. It's sad, and it's all because the C language standards people are trying to bend over backwards for the crap memory ordering of bad CPUs.
It should be just "acquire", but since that's expensive on ARM and PowerPC, people really want to use regular loads, and depend on the data dependency consistency that everybody but Alpha has. And it turns out that just describing the data dependency consistency is horrible and crazy, and I can currently guarantee that
(a) no compiler writer will ever get it right or understand it
(b) no actual user will ever really understand it anyway, and will then use it without understanding the rules.
simply because the C standards language as currently proposed is fragile and unreadable.
Do you really think that is a good situation?
And the argument people use for this? It's that the language should cater to shit CPU semantics. And then the next step is exactly the one you're using: that since the compiler already breaks things, why should the hardware then work any better?
I seriously believe that the C standards committee is doing the wrong thing. They've done it for a long time. Re-ordering memory accesses by the compiler is not valid. Doing speculative writes to memory locations is not valid. It really should be that simple. Sure, the compiler can reorder things that it can prove are not semantically visible, but that's the only reordering the compiler should do.
And yes, the C model of "strict aliasing" is broken shit. And yes, it actually does break real code. Most of the time the damage is limited by just the fact that re-ordering by the compiler is usually limited by other things (like function calls), and by the fact that a lot of real-world code isn't actually compiled with insane optimizations.
Rant over. But the short answer is that yes, I've been involved at the compiler side too, and I've ranted on that side too.
Linus