By: Maynard Handley (name99.delete@this.name99.org), July 12, 2015 12:32 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 12, 2015 11:24 am wrote:
> anon (anon.delete@this.anon.com) on July 12, 2015 3:42 am wrote:
> >
> > So all of that happens inside the core. Memory ordering instructions
> > have to prevent these reorderings within the core.
>
> No, they really don't.
>
> Look, let's make this really simple. You have this code:
>
> - store A to location X
> - load B from location Y
>
> and you want to move the load earlier, because loads matter from
> a performance standpoint, and stores don't and can be buffered.
...
> Quite frankly, that "phase 1" argument was valid 20 years ago. Today it really is "paste eater" level crap.
Linus, this seems like a reasonable argument BUT there are at least two ISAs that have been designed within the last 20 years: ARMv8 and RISC-V. Both were designed well after we learned about what you call Phase 2 --- the (feasible) cost of implementation, and the quality of predictions. And yet both of them went with weak memory models.
I understand the temptation to rail against the stupidity of the world --- god knows I frequently engage in it. But there comes a point where you have to ask yourself: "is it really feasible that I know something that not just one, not just two, but a few hundred people who have been doing this stuff for a long time don't know? People who have been quite willing to drop the Alpha/RISC orthodoxy on some other matters, like aligned loads/stores, or SW-managed TLBs, in the face of overwhelming evidence of their problematic nature."
I can offer one possible example. You leave the compiler out of your discussion, but it is often the case that you have to tell the compiler (not just the HW) about memory-ordering constraints. So it's reasonable at that point to say "this information has to be in the program anyway, if you want correctness in the face of modern compilers, so why not propagate it down to the hardware, where we can perhaps usefully use it?"
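To make that concrete, here is a minimal sketch, assuming C++11 atomics; nothing like it appears earlier in the thread, and the names (`run_message_passing`, `payload`, `ready`) are made up for illustration. The point is only that one annotation in the source constrains both the compiler and, where the ISA needs it, the hardware.

```cpp
#include <atomic>
#include <cassert>
#include <thread>

// Hypothetical example: publish a plain value through an atomic flag.
// The memory_order arguments are the "information in the program"
// that both the compiler and the hardware have to respect.
int run_message_passing() {
    int payload = 0;                 // ordinary, non-atomic data
    std::atomic<bool> ready{false};  // flag that publishes the data
    int seen = -1;

    std::thread producer([&] {
        payload = 42;
        // release: the store to payload above may not be moved below
        // this store, either by the compiler or by the CPU.
        ready.store(true, std::memory_order_release);
    });

    std::thread consumer([&] {
        // acquire: the read of payload below may not be hoisted above
        // the load that observes ready == true.
        while (!ready.load(std::memory_order_acquire)) {
            std::this_thread::yield();
        }
        seen = payload;
    });

    producer.join();
    consumer.join();
    assert(ready.load(std::memory_order_relaxed));  // producer did publish
    return seen;  // 42, guaranteed by the release/acquire pairing
}
```

Calling `run_message_passing()` returns 42 on any conforming implementation. On x86 the release store and acquire load compile to ordinary moves, so the annotation only restrains the compiler; on ARMv8 compilers typically emit store-release and load-acquire instructions for the same source, which is exactly the "propagate it down to the hardware" case.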
Now how can it usefully be used even if you are performing speculative load-hoisting? I don't know; but then I am no expert (hardly even much of an amateur) in this area.