By: anon (anon.delete@this.anon.com), July 12, 2015 9:48 pm
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on July 12, 2015 7:42 pm wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 12, 2015 11:24 am wrote:
> > anon (anon.delete@this.anon.com) on July 12, 2015 3:42 am wrote:
> > >
> > > So all of that happens inside the core. Memory ordering instructions
> > > have to prevent these reorderings within the core.
> >
> > No, they really don't.
>
> Obviously:
> * I was responding to the claim about hardware in general. Not a particular implementation.
> * I meant prevent the *appearance* of those reorderings.
> * I acknowledged speculative approaches that work to prevent apparent reordering.
>
> I appreciate the time you took to respond though.
>
> > So just look at that example of "do a load early" model: just do the load early, you marked
> > it somewhere in the memory subsystem, and you added it to your memory access retirement queue.
> > Now you just need to figure out if anybody did a store that invalidated the load.
> >
> > And guess what? That's not so hard. If you did an early load, that means that you had to get the cacheline
> > with the load data. Now, how do you figure out whether another store disturbed that data? Sure, you
> > still have the same store buffer logic that you used for UP for the local stores, but you also see
> > the remote stores: they'd have to get the cacheline from you. So all your "marker in the memory subsystem"
> > has to react to is that the cacheline it marked went away (and maybe the cacheline comes back, but
> > that doesn't help - if it went away, it causes the marker to be "invalid").
> >
> > See? No memory barriers. No nothing. Just that same model of "load early and mark".
>
> This doesn't invalidate my comment as I explained above, but I'd
> like to respond to it because this topic is of interest to me.
>
> You're talking about memory operations to a single address, and as such, it has nothing
> to do with the memory ordering problem. Fine, speculatively load an address and check
> for invalidation, but that does not help the memory ordering problem.
>
> The problem (and the reason why x86 explicitly allows and actually does reorder in practice) is load vs older
> store. You have moved your load before the store executes, and you have all the mechanism in place to ensure
> that load is valid. How do you determine if *another* CPU has loaded the location of your store within that
> reordered window? Eh? And no, you can't check that the store cacheline remains exclusive on your CPU because
> you may not even own it exclusive let alone know the address of it at the time you performed your load.
>
> Example, all memory set to 0:
>
Uh, this got messed up with the HTML parsing, of course. It was just the canonical store/load reordering example, with all memory initially 0:

CPU0: store X = 1; load Y
CPU1: store Y = 1; load X

The outcome where both loads return 0 is forbidden under sequential consistency, but x86 explicitly permits it (and it is observable in practice), because each CPU's load may be satisfied before its older store to the other address becomes visible.