By: EduardoS (no.delete@this.spam.com), July 12, 2015 11:53 am
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 12, 2015 11:24 am wrote:
> So obviously memory barriers are a good idea, and an absolute
> must. It's what synchronizes all those things.
Memory barriers still look stupid at this point. The problem with barriers is that they fence *all* loads or stores (often there are separate instructions for a load barrier, a store barrier, or a full barrier), but there is absolutely no need for such a barrier: only the instructions that access shared data need to be synchronized. And that isn't only about the CPU, it's about the reordering compilers do as well; if the compiler knows it can't reorder a variable's reads (because it is "volatile"), it knows it must emit a synchronized load.
> And guess what? That's not so hard. If you did an early load, that means that you had to get the cacheline
> with the load data. Now, how do you figure out whether another store disturbed that data? Sure, you
> still have the same store buffer logic that you used for UP for the local stores, but you also see
> the remote stores: they'd have to get the cacheline from you. So all your "marker in the memory subsystem"
> has to react to is that the cacheline it marked went away (and maybe the cacheline comes back, but
> that doesn't help - if it went away, it causes the marker to be "invalid").
>
> See? No memory barriers. No nothing. Just that same model of "load early and mark".
As a programmer I think it would be easy too, but nobody does it, not even x86.