By: anon (anon.delete@this.anon.com), August 21, 2014 10:17 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on August 21, 2014 12:31 pm wrote:
> nksingh (none.delete@this.none.non) on August 21, 2014 11:34 am wrote:
> > > According to my understanding, the cheapest practical way to get effect of membar in
> > > x86 WB memory region would be reading from the address of last write. Or, if you want
> > > barrier after read (I never want, but I am not a lockless guy), writing to some dummy
> > > locations that you are likely to own and then reading that location back.
> >
> > From my interpretation of the x86 memory model and your statement above, I think you won't
> > get the ordering you desire. There's a squirrely exception in the x86 memory order model
> > for store-buffer forwarding. In the version of the Software Dev Manual I have on hand, this
> > behavior is spelled out in a section called "Intra-Processor Forwarding Is Allowed."
> >
>
> I will get ordering I desire, because I do not desire sequential consistency.
> All I want is to get as strong effect as architecturally guaranteed by MFENCE instruction.
>
> Pay attention that according to MEMORY ORDERING section of "Intel 64 and IA-32 Architectures
> Software Developer’s Manual" MFENCE does not help total order at all.
> Locked instructions appear to be the only documented way to achieve it.
[X] := 1
r1 := [Y]
vs
[Y] := 1
r2 := [X]
Under sequential consistency, the condition (r1 != 0 or r2 != 0) always holds. On x86 it is not guaranteed, because each load may be performed before the preceding store to the other location becomes globally visible. x86 with barriers:
[X] := 1
mfence
r1 := [Y]
vs
[Y] := 1
mfence
r2 := [X]
Then the condition holds. However, if I read your idea correctly, you would replace each mfence with a load from the address just stored to:
[X] := 1
r8 := [X]
r1 := [Y]
vs
[Y] := 1
r9 := [Y]
r2 := [X]
I'm fairly sure this does NOT make the condition hold, precisely because of the store-forwarding exception.
If you are reading it as "loads have to be performed in order, so the 2nd load must execute after the first, and therefore after the store," then I can understand where you get the idea. However, the store-forwarding exception says that a load can be satisfied from a store to the same location before that store becomes visible to other CPUs (it is really more an exception to cache coherency than to memory consistency). So the first load can execute before the store becomes visible to other CPUs, and I cannot see any rule that says the second load cannot also execute before that store becomes visible.
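To check the argument mechanically, here is a minimal Python sketch of a simplified machine in the spirit of x86-TSO. The model is an assumption of this sketch, not text from the SDM: each CPU has a FIFO store buffer, a load is forwarded from the newest buffered store to the same address if there is one and otherwise reads memory, buffered stores drain to memory in order at arbitrary times, MFENCE waits for the buffer to drain, and a locked XCHG needs an empty buffer and updates memory atomically. The instruction tuples, outcomes() and TESTS are hypothetical names. It enumerates every interleaving of the litmus tests above (plus a locked-instruction variant) and reports whether r1 == r2 == 0 is reachable.

```python
from typing import FrozenSet, List, Tuple

# Hypothetical instruction encoding for this sketch:
#   ("st", addr, val)         plain store: appended to the CPU's store buffer
#   ("ld", reg, addr)         plain load: forwarded from own buffer, else memory
#   ("mfence",)               blocks until the CPU's store buffer is empty
#   ("xchg", reg, addr, val)  locked RMW: needs an empty buffer, updates memory atomically
Program = List[List[Tuple]]


def outcomes(threads: Program, watch: Tuple[str, ...]) -> FrozenSet[Tuple[int, ...]]:
    """All final values of the `watch` registers reachable in this TSO-style model.

    Registers live in one dict, so each thread must use distinct register names
    (the litmus tests in the post already do). Memory starts out all zero.
    """
    results = set()
    seen = set()

    def dfs(mem, bufs, pcs, regs):
        key = (tuple(sorted(mem.items())), tuple(tuple(b) for b in bufs),
               tuple(pcs), tuple(sorted(regs.items())))
        if key in seen:
            return
        seen.add(key)
        if all(pc == len(t) for pc, t in zip(pcs, threads)) and not any(bufs):
            results.add(tuple(regs.get(r, 0) for r in watch))
            return
        for i, prog in enumerate(threads):
            # Step A: drain the oldest entry of CPU i's store buffer to memory.
            if bufs[i]:
                addr, val = bufs[i][0]
                new_bufs = [list(b) for b in bufs]
                new_bufs[i].pop(0)
                dfs({**mem, addr: val}, new_bufs, pcs, regs)
            # Step B: execute CPU i's next instruction, if it is enabled.
            if pcs[i] == len(prog):
                continue
            op = prog[pcs[i]]
            new_mem, new_bufs, new_regs = dict(mem), [list(b) for b in bufs], dict(regs)
            if op[0] == "st":
                new_bufs[i].append((op[1], op[2]))
            elif op[0] == "ld":
                _, reg, addr = op
                # Store-to-load forwarding: the newest buffered store to addr wins.
                fwd = [v for a, v in reversed(bufs[i]) if a == addr]
                new_regs[reg] = fwd[0] if fwd else mem.get(addr, 0)
            elif op[0] in ("mfence", "xchg") and bufs[i]:
                continue  # blocked until this CPU's buffer has drained
            elif op[0] == "xchg":
                _, reg, addr, val = op
                new_regs[reg] = mem.get(addr, 0)
                new_mem[addr] = val
            elif op[0] != "mfence":
                raise ValueError(f"unknown op {op!r}")
            new_pcs = list(pcs)
            new_pcs[i] += 1
            dfs(new_mem, new_bufs, new_pcs, new_regs)

    dfs({}, [[] for _ in threads], [0] * len(threads), {})
    return frozenset(results)


TESTS = {
    "plain": [[("st", "X", 1), ("ld", "r1", "Y")],
              [("st", "Y", 1), ("ld", "r2", "X")]],
    "mfence": [[("st", "X", 1), ("mfence",), ("ld", "r1", "Y")],
               [("st", "Y", 1), ("mfence",), ("ld", "r2", "X")]],
    "read-back": [[("st", "X", 1), ("ld", "r8", "X"), ("ld", "r1", "Y")],
                  [("st", "Y", 1), ("ld", "r9", "Y"), ("ld", "r2", "X")]],
    "xchg": [[("xchg", "t0", "X", 1), ("ld", "r1", "Y")],
             [("xchg", "t1", "Y", 1), ("ld", "r2", "X")]],
}

if __name__ == "__main__":
    for name, prog in TESTS.items():
        both_zero = (0, 0) in outcomes(prog, ("r1", "r2"))
        print(f"{name:10} r1 == r2 == 0 reachable: {both_zero}")
```

Under these assumptions, r1 == r2 == 0 is reachable in the plain and read-back versions (the read-back load is simply forwarded from the CPU's own store buffer, so it orders nothing) but not with mfence or the locked xchg. This only shows what the model encodes; whether real hardware matches the model is exactly what the SDM wording is supposed to settle.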