By: Michael S (already5chosen.delete@this.yahoo.com), August 22, 2014 3:16 am
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on August 21, 2014 11:17 pm wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on August 21, 2014 12:31 pm wrote:
> > nksingh (none.delete@this.none.non) on August 21, 2014 11:34 am wrote:
> > > > According to my understanding, the cheapest practical way to get effect of membar in
> > > > x86 WB memory region would be reading from the address of last write. Or, if you want
> > > > barrier after read (I never want, but I am not a lockless guy), writing to some dummy
> > > > locations that you are likely to own and then reading that location back.
> > >
> > > From my interpretation of the x86 memory model and your statement above, I think you won't
> > > get the ordering you desire. There's a squirrely exception in the x86 memory order model
> > > for store-buffer forwarding. In the version of the Software Dev Manual I have on hand, this
> > > behavior is spelled out in a section called "Intra-Processor Forwarding Is Allowed."
> > >
> >
> > I will get ordering I desire, because I do not desire sequential consistency.
> > All I want is to get as strong effect as architecturally guaranteed by MFENCE instruction.
> >
> > Pay attention that according to MEMORY ORDERING section of "Intel 64 and IA-32 Architectures
> > Software Developer’s Manual" MFENCE does not help total order at all.
> > Locked instructions appear to be the only documented way to achieve it.
>
> [X] := 1
> r1 := [Y]
>
> vs
>
> [Y] := 1
> r2 := [X]
>
> With a global sequential ordering, the condition (r1 != 0 or r2 != 0) holds. It does
> not hold for x86, due to reordering loads before stores. x86 with barriers:
>
> [X] := 1
> mfence
> r1 := [Y]
>
> vs
>
> [Y] := 1
> mfence
> r2 := [X]
>
> Then the condition holds.
I don't think so. According to my understanding of the rules, the condition does not hold.
IMHO, you are misinterpreting the rule that says "Reads cannot pass earlier LFENCE and MFENCE instructions". As I understand this rule, it is strictly local and has no global effect.
Looking at it from the perspective of what happens in hardware, I claim that MFENCE is allowed to drain the local store queue, but is not obliged to drain it. So, despite the fences, the writes to [X] and [Y] can still be sitting in their respective store queues while the reads are served from their respective local caches.
> However, if I read your idea correctly:
>
>
> [X] := 1
> r8 := [X]
> r1 := [Y]
>
> vs
>
> [Y] := 1
> r9 := [Y]
> r2 := [X]
>
> I'm fairly sure this does NOT make the condition hold, exactly due to the store forwarding exception.
>
> If you are reading it as, "loads have to be in-order, therefore the 2nd load must be executed after the first,
> therefore the 2nd load must be executed after the store," then I can understand where you get the idea.
Yes, that's the source.
> However
> store forwarding exception is saying that loads can be satisfied from a location before stores to that location
> become visible to other CPUs (it's actually more an exception to cache coherency more than memory consistency).
> The first load can be executed before the store becomes visible to other CPUs -- I cannot see any rule that
> says the second load can not also be executed before that store becomes visible.
>
But MFENCE is no better. Only LOCK helps with total ordering over a WB region.
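
To make the disagreement concrete, here is a minimal sketch of a toy store-queue model in Python. Everything in it is made up for illustration (the `outcomes` function, the op tuples, the `fence_drains` flag), and it says nothing about what any real CPU does. It enumerates every interleaving of two CPUs that each have a FIFO store queue with store-to-load forwarding, and collects the reachable (r1, r2) pairs. The `fence_drains` flag is an assumption that selects between the two readings of MFENCE argued above: with True the fence waits until the local store queue is empty, with False the fence is a purely local no-op.

```python
# Toy model only: two CPUs, FIFO store queues, in-order execution,
# store-to-load forwarding. All names here are hypothetical.

def outcomes(prog0, prog1, fence_drains=True):
    """Return the set of reachable final (r1, r2) pairs."""
    progs = (tuple(prog0), tuple(prog1))
    results, seen = set(), set()
    # State: (program counters, store queues, memory, registers), all hashable.
    start = ((0, 0), ((), ()), (("X", 0), ("Y", 0)), ())
    stack = [start]
    while stack:
        state = stack.pop()
        if state in seen:
            continue
        seen.add(state)
        pcs, bufs, mem, regs = state
        memd, regd = dict(mem), dict(regs)
        if pcs[0] == len(progs[0]) and pcs[1] == len(progs[1]):
            results.add((regd["r1"], regd["r2"]))
            continue
        for i in (0, 1):
            # Step kind A: the oldest queued store of CPU i reaches memory.
            if bufs[i]:
                addr, val = bufs[i][0]
                new_mem = dict(memd)
                new_mem[addr] = val
                new_bufs = list(bufs)
                new_bufs[i] = bufs[i][1:]
                stack.append((pcs, tuple(new_bufs),
                              tuple(sorted(new_mem.items())), regs))
            # Step kind B: CPU i executes its next instruction in program order.
            if pcs[i] < len(progs[i]):
                op = progs[i][pcs[i]]
                new_bufs, new_regs = list(bufs), dict(regd)
                if op[0] == "st":            # ("st", addr, value)
                    new_bufs[i] = bufs[i] + ((op[1], op[2]),)
                elif op[0] == "ld":          # ("ld", reg, addr)
                    # Forward from the newest matching queued store, if any.
                    fwd = [v for a, v in bufs[i] if a == op[2]]
                    new_regs[op[1]] = fwd[-1] if fwd else memd[op[2]]
                elif op[0] == "fence":       # ("fence",)
                    if fence_drains and bufs[i]:
                        continue             # blocked until the queue is empty
                new_pcs = list(pcs)
                new_pcs[i] += 1
                stack.append((tuple(new_pcs), tuple(new_bufs), mem,
                              tuple(sorted(new_regs.items()))))
    return results

# The three variants from the thread.
PLAIN = ([("st", "X", 1), ("ld", "r1", "Y")],
         [("st", "Y", 1), ("ld", "r2", "X")])
FENCED = ([("st", "X", 1), ("fence",), ("ld", "r1", "Y")],
          [("st", "Y", 1), ("fence",), ("ld", "r2", "X")])
READBACK = ([("st", "X", 1), ("ld", "r8", "X"), ("ld", "r1", "Y")],
            [("st", "Y", 1), ("ld", "r9", "Y"), ("ld", "r2", "X")])

if __name__ == "__main__":
    print("plain             ", sorted(outcomes(*PLAIN)))
    print("read-back         ", sorted(outcomes(*READBACK)))
    print("fence, drains     ", sorted(outcomes(*FENCED, fence_drains=True)))
    print("fence, local no-op", sorted(outcomes(*FENCED, fence_drains=False)))
```

In this toy model the read-back variant still reaches r1 == 0 and r2 == 0, because both read-backs are forwarded from the local queue. For the fenced variant the answer depends entirely on `fence_drains`, which is exactly the point under dispute, so the sketch only shows where the two readings diverge and does not settle which one the manual guarantees.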