By: anon (anon.delete@this.anon.com), August 17, 2014 10:55 pm
Room: Moderated Discussions
Ricardo B (ricardo.b.delete@this.xxxxx.xx) on August 17, 2014 3:14 pm wrote:
> Michael S (already5chosen.delete@this.yahoo.com) on August 17, 2014 2:22 pm wrote:
> > Wouldn't it introduce subtle bugs in complex lockless scenarios?
> > After all, x86 *does* promote later loads over earlier unrelated stores in a software-visible manner.
> > I don't expect anything like that to happen in the Linux kernel, because it just
> > does not do crazy lockless stuff outside of one or two well-defined modules.
> > But if the same strategy is used in other big portable programs, it can cause trouble.
>
>
> Yes, you need a more encompassing strategy than just mapping the barriers to NOPs.
>
> In x86, the atomic operations (e.g., LOCK ADD) serve as a barrier for the Store over Load reordering case.
>
> So, one strategy is to make all your barriers NOPs but ensure that you always have an atomic operation.
> This is, I think, the strategy on Linux: all the lockless stuff is done using a
> series of atomic_* functions, which in Linux map to atomic x86 instructions.
>
> Another is to map barriers to otherwise unused atomic x86 instructions
Linux's smp_mb() is NOT a noop on x86, for exactly the reason Michael raised: smp_mb() is an x86 "mfence" instruction (and a compiler barrier).
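Roughly like the sketch below, which is not the kernel's actual header (my_smp_mb is a made-up name): an mfence for the CPU plus a "memory" clobber so the compiler can't move accesses across it either.

    #include <stdio.h>

    /* Sketch only: mfence orders the CPU, the "memory" clobber stops the
       compiler from reordering memory accesses across the call. */
    static inline void my_smp_mb(void)
    {
        __asm__ __volatile__("mfence" ::: "memory");
    }

    int x, y;

    int main(void)
    {
        x = 1;                  /* store */
        my_smp_mb();            /* the later load cannot pass this earlier store */
        printf("y = %d\n", y);  /* load */
        return 0;
    }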
There are other sets of APIs which are specified to provide various types of barriers as well. For example, some "atomic" APIs (typically the ones which both modify the target and return a result) are documented to provide full barriers before and after, so on x86 these simply rely on the barrier semantics of the lock-prefixed instructions.
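As a rough illustration (my_atomic_add_return is a made-up name, not the actual kernel API), a value-returning atomic on x86 can be built on lock; xadd, and it's the LOCK prefix that supplies the full barrier:

    /* Sketch: the LOCK prefix makes XADD atomic and also fully ordered on x86,
       so a value-returning atomic gets "full barrier before and after" for free.
       The "memory" clobber makes it a compiler barrier too. */
    static inline int my_atomic_add_return(int i, int *v)
    {
        int old = i;
        __asm__ __volatile__("lock; xaddl %0, %1"
                             : "+r" (old), "+m" (*v)
                             : : "memory", "cc");
        return old + i;   /* previous *v plus i, i.e. the new value */
    }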
Other atomic operations are not guaranteed to give any barrier semantics at all; atomic_inc(), for example. In that case, x86's lock; inc provides a *stronger* barrier than required, and something like powerpc gets away without any explicit ordering for it.
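Sketch of that no-ordering-required case (my_atomic_inc is again a made-up name): on x86 the ordering comes along whether you want it or not, because the atomicity and the barrier are both supplied by the same LOCK prefix.

    /* Sketch: lock; incl is atomic, and as a side effect of LOCK it is also
       fully ordered, even though an atomic_inc()-style API promises no
       ordering at all. Note: no "memory" clobber, since it isn't meant to
       be a compiler barrier either. */
    static inline void my_atomic_inc(int *v)
    {
        __asm__ __volatile__("lock; incl %0" : "+m" (*v) : : "cc");
    }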
BTW, that's not a case of stronger vs. weaker memory models, so it's somewhat of a tangent to the thread. It's due to x86 always providing barriers and atomic RMW in the same instruction, whereas RISCs tend not to.
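To make that concrete, here is roughly what the value-returning add looks like on powerpc, where the atomic RMW (the lwarx/stwcx. loop) and the ordering (sync/isync here) are separate instructions. The real kernel picks the exact entry/exit barriers via config macros, so treat this purely as a sketch; my_ppc_atomic_add_return is a made-up name.

    /* Sketch: the ll/sc retry loop provides the atomicity; any ordering has
       to be spelled out with separate barrier instructions. */
    static inline int my_ppc_atomic_add_return(int a, int *v)
    {
        int t;
        __asm__ __volatile__(
            "   sync            \n"   /* barrier before the RMW             */
            "1: lwarx   %0,0,%2 \n"   /* load word and set reservation      */
            "   add     %0,%0,%1\n"
            "   stwcx.  %0,0,%2 \n"   /* store iff reservation still held   */
            "   bne-    1b      \n"   /* lost the reservation, retry        */
            "   isync           \n"   /* barrier after the RMW              */
            : "=&r" (t)
            : "r" (a), "r" (v)
            : "cc", "memory");
        return t;
    }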