By: Ricardo B (ricardo.b.delete@this.xxxxx.xx), August 17, 2014 2:14 pm
Room: Moderated Discussions
Michael S (already5chosen.delete@this.yahoo.com) on August 17, 2014 2:22 pm wrote:
> Wouldn't it introduce subtle bugs in complex lockless scenarios? After all, x86 *does* promote later loads over
> earlier unrelated stores in software-visible manner. I don't expect for anything like that to happen in Linux
> kernel, because it just does not do crazy lockless stuff outside of one or two well-defined modules.
> But if the same strategy used in other big portable programs it can cause troubles.
Yes, you need a more encompassing strategy than just mapping the barriers to NOPs.
On x86, atomic operations (e.g., LOCK ADD) serve as a barrier against the store-load reordering case (a later load passing an earlier store).
So one strategy is to make all your barriers NOPs, but ensure that there is always an atomic operation where the ordering matters.
This is, I think, the strategy in Linux: all the lockless stuff is built from a series of atomic_* functions, which on x86 map to atomic instructions.
Another is to map the barriers to otherwise unused atomic x86 instructions.
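
To make that second strategy concrete, here is a rough sketch of a full barrier built from a LOCK-prefixed instruction that does no useful work. The names full_barrier() and store_then_load() are made up for illustration; they are not kernel APIs. The inline asm assumes GCC or Clang on x86-64, and other targets fall back to a C11 fence.

```c
#include <stdatomic.h>

/* Hypothetical helper, for illustration only (not a Linux kernel API). */
static inline void full_barrier(void)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    /* Assumption: GCC/Clang inline asm on x86-64.
     * Adding 0 to the word at the top of the stack changes no data, but the
     * LOCK prefix keeps later loads from passing earlier stores. The "memory"
     * clobber also stops the compiler from reordering across this point. */
    __asm__ __volatile__("lock; addl $0,0(%%rsp)" ::: "memory", "cc");
#else
    /* Portable fallback: a sequentially consistent C11 fence. */
    atomic_thread_fence(memory_order_seq_cst);
#endif
}

/* Dekker-style step: publish my flag, then look at the other side's flag.
 * Without the barrier, x86 may perform the load before the store becomes
 * visible to other CPUs. */
static inline int store_then_load(atomic_int *mine, atomic_int *other)
{
    atomic_store_explicit(mine, 1, memory_order_relaxed);
    full_barrier();
    return atomic_load_explicit(other, memory_order_relaxed);
}
```

If two threads each call store_then_load() with the two flags swapped, the barrier rules out the case where both of them read 0. With the barrier mapped to a plain NOP and no other atomic operation in between, x86 allows that outcome.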