By: Aaron Spink (aaronspink.delete@this.notearthlink.net), August 17, 2014 1:56 pm
Room: Moderated Discussions
Ricardo B (ricardo.b.delete@this.xxxxx.xx) on August 17, 2014 12:06 pm wrote:
>
> Yes and no.
>
> Again, taking the Linux kernel as an example, it has two families of barriers.
> rmb/wmb/mb are to be used to order access to devices even in single processor systems.
> smp_rmb/smp_wmb/smp_mb are to be used to order access across processors and
> they become NOPs if the kernel is compiled to single processor systems.
>
> Each of those barriers are then implemented in a target specific manner.
>
> In x86, the first barrier family is implemented as the rather expensive lfence/sfence/mfence instructions.
> But the second family (smp_*) doesn't generate any actual instructions,
> it just prevents the compiler from optimizing across the barrier.
>
This is my understanding as well. The barriers are present in all the source code, but in compiled x86 code the smp_* ones emit no instructions, while in code compiled for many non-x86 targets they are actual barrier instructions because they have to be.
The reality is that x86 has to provide strongly ordered semantics anyway, so the hardware implements them very efficiently. Most weakly ordered designs basically punt on barriers and use fairly heavyweight mechanisms to enforce them.
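To make the point concrete, here is a minimal user-space sketch of the message-passing pattern these barriers protect. It uses C11 fences as a stand-in for the kernel's smp_wmb()/smp_rmb(); it is not the kernel's implementation, and the names publish, try_consume, payload and ready are made up for illustration. With common compilers the two fences typically emit no instruction on x86 and only constrain the compiler, while on a weakly ordered target they typically become real barrier instructions.

```c
#include <stdatomic.h>

/* Hypothetical shared state for the sketch. */
static int payload;       /* ordinary data being handed over */
static atomic_int ready;  /* flag that says the data is valid */

/* Writer side: the fence sits roughly where kernel code would use smp_wmb(). */
void publish(int value)
{
    payload = value;
    /* Keep the payload store ordered before the flag store. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&ready, 1, memory_order_relaxed);
}

/* Reader side: the fence sits roughly where kernel code would use smp_rmb().
 * Returns 1 and fills *out if the data has been published, else 0. */
int try_consume(int *out)
{
    if (!atomic_load_explicit(&ready, memory_order_relaxed))
        return 0;
    /* Keep the flag load ordered before the payload load. */
    atomic_thread_fence(memory_order_acquire);
    *out = payload;
    return 1;
}
```

The source is identical on every target; only the generated code differs, which is the whole point of hiding the barrier behind a per-architecture definition.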