By: Ricardo B (ricardo.b.delete@this.xxxxx.xx), August 17, 2014 12:06 pm
Room: Moderated Discussions
Ronald Maas (rmaas.delete@this.wiwo.nl) on August 17, 2014 9:57 am wrote:
> In reality, x86 will likely see the same number of barriers as
> other architectures, because of the following scenario:
> 1) Someone works on a project X that requires 5 memory barriers
> to function. On x86 everything works flawlessly.
> 2) He gets some critical bug reports because his software fails on other architectures, e.g. ARM.
> 3) He spends time tracking down the root cause of the problem, and after adding 10 extra memory
> barriers in various places, the software finally starts working properly on cores with weak
> memory ordering. Actually only 1 or 2 extra barriers were needed, but the developer did not
> remove the unneeded ones because it is late already and he needs to catch some sleep.
> 4) The next day the developer gets paranoid, as he/she does not want to spend more nights / weekends burning the midnight
> oil, reviews the code, and for good measure puts in another 10 barriers in different sections of the
> code where he/she suspects they may be needed, e.g. for PowerPC. Doesn't hurt to be careful, right?
> 5) Now, at the end of the day, all architectures see 25 memory barriers regardless.
>
> It is not that developers are generally incompetent, but there are always corner cases that
> are not trivial to understand. As people in software development usually work under a lot of pressure
> to deliver, they do not have the luxury of reading through a whole bunch of PDFs
> in order to get a proper level of understanding. Especially when his/her manager is looking
> over his shoulder to make sure that person actually works on solving the problem.
Yes and no.
Again, taking the Linux kernel as an example, it has two families of barriers.
rmb()/wmb()/mb() are to be used to order accesses to devices, even on single-processor systems.
smp_rmb()/smp_wmb()/smp_mb() are to be used to order accesses across processors, and they become NOPs if the kernel is compiled for single-processor systems.
Each of those barriers is then implemented in a target-specific manner.
On x86, the first barrier family is implemented with the rather expensive lfence/sfence/mfence instructions.
But the second family (smp_*) mostly doesn't generate any actual instructions on x86: smp_rmb/smp_wmb just prevent the compiler from reordering across the barrier, and only the full smp_mb still needs a real fence.
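
For illustration, here is a minimal sketch of what this looks like on x86, in the spirit of the kernel's arch/x86/include/asm/barrier.h. The macro bodies below are a simplified approximation, not the exact kernel definitions:

/* Mandatory barriers: order accesses to device memory, even on UP kernels.
 * These emit real (and relatively expensive) fence instructions. */
#define mb()    asm volatile("mfence" ::: "memory")
#define rmb()   asm volatile("lfence" ::: "memory")
#define wmb()   asm volatile("sfence" ::: "memory")

/* Compiler-only barrier: emits no instruction at all, it just stops
 * the compiler from reordering memory accesses across it. */
#define barrier() asm volatile("" ::: "memory")

#ifdef CONFIG_SMP
/* SMP barriers: on strongly ordered x86, the read and write variants
 * reduce to compiler barriers; only the full barrier needs a fence. */
#define smp_mb()   mb()
#define smp_rmb()  barrier()
#define smp_wmb()  barrier()
#else
/* On a uniprocessor build the smp_* barriers compile away to nothing
 * but a compiler barrier. */
#define smp_mb()   barrier()
#define smp_rmb()  barrier()
#define smp_wmb()  barrier()
#endif

So code written with the smp_* family pays essentially nothing at run time on an x86 SMP kernel (apart from lost compiler reordering and the occasional smp_mb), while the same source compiled for a weakly ordered architecture gets real barrier instructions in those spots.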