By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), December 3, 2014 11:15 am
Room: Moderated Discussions
Andreas (kingmouf.delete@this.gmail.com) on December 3, 2014 6:51 am wrote:
>
> I think that we should realize that the world does not spin only around iDevices and Apple. In my point
> of view this is more geared towards larger systems and servers rather than tablets and phones.
Yes. From what I've seen, the advantage of atomic RMW ops shows up under heavy contention, where they are better at making progress than the equivalent "read/op/cmpxchg" loop.
Obviously, I can compare mainly against x86, since that is the only other relevant architecture that has RMW operations. And x86 doesn't have the whole "load-linked" and "store-conditional" model (ARMv8 calls it "load/store exclusive"), so read/op/cmpxchg is the closest semi-equivalent sequence.
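To make the comparison concrete, here is a minimal C11 sketch of the two shapes being compared: a single atomic RMW add, and the read/op/cmpxchg retry loop that computes the same result. The function names are hypothetical and chosen only for illustration. What the compiler emits depends on the target; on x86 a value-returning atomic_fetch_add typically becomes a single "lock xadd", while the CAS version stays a loop that can keep retrying under contention.

```c
#include <stdatomic.h>

/* One atomic read-modify-write: the hardware performs the add as a
   single indivisible operation and returns the previous value. */
long add_rmw(atomic_long *p, long v)
{
    return atomic_fetch_add(p, v);
}

/* The read/op/cmpxchg equivalent: read the old value, compute the new
   one, and retry if another CPU changed the location in between.
   On failure, atomic_compare_exchange_weak reloads 'old' with the
   current value, so each retry works on fresh data. */
long add_cas_loop(atomic_long *p, long v)
{
    long old = atomic_load_explicit(p, memory_order_relaxed);
    while (!atomic_compare_exchange_weak(p, &old, old + v))
        ;
    return old;
}
```

Both return the value before the addition: starting from 0, add_rmw(&c, 5) returns 0 and leaves 5, and a following add_cas_loop(&c, 3) returns 5 and leaves 8. The difference is only in how they behave when many CPUs hit the same location at once.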
I'm personally a fan of RMW instructions due to the guaranteed progress and the whole potential cache coherency protocol advantage (no need for write-intent hints etc.). So it makes sense to me.
Of course, there might be a code density issue driving this too. It is not unreasonable to have reference count updates etc. that you want to inline in JIT'ed code, and the whole "loop over load-linked/add/store-conditional" model is just damn painful for that. So there might certainly be reasons to do the atomics even in small devices.
Which makes me wonder: the docs say that the instructions "also include controls associated with influencing the order properties" (good: the memory ordering requirement for an atomic that gets a reference count can be very different from that of an atomic that just increments some statistics). But there are cases where you don't even care about SMP atomicity; you just want atomicity with respect to interrupts, or even just smaller code.
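As an illustration of how different those ordering requirements can be, here is a minimal C11 sketch, assuming the common shared_ptr-style reference counting pattern rather than anything specific to the ARM instructions discussed here. The struct and function names are hypothetical. A statistics counter needs no ordering at all; taking an extra reference is also unordered because the caller already holds one; dropping a reference needs release ordering, plus an acquire fence in the thread that sees the count reach zero, so that all earlier writes to the object are visible before it gets freed.

```c
#include <stdatomic.h>
#include <stdbool.h>

/* Hypothetical refcounted object, for illustration only. */
struct obj {
    atomic_int refcount;
};

/* Statistics: only the eventual total matters, so relaxed ordering. */
void stat_inc(atomic_long *stat)
{
    atomic_fetch_add_explicit(stat, 1, memory_order_relaxed);
}

/* Taking a reference: the caller already holds one, so the object
   cannot disappear underneath us; relaxed ordering is enough. */
void obj_get(struct obj *o)
{
    atomic_fetch_add_explicit(&o->refcount, 1, memory_order_relaxed);
}

/* Dropping a reference: release ordering makes this thread's earlier
   writes to the object visible before the count drops; the acquire
   fence makes the thread that drops the last reference observe all of
   them before it tears the object down. Returns true for the last put. */
bool obj_put(struct obj *o)
{
    if (atomic_fetch_sub_explicit(&o->refcount, 1, memory_order_release) == 1) {
        atomic_thread_fence(memory_order_acquire);
        return true;
    }
    return false;
}
```

Only obj_put pays for ordering here, which is the kind of per-operation control an ISA-level ordering option would let the compiler express directly.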
So I wonder if the "order properties" include that kind of "UP-only interrupt atomicity" ordering (atomic only with respect to interrupts on the local CPU) that isn't even SMP-safe but is potentially cheaper.
Linus