By: Paul A. Clayton (paaronclayton.delete@this.gmail.com), December 3, 2014 8:04 pm
Room: Moderated Discussions
anon (anon.delete@this.anon.com) on December 3, 2014 5:08 pm wrote:
> Linus Torvalds (torvalds.delete@this.linux-foundation.org) on December 3, 2014 11:15 am wrote:
[snip]
>> I'm personally a fan of RMW instructions due to the guaranteed progress and the whole potential cache
>> coherency protocol advantage (no need for write intent hints etc). So it makes sense to me.
>
> We've talked about this before here, but LL/SC can guarantee progress (when it is limited
> like POWER does), and the LL of course always carries a load-for-store signal.
IBM's zSeries provides "constrained transactions", which are guaranteed to complete (no fallback path is needed) as long as certain conditions are met (including limits on the size of the code path).
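Very roughly, the software side looks something like the following (a sketch only; I am assuming GCC's s390 transactional-execution builtins and something like -march=zEC12 -mhtm, and a trivially small critical section). The point is the absence of any abort handler or software fallback path:

    /* Sketch of a constrained transaction (zEC12+); the body must obey
     * the constraints (short, bounded storage footprint, no backward
     * branches), and in return the hardware guarantees eventual
     * completion, so no abort handler or fallback path is written.   */
    struct node { struct node *next; };

    static void push_node(struct node **head, struct node *n)
    {
        __builtin_tbeginc();   /* start constrained transaction        */
        n->next = *head;       /* simple, bounded body                 */
        *head   = n;
        __builtin_tend();      /* commit; hardware retries transparently */
    }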
In theory, simple atomic operations using LL/SC could be optimized through idiom recognition, but without an architectural guarantee a fallback path must be provided (though a simple retry-immediately mechanism would be valid and could work well even with a weaker guarantee than zSeries constrained transactions).
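For concreteness, here is the idiom and its RMW counterpart sketched in C11 atomics (how the compare-exchange maps to lwarx/stwcx. or how the fetch-add maps to a single instruction is of course up to the compiler and ISA, so treat this only as an illustration):

    #include <stdatomic.h>

    /* LL/SC-style idiom: load, modify, attempt the store, retry on
     * failure.  On POWER/ARM the compare-exchange below typically
     * compiles to an lwarx/stwcx. (or ldxr/stxr) loop -- the pattern
     * an idiom recognizer would have to prove bounded before fusing. */
    static void counter_inc_llsc_style(atomic_int *ctr)
    {
        int old = atomic_load_explicit(ctr, memory_order_relaxed);
        while (!atomic_compare_exchange_weak_explicit(ctr, &old, old + 1,
                                                      memory_order_relaxed,
                                                      memory_order_relaxed))
            ;  /* fallback path: simply retry immediately */
    }

    /* RMW form: a single instruction where the ISA has one (e.g. x86
     * LOCK XADD, ARMv8.1 LDADD), with forward progress guaranteed.   */
    static void counter_inc_rmw(atomic_int *ctr)
    {
        atomic_fetch_add_explicit(ctr, 1, memory_order_relaxed);
    }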
[snip code density]
> I wonder if power is another motivation. Theoretically if the instructions don't have side effects that
> depend on the value, you could export such operations to remote owner of the cacheline in your CC protocol,
> or to a memory controller if the cacheline is not owned, without blocking the core on the read.
Even with side effects, performance can be improved in some cases by performing operations remotely from the requester; cache line ping-pong can have a significant performance impact. Again, providing some guarantees could be useful (e.g., with certain guarantees about multiple threads "simultaneously" incrementing a counter, software could avoid hierarchical counters).
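For reference, the hierarchical/per-thread-sharded counter software resorts to today to dodge the ping-pong looks roughly like this (a sketch; the shard count, padding, and cpu-index parameter are just placeholders). A remote-update primitive with the right guarantees would let a single plain shared counter serve instead:

    #include <stdatomic.h>

    #define NSHARDS 64   /* ideally >= number of hardware threads */

    /* One counter shard per cache line so increments by different
     * threads never contend for the same line.                    */
    struct shard { atomic_long v; char pad[64 - sizeof(atomic_long)]; };
    static struct shard counter[NSHARDS];

    static void counter_inc(unsigned cpu)
    {
        atomic_fetch_add_explicit(&counter[cpu % NSHARDS].v, 1,
                                  memory_order_relaxed);
    }

    /* Reads become the expensive, rarely executed path: sum all shards. */
    static long counter_read(void)
    {
        long sum = 0;
        for (int i = 0; i < NSHARDS; i++)
            sum += atomic_load_explicit(&counter[i].v, memory_order_relaxed);
        return sum;
    }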