By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 15, 2015 4:42 pm
Room: Moderated Discussions
NoSpammer (no.delete@this.spam.com) on July 15, 2015 3:14 pm wrote:
>
> Yes, I think you are right in the pessimistic interpretation. Lots of funny things can happen
> during rep. But if we want to do a lockless sync after "rep movs" we only need another write and
> when that one is observable the whole movs destination area can be considered up to date.
Yes. You must not use "rep stos/movs" to actually do the final unlock operation itself (because you don't know what the exact write pattern will be), but it's fine to use something like "rep stos" to clear an allocation, and then make that allocation visible to other threads by writing the address to memory.
You don't know what order the clearing was done in, but when the address of the newly allocated and cleared object is written to memory, that write itself will be a release, and ordered wrt any writes that were part of the memory clearing.
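A minimal sketch of the "clear with rep stos, then publish the address" pattern, assuming x86-64 with GCC or Clang inline asm (other targets fall back to memset, which from the caller's ordering standpoint is the same thing). The names `struct obj`, `published`, `clear_bytes`, `publish_cleared` and `consumer` are hypothetical, for illustration only. The C11 release store is what keeps the compiler from sinking the clearing past the publish. On x86 that store compiles to a plain mov, which is the "that write itself will be a release" point above.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical object type, used only for this illustration. */
struct obj {
    unsigned char payload[256];
};

/* Hypothetical shared slot through which a new object is published. */
static _Atomic(struct obj *) published = NULL;

/* Zero n bytes at dst. On x86-64 GCC/Clang this is "rep stosb"
   (DF is clear on function entry per the SysV ABI); elsewhere memset. */
static void clear_bytes(void *dst, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("rep stosb"
                     : "+D"(dst), "+c"(n)
                     : "a"(0)
                     : "memory");
#else
    memset(dst, 0, n);
#endif
}

/* Producer: allocate, clear with a string op, then publish the address. */
static struct obj *publish_cleared(void)
{
    struct obj *o = malloc(sizeof *o);
    if (!o)
        return NULL;
    memset(o, 0xAA, sizeof *o);   /* garbage that must not leak to readers */
    clear_bytes(o, sizeof *o);    /* internal store order is unspecified... */
    /* ...but every one of those stores is ordered before this release. */
    atomic_store_explicit(&published, o, memory_order_release);
    return o;
}

/* Consumer: wait for the address, then check the whole object is zero. */
static void *consumer(void *arg)
{
    (void)arg;
    struct obj *o;
    while ((o = atomic_load_explicit(&published, memory_order_acquire)) == NULL)
        ;                         /* spin until published */
    for (size_t i = 0; i < sizeof o->payload; i++)
        if (o->payload[i] != 0)
            return (void *)1;     /* would mean the clearing was not visible */
    return (void *)0;
}
```

Note that the publishing write is an ordinary store, never part of the string op itself, since the write pattern inside "rep stos" is unknown.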
And yes, I actually made sure to get a clarification on this from Intel long ago, because we did worry that there would be some cases where the source reads or the result stores of a string instruction would behave differently wrt the instructions around it. And Intel ended up making it very clear with explicit examples in the current architecture manuals.
In particular, even if you have several string operations back-to-back, the re-ordering happens only within each operation, not across them. So if you have two string operations after each other, the stores in the first string op are ordered wrt the stores in the second.
So it really ends up having the exact same semantics as calling memset/memcpy from a memory ordering standpoint.
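Treating string ops like memcpy/memset calls means a sequence of them can be followed by one ordinary release store, just as a sequence of library calls would be. A minimal sketch, assuming x86-64 with GCC or Clang inline asm (memcpy/memset fallback elsewhere): a message is copied with "rep movsb", its tail is zeroed with "rep stosb", and a separate flag store publishes it. `MSG_SIZE`, `msg_buf`, `msg_ready`, `copy_bytes`, `zero_bytes`, `send_message` and `receiver` are hypothetical names.

```c
#include <assert.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define MSG_SIZE 128                     /* hypothetical fixed message size */

static unsigned char msg_buf[MSG_SIZE];  /* written only before publication */
static atomic_int msg_ready = 0;         /* hypothetical "ready" flag */

/* "rep movsb" on x86-64 GCC/Clang, memcpy elsewhere. */
static void copy_bytes(void *dst, const void *src, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("rep movsb" : "+D"(dst), "+S"(src), "+c"(n) : : "memory");
#else
    memcpy(dst, src, n);
#endif
}

/* "rep stosb" on x86-64 GCC/Clang, memset elsewhere. */
static void zero_bytes(void *dst, size_t n)
{
#if defined(__x86_64__) && (defined(__GNUC__) || defined(__clang__))
    __asm__ volatile("rep stosb" : "+D"(dst), "+c"(n) : "a"(0) : "memory");
#else
    memset(dst, 0, n);
#endif
}

/* Producer: two back-to-back string ops, then one ordinary release store. */
static void send_message(const char *text)
{
    size_t len = strlen(text);
    if (len > MSG_SIZE)
        len = MSG_SIZE;
    copy_bytes(msg_buf, text, len);             /* string op #1 */
    zero_bytes(msg_buf + len, MSG_SIZE - len);  /* string op #2 */
    atomic_store_explicit(&msg_ready, 1, memory_order_release);
}

/* Receiver: wait for the flag, then snapshot the whole buffer into arg. */
static void *receiver(void *arg)
{
    while (!atomic_load_explicit(&msg_ready, memory_order_acquire))
        ;                                       /* spin until ready */
    memcpy(arg, msg_buf, MSG_SIZE);
    return NULL;
}
```

Swapping `copy_bytes`/`zero_bytes` for plain memcpy/memset would not change the ordering argument, which is the point of the comparison above.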
Linus