By: NoSpammer (no.delete@this.spam.com), July 15, 2015 3:14 pm
Room: Moderated Discussions
Linus Torvalds (torvalds.delete@this.linux-foundation.org) on July 15, 2015 2:27 pm wrote:
> NoSpammer (no.delete@this.spam.com) on July 15, 2015 12:50 pm wrote:
> >
> > I checked Intel docs again and I read that you need only one additional normal write and it will be ordered
> > after string operation.
>
> The intel memory ordering constraints wrt the string operations are that the individual
> stores that are part of the string operation itself are not necessarily ordered.
>
> But the loads and stores within the string operation are ordered
> wrt the loads and stores before and the stores after.
>
> So the string operations aren't really weakly ordered with regards to anything else, it's only within the
> operation itself that there is no guarantee. When doing a "memcpy" using "rep movsb", for example, there
> is no guarantee that the reads are done in the order specified by the "D" bit (or any other order).
>
> Maybe it does a full cacheline write, but the source of that full cacheline
> was perhaps done by reading the "upper" source cacheline first.
>
> So a "rep movs" really is no different from "memcpy" in that sense. If you do a memory barrier before
> the memcpy, and do a memory barrier after the memcpy, you'll know that the memcpy is done ordered
> wrt the other operations in the program, but you won't know what the ordering was within the memcpy.
> Maybe the library routine did the copy backwards, maybe it did the edge conditions (beginning and
> end) first, and the middle with a cacheline optimized routine. You don't know.
>
> But it doesn't change the ordering beyond the "borders" of the string instruction. Things
> before the string instruction were clearly before. Things after were clearly after.
Yes, I think you are right about the pessimistic interpretation. Lots of funny things can happen inside the rep itself. But if we want to do a lockless sync after "rep movs", we only need one more normal write after it: once another CPU observes that write, the whole movs destination area can be considered up to date.
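To make that concrete, here is a minimal sketch in C11 of the pattern I mean. The names (shared_buf, ready_len, publish, try_consume) and the buffer size are made up for illustration, and I am assuming memcpy may be implemented with "rep movs" by the library. The release store is what the C memory model requires; on x86 it should compile to a plain mov, which is the "one additional normal write".

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <string.h>

#define BUF_SIZE 256  /* hypothetical buffer size, chosen for illustration */

static unsigned char shared_buf[BUF_SIZE]; /* destination area of the copy */
static atomic_size_t ready_len;            /* 0 means "nothing published yet" */

/* Writer: copy the payload, then publish it with a single store. */
static void publish(const void *src, size_t len)
{
    assert(len > 0 && len <= BUF_SIZE);

    /* May be "rep movs" internally; the order of its individual
     * stores is unknown and we do not rely on it. */
    memcpy(shared_buf, src, len);

    /* Release store: must not become visible before the copy's stores.
     * This is the write that readers synchronize on. */
    atomic_store_explicit(&ready_len, len, memory_order_release);
}

/* Reader: returns the number of bytes copied into dst,
 * or 0 if nothing is published yet or dst is too small. */
static size_t try_consume(void *dst, size_t cap)
{
    /* Acquire load: if we see the published length, we also see
     * every byte the writer copied before publishing it. */
    size_t len = atomic_load_explicit(&ready_len, memory_order_acquire);
    if (len == 0 || len > cap)
        return 0;

    memcpy(dst, shared_buf, len);
    return len;
}
```

The reader never needs to know in which order the copy wrote its bytes. It only checks the one flag write, and that write sits after the "border" of the string operation. This sketch publishes once; reusing the buffer would need more than this (e.g. a sequence counter), which I have left out.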