By: Linus Torvalds (torvalds.delete@this.linux-foundation.org), July 15, 2015 2:27 pm
Room: Moderated Discussions
NoSpammer (no.delete@this.spam.com) on July 15, 2015 12:50 pm wrote:
>
> I checked Intel docs again and I read that you need only one additional normal write and it will be ordered
> after string operation.
The Intel memory ordering constraints wrt the string operations are that the individual stores that are part of the string operation itself are not necessarily ordered with respect to each other.
But the loads and stores within the string operation are ordered wrt the loads and stores before and the stores after.
So the string operations aren't really weakly ordered with regard to anything else. It's only within the operation itself that there is no guarantee. When doing a "memcpy" using "rep movsb", for example, there is no guarantee that the reads are done in the order specified by the "D" bit (or any other order).
Maybe it does a full cacheline write, but the data for that write was perhaps gathered by reading the "upper" source cacheline first.
So a "rep movs" really is no different from "memcpy" in that sense. If you do a memory barrier before the memcpy, and do a memory barrier after the memcpy, you'll know that the memcpy is done ordered wrt the other operations in the program, but you won't know what the ordering was within the memcpy. Maybe the library routine did the copy backwards, maybe it did the edge conditions (beginning and end) first, and the middle with a cacheline optimized routine. You don't know.
But it doesn't change the ordering beyond the "borders" of the string instruction. Things before the string instruction were clearly before. Things after were clearly after.
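To make the "borders" point concrete, here is a minimal sketch in portable C11 of the usual pattern: copy a payload, then publish it with one ordered store. Everything in it (the `mailbox` struct, `mailbox_publish`, `mailbox_try_consume`, `MSG_MAX`) is a hypothetical name made up for illustration, not something from the Intel docs or any library. The release store is what the C memory model requires; on x86 it compiles to a plain store, which is the "one additional normal write" being discussed.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

#define MSG_MAX 64  /* hypothetical payload limit */

/* Hypothetical shared mailbox: a payload plus a "ready" flag. */
struct mailbox {
    char payload[MSG_MAX];
    size_t len;
    atomic_bool ready;
};

/* Producer: copy the payload, then publish it with one ordered store. */
static bool mailbox_publish(struct mailbox *mb, const void *src, size_t n)
{
    assert(mb != NULL && src != NULL);
    if (n > MSG_MAX)
        return false;
    /* The order of the individual byte stores inside the copy is unknown. */
    memcpy(mb->payload, src, n);
    mb->len = n;
    /* Ordered after everything above, including the whole copy. */
    atomic_store_explicit(&mb->ready, true, memory_order_release);
    return true;
}

/* Consumer: touch the payload only after observing the flag.
 * Returns the number of bytes copied, or 0 if nothing was published. */
static size_t mailbox_try_consume(struct mailbox *mb, void *dst)
{
    assert(mb != NULL && dst != NULL);
    if (!atomic_load_explicit(&mb->ready, memory_order_acquire))
        return 0;
    size_t n = mb->len;
    memcpy(dst, mb->payload, n);
    /* Hand the mailbox back to the producer. */
    atomic_store_explicit(&mb->ready, false, memory_order_release);
    return n;
}
```

A consumer that sees `ready` set sees the complete payload, no matter what order the copy used internally. What it must never do is peek at `payload` before checking the flag and assume that a filled-in last byte means the earlier bytes are there too.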
Linus