By: Linus Torvalds (torvalds.delete@this.osdl.org), October 30, 2006 9:57 am
Room: Moderated Discussions
Ricardo B (ricardo.b@xxxxx.xx) on 10/30/06 wrote:
>
>We know OoO x86 CPUs will issue its loads out of order.
>How can they keep load ordering regarding other CPUs if
>they do that?
Load ordering is a purely local thing. There is no
load ordering "regarding other CPUs". You can replay
cacheable loads as many times as you like, and if you get
into a situation where you notice that the value you loaded
was modified by another CPU (so that the local CPU could
see "incorrect ordering") you can just replay the load in
the correct order.
It's extremely unlikely that you'll ever get into an
actual conflict situation (i.e. another CPU writing to a
cacheline that you were reading from speculatively), and
when it does happen, you know it did (because that will
obviously flush the cacheline from your cache), and you
can replay.
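To make the replay idea concrete, here is a toy Python model. It is an illustration only, not how any particular x86 core is built, and `Load`, `LoadQueue` and `remote_store` are hypothetical names. Loads may execute in any order; a store from another CPU first invalidates the line, which marks any early load of that line as stale; at retirement each load is checked in program order and replayed if it went stale. The scenario is the usual flag/data pattern: the reader executes its data load before its flag load, the other CPU writes data and then flag in between, and the replay makes the result look as if the loads ran in order.

```python
from dataclasses import dataclass


@dataclass
class Load:
    addr: str
    value: int = 0
    executed: bool = False
    stale: bool = False  # set when another CPU takes the line away


class LoadQueue:
    """Hypothetical toy model: loads may execute in any order but retire
    in program order, replaying any load whose line was invalidated."""

    def __init__(self, memory, program):
        self.memory = memory                     # shared dict: address -> value
        self.loads = [Load(a) for a in program]  # list order == program order

    def execute(self, i):
        ld = self.loads[i]
        ld.value = self.memory[ld.addr]
        ld.executed = True
        ld.stale = False

    def snoop_invalidate(self, addr):
        # Another CPU wants exclusive access to `addr`; any load that already
        # read that line early could now expose an out-of-order result.
        for ld in self.loads:
            if ld.executed and ld.addr == addr:
                ld.stale = True

    def retire_all(self):
        results = []
        for i, ld in enumerate(self.loads):  # strictly in program order
            if not ld.executed or ld.stale:
                self.execute(i)              # (re)play the load now
            results.append(ld.value)
        return results


def remote_store(memory, queues, addr, value):
    """Another CPU's store: invalidate everyone's copy of the line, then write."""
    for q in queues:
        q.snoop_invalidate(addr)
    memory[addr] = value


memory = {"flag": 0, "data": 0}
lq = LoadQueue(memory, ["flag", "data"])  # program order: load flag, then data
lq.execute(1)                             # data load runs early and sees 0
remote_store(memory, [lq], "data", 1)     # other CPU: data = 1
remote_store(memory, [lq], "flag", 1)     # other CPU: flag = 1
lq.execute(0)                             # flag load runs late and sees 1
print(lq.retire_all())                    # [1, 1]: the stale data load was replayed
```

Without the stale check the reader would have retired `[1, 0]`, i.e. a new flag with old data, which is exactly the ordering violation the replay hides.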
(In fact, you don't necessarily even need to replay: since
the other CPU has to wait for the cache flush to give it
exclusive access, you might even just complete your sequence
before you release your hold on the cacheline. That only
works as long as you can guarantee no deadlocks, but that
should be easy - if the local CPU doesn't itself need to
get any exclusive access, there can be no deadlock from
delaying other CPUs a bit).
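The no-replay alternative can be modeled the same way. In this sketch (again hypothetical; `HoldingReader` is a made-up name, and it assumes a single other CPU whose stores become visible in order), the reader delays its acknowledgement for any line it has read speculatively, so the writer's store, and every later store from that writer, waits until the reader retires. The reader never asks for exclusive access itself, so `retire` never waits on anything and there is nothing to deadlock on.

```python
from collections import deque


class HoldingReader:
    """Hypothetical toy model of the no-replay variant: while a speculative
    load sequence is in flight, this CPU delays acknowledging invalidations
    for the lines it has read, so the other CPU's stores simply wait."""

    def __init__(self, memory):
        self.memory = memory    # shared dict: address -> value
        self.held = set()       # lines read but not yet retired
        self.values = {}
        self.pending = deque()  # remote stores stalled on our ack, in order

    def load(self, addr):
        self.values[addr] = self.memory[addr]
        self.held.add(addr)

    def remote_store(self, addr, value):
        # The writer's stores become visible in order, so once one store is
        # stalled, every later store from that writer queues behind it.
        if addr in self.held or self.pending:
            self.pending.append((addr, value))
        else:
            self.memory[addr] = value

    def retire(self, program_order):
        result = [self.values[a] for a in program_order]
        self.held.clear()
        while self.pending:     # release the lines; stalled stores complete
            addr, value = self.pending.popleft()
            self.memory[addr] = value
        return result


memory = {"flag": 0, "data": 0}
cpu = HoldingReader(memory)
cpu.load("data")             # data load runs early and sees 0
cpu.remote_store("data", 1)  # other CPU must wait for our ack
cpu.remote_store("flag", 1)  # ...and its next store waits behind it
cpu.load("flag")             # flag load still sees 0
print(cpu.retire(["flag", "data"]), memory)  # [0, 0] {'flag': 1, 'data': 1}
```

The reader's result `[0, 0]` is one an in-order reader could have produced, and the writer's stores land only after the lines are released.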
Remember: in order for anybody to be able to see
that you actually re-ordered loads, there must be writes
to the same location by somebody else (and those writes
must obviously participate in the cache coherency protocol).
So as long as there are no writes to a set of cachelines,
read ordering is a total non-issue, and you can reorder
your reads as much as you like, and "architecturally" they
all happened in-order - simply because nobody can ever
say that they saw anything else.
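A small exhaustive check of the "somebody must write" argument, again as a hypothetical toy model (`run`, `schedules`, `WRITER` and `READER_EXEC` are illustrative names): the other CPU stores `data = 1` then `flag = 1`, while the reader's program loads `flag` then `data` but executes them in the reverse order. Enumerating all ten interleavings shows that the only result in-order loads could never produce, `(flag, data) == (1, 0)`, requires the other CPU's stores to land between the two early loads; with replay on invalidation that result disappears, and every outcome is one an in-order reader could have seen.

```python
from itertools import combinations

WRITER = [("data", 1), ("flag", 1)]  # other CPU, in order: data = 1; flag = 1
READER_EXEC = ["data", "flag"]       # reader's loads, executed in REVERSE program order


def run(schedule, replay):
    """Run one interleaving of 'W' (writer store), 'R' (reader load executes)
    and 'T' (reader retires both loads in program order: flag, then data)."""
    mem = {"flag": 0, "data": 0}
    seen, stale = {}, set()
    w = r = 0
    for step in schedule:
        if step == "W":
            addr, val = WRITER[w]
            w += 1
            if addr in seen:  # the write invalidates a line we read early
                stale.add(addr)
            mem[addr] = val
        elif step == "R":
            addr = READER_EXEC[r]
            r += 1
            seen[addr] = mem[addr]
        else:  # "T": retirement
            for addr in ("flag", "data"):
                if replay and addr in stale:
                    seen[addr] = mem[addr]  # replay the stale load
    return seen["flag"], seen["data"]


def schedules():
    # Every interleaving of the writer's 2 stores with the reader's R, R, T.
    for w_pos in combinations(range(5), 2):
        reader = iter(["R", "R", "T"])
        yield ["W" if i in w_pos else next(reader) for i in range(5)]


allowed = {(0, 0), (0, 1), (1, 1)}  # (flag, data) pairs in-order loads can return
no_replay = {run(s, replay=False) for s in schedules()}
with_replay = {run(s, replay=True) for s in schedules()}
print(sorted(no_replay - allowed))  # [(1, 0)]: new flag, old data
print(with_replay == allowed)       # True
```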
Of course, on the bus the replay will show up if
there is contention, but who cares? That is totally
invisible to all software involved, if the CPU does this
all correctly.
In other words: bus traffic has zero bearing on what the
software itself can see happening.
And notice that none of this is even SMP-related. Modern
CPUs do all these same things even locally, just to
be able to move loads ahead of stores, and with the exact
same rules. You just need to have the exact same logic for
moving loads ahead of other loads, and you will now appear
to be totally in order.
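The load-ahead-of-store case (usually called memory disambiguation) can be sketched with the same shape of check. In this hypothetical model (`in_order` and `speculative` are illustrative names) the load runs before an older store's address is known; once the address resolves, an alias forces a replay, and otherwise the early value stands. The alias comparison plays the role that the invalidation snoop plays for load-load reordering. Real cores typically satisfy the replayed load by forwarding from the store buffer; the sketch just re-reads a dict.

```python
def in_order(memory, store, load_addr):
    """Reference: the older store completes, then the younger load runs."""
    mem = dict(memory)
    addr, value = store
    mem[addr] = value
    return mem[load_addr]


def speculative(memory, store, load_addr):
    """Toy memory disambiguation: the load is hoisted above an older store
    whose address is not yet known, then checked once the address resolves."""
    mem = dict(memory)
    early = mem[load_addr]     # load executes before the store's address is known
    addr, value = store        # store address resolves; store completes
    mem[addr] = value
    if addr == load_addr:      # alias detected: the early value is wrong
        return mem[load_addr]  # replay the load after the store
    return early               # no alias: nobody can tell the load moved


memory = {"a": 10, "b": 20}
for store_addr in memory:
    for load_addr in memory:
        store = (store_addr, 99)
        assert speculative(memory, store, load_addr) == in_order(memory, store, load_addr)
print("speculative loads match program order")
```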
The keyword here is "appear". Nothing else matters.
Linus