By: Tzvetan Mikov (tzvetanmi.delete@this.yahoo.com), October 30, 2006 11:55 am
Room: Moderated Discussions
Linus Torvalds (torvalds@osdl.org) on 10/30/06 wrote:
---------------------------
>Ricardo B (ricardo.b@xxxxx.xx) on 10/30/06 wrote:
>>
>>We know OoO x86 CPUs will issue it's loads out of order.
>>How can they keep load ordering regarding other CPUs if
>>they do that?
>
>Load ordering is a purely local thing. There is no
>load ordering "regarding other CPU's". You can replay
>cachable loads as many times as you like, and if you get
>into a situation where you notice that the value you loaded
>was modified by another CPU (so that the local CPU could
>see "incorrect ordering") you can just replay the load in
>the correct order.
I think the question here is not so much how it can be done technically, but whether it is really being done on all existing x86 CPUs. As far as I remember, your posts here are the first time I have ever heard this claim.
While the CPU has no problem reordering two loads to different known addresses locally, doing that for shared data is a different matter. As I suggested in another post, you need to be able to replay instructions if the second load gets "invalidated". That strikes me as very complicated. You are saying that the original PPro did it.
If that is so, I don't understand why all x86 CPUs before Core don't already reorder loads across stores. It seems to take the same machinery.
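
To make the replay idea concrete, here is a toy Python sketch of the scenario as I understand it. It is only an illustration under my own assumptions, not a description of any real pipeline: `run_reader`, the `speculative` table and the `invalidated` set are hypothetical names, and a real CPU would track cache lines in its load queue rather than dictionary keys. A writer CPU stores `data` and then `flag`; the reader's program order is "load `flag`, then load `data`", but it executes the `data` load early.

```python
def run_reader(replay_on_invalidate):
    """Toy model: the reader loads 'data' early, ahead of the older 'flag' load."""
    memory = {"data": 0, "flag": 0}
    speculative = {}     # address -> value loaded early, not yet retired
    invalidated = set()  # early-loaded addresses written remotely since the load

    # Reader executes the younger load (data) out of order, before the flag load.
    speculative["data"] = memory["data"]

    # Writer CPU stores data, then flag, in program order.
    # Each store is "snooped" against the reader's outstanding early loads.
    for addr, value in (("data", 1), ("flag", 1)):
        memory[addr] = value
        if addr in speculative:
            invalidated.add(addr)

    # Reader now executes the older load (flag).
    flag = memory["flag"]

    # At retirement, an early load whose address was invalidated is replayed.
    data = speculative["data"]
    if replay_on_invalidate and "data" in invalidated:
        data = memory["data"]

    return flag, data


print(run_reader(replay_on_invalidate=False))  # new flag with stale data
print(run_reader(replay_on_invalidate=True))   # replay restores program order
```

Without the replay the reader sees the new `flag` together with the old `data`, which is exactly the ordering violation in question. With the replay, the early load is thrown away and redone, so the result looks as if the loads had executed in program order.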