By: Michael S (already5chosen.delete@this.yahoo.com), October 30, 2006 4:36 pm
Room: Moderated Discussions
David Kanter (dkanter@realworldtech.com) on 10/30/06 wrote:
---------------------------
>Michael S (already5chosen@yahoo.com) on 10/30/06 wrote:
>---------------------------
>>David Kanter (dkanter@realworldtech.com) on 10/30/06 wrote:
>>---------------------------
>>>>>>P6 and P4 do reorder loads across stores.
>>>>>
>>>>>With an unknown address ? Are you sure ?
>>>>
>>>>Yes and Yes.
>>>
>>>The P6 definitely does not move loads ahead of stores with an unknown address.
>>>I don't know where you got that impression.
>>
>>From measurements.
>
>I think the measurements are probably misleading. I've read the optimization manual
>several times, and there is no way that a LD can be executed ahead of a store with an unknown address.
>
ftp://download.intel.com/design/PentiumII/manuals/24512701.pdf
p.1.11
"Writes stored in the store buffer are always written to memory in program order. Pentium II and Pentium III processors use processor ordering to maintain consistency in the order in which data is read (loaded) and written (stored) in a program and the order in which the processor actually carries out the reads and writes. With this type of ordering, reads can be carried out speculatively; and in any order, reads can pass buffered writes, while writes to memory are always carried out in program order."
I don't see any mention of a *known* store address.
>>To be precise, I don't know if P6 ever retires loads that it issues ahead of stores
>>with an unknown address. Maybe it just issues them and replays later, when the store
>>address becomes known.
>
>No, they stall. The MOB doesn't have replays in the P6.
>
>>From a performance perspective the two variants are often similar
>>and my measurements are not sufficiently accurate to tell the difference.
>
>That could be.
In my load-store reordering tests P6 behaves similarly to P4 and very differently from K8.
>
>>>I believe that the P4 just speculatively moves the LD ahead, and then replays if
>>>things don't work out. There is definitely no prediction logic though...
>>>
>>>DK
>>>
>>
>>You're likely correct. P4, except possibly the latest stepping, assumes that unknown
>>store address wouldn't alias and just goes ahead with speculation.
>
>It's not clear to me what changed in Prescott to be honest.
>
>DK
In my tests Northwood P4 appears to promote loads ahead of unknown stores in 100% of the cases. Prescott P4 seems slightly more cautious.
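
For anyone following the thread, the two behaviors being argued about (stall the load until every older store address is known, versus issue it speculatively and replay on a conflict) can be sketched as a toy timing model. This is only an illustration, not the test code behind the measurements mentioned above and not a description of any real MOB. The names `Store` and `load_done_cycle`, and the 10-cycle replay penalty, are made up for the example.

```python
from dataclasses import dataclass

@dataclass
class Store:
    addr: int        # address the store eventually writes to
    addr_ready: int  # cycle at which that address becomes known

def load_done_cycle(load_addr, load_ready, older_stores, policy, replay_penalty=10):
    """Cycle at which the load finally executes with correct data (toy model)."""
    if policy == "stall":
        # Conservative: wait until every older store address is known.
        return max([load_ready] + [s.addr_ready for s in older_stores])
    if policy == "speculate":
        # Optimistic: issue right away, assuming unknown addresses do not alias.
        conflicts = [s.addr_ready for s in older_stores
                     if s.addr_ready > load_ready and s.addr == load_addr]
        if not conflicts:
            return load_ready
        # Mis-speculation: replay after the last conflicting address resolves.
        # The penalty value is hypothetical.
        return max(conflicts) + replay_penalty
    raise ValueError(f"unknown policy: {policy}")

# One older store whose address is not known until cycle 20.
stores = [Store(addr=0x100, addr_ready=20)]

no_alias_stall = load_done_cycle(0x200, 1, stores, "stall")
no_alias_spec = load_done_cycle(0x200, 1, stores, "speculate")
alias_stall = load_done_cycle(0x100, 1, stores, "stall")
alias_spec = load_done_cycle(0x100, 1, stores, "speculate")
```

In this model the speculative policy wins whenever the addresses do not alias and loses by the replay penalty when they do. That gap is what a timing test would have to resolve in order to tell the two policies apart.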