By: Martin H. Kristiansen, August 15, 2006 1:03 am
David Kanter on 8/14/06 wrote:
>Martin H. Kristiansen on 8/14/06 wrote:

>>However, I was unable to find if you could trigger event counters on L1 store misses
>>(either trigger directly or deduce them by inference from other counted events).
>You should be able to estimate it pretty well using the L1 cache misses and then the relative frequency of LD and ST.

No, I don't think that's enough. Read-modify-writes (like updating any kind of data structure) can cause load misses but hardly ever store misses, since the load has already fetched the line by the time the store needs it. Traditionally we see a 2:1 load-to-store ratio; I'm guessing the load-to-store miss ratio is higher.
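
To put rough numbers on that objection, here is a small Python sketch. Everything in it is hypothetical: the function names, the counter values and the share of read-modify-write stores are made up for illustration, not taken from CodeAnalyst or any real run. It compares the naive estimate (split L1 misses by the load/store mix) with one where some stores are assumed never to miss because a preceding load already brought the line in.

```python
def naive_store_misses(l1_misses, loads, stores):
    """Split total L1 misses in proportion to the load/store instruction mix."""
    return l1_misses * stores / (loads + stores)


def store_misses_with_rmw(l1_misses, loads, stores, rmw_stores):
    """Same split, but assume read-modify-write stores never miss.

    rmw_stores is a hypothetical count of stores that follow a load to the
    same line. Assumption: all remaining accesses miss at the same rate.
    """
    missable_stores = stores - rmw_stores
    return l1_misses * missable_stores / (loads + missable_stores)


if __name__ == "__main__":
    # Made-up counters: 2:1 load-to-store mix, 300 L1 misses in total.
    print(naive_store_misses(300, 2000, 1000))          # 100.0
    print(store_misses_with_rmw(300, 2000, 1000, 600))  # 50.0
```

With these invented inputs the naive split attributes 100 misses to stores, while assuming 600 of the 1000 stores are read-modify-write halves the figure to 50. The real share is exactly what the counters would need to tell us.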

>>I think it is interesting to know just how common these are in order to infer how
>>important the store-reordering of Core 2 is, - and also to infer how much the K8L,
>>which will only have load-reordering, will lag behind.
>x86 cannot re-order stores. What Core does that is new is that LDs can be moved around STs with unknown addresses.

Doh! That's what I meant, sorry for being unclear.

>>We've discussed earlier that store misses carry as many data dependencies as loads
>>do. I suspect store misses are significantly less frequent than load misses but
>>have no way to get hard data, - at least from CodeAnalyst.
>So memory disambiguation will reduce those dependencies by allowing LDs to move around STs with unknown addresses.

Yeah, the less pessimistic approach should reduce the number of stalls from false dependencies, i.e. loads after stores where there is no real data dependency going through main memory.
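
A toy Python model of that difference, purely as a sketch: the function and its parameters are hypothetical, and the "speculative" case assumes an always-correct predictor, which real hardware does not have (a wrong guess has to be detected and the load replayed). It only shows where the false-dependency stall comes from.

```python
def load_issue_cycle(load_ready, load_addr, earlier_stores, speculate):
    """Return the cycle at which a load may issue.

    earlier_stores: list of (addr_known_cycle, addr) for older stores.
    speculate=False: wait until every older store address is known.
    speculate=True:  wait only for older stores that really write load_addr
                     (idealised, always-correct prediction).
    """
    cycle = load_ready
    for addr_known_cycle, addr in earlier_stores:
        if not speculate or addr == load_addr:
            cycle = max(cycle, addr_known_cycle)
    return cycle


if __name__ == "__main__":
    # Two older stores; the first resolves its address late (cycle 10).
    stores = [(10, 0x100), (4, 0x200)]
    # Load to an unrelated address, ready at cycle 2.
    print(load_issue_cycle(2, 0x300, stores, speculate=False))  # 10
    print(load_issue_cycle(2, 0x300, stores, speculate=True))   # 2
    # Load that truly depends on the second store still waits for it.
    print(load_issue_cycle(2, 0x200, stores, speculate=True))   # 4
```

In the conservative case the unrelated load sits behind the slow store address for eight cycles even though no data flows between them; that wait is the false dependency.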

