Memory Disambiguation, the Solution
In Merom, loads can be speculatively moved ahead of older stores whose addresses are still unknown. In effect, the conservative assumptions in the P6 and P4 have been relaxed. Unfortunately, when a load is moved illegally (that is, it actually reads an address written by one of the stores it bypassed), the load and the instructions that depend on it must be flushed from the pipeline and re-executed, which is costly. To address this problem, Intel has implemented a dynamic alias predictor that predicts when a load cannot safely be moved around a store. The prediction is based on historical behavior and is reportedly more than 90% accurate, although the architects would not comment on the figure.
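The sketch below is a minimal Python model of how a history-based alias predictor could work. It assumes a table of saturating counters indexed by the low bits of the load's instruction address. The class name `AliasPredictor`, the 256-entry table, the threshold of 15, and the reset-on-violation policy are all hypothetical choices for illustration; they are not Intel's confirmed design.

```python
class AliasPredictor:
    """Toy dynamic alias predictor; every parameter here is hypothetical."""

    def __init__(self, entries=256, threshold=15):
        self.entries = entries
        self.threshold = threshold      # counter value needed before a load may bypass
        self.counters = [0] * entries   # one saturating counter per table entry

    def _index(self, load_pc):
        # Hypothetical indexing: low bits of the load's instruction address.
        return load_pc % self.entries

    def may_bypass(self, load_pc):
        """Predict whether this load may issue ahead of unknown-address stores."""
        return self.counters[self._index(load_pc)] >= self.threshold

    def update(self, load_pc, aliased):
        """Train on whether the load really conflicted with an older store."""
        i = self._index(load_pc)
        if aliased:
            self.counters[i] = 0  # one violation makes the load wait again
        else:
            self.counters[i] = min(self.counters[i] + 1, self.threshold)


predictor = AliasPredictor()
pc = 0x401000  # hypothetical load address
for _ in range(15):
    predictor.update(pc, aliased=False)
print(predictor.may_bypass(pc))  # True: the load has never aliased
predictor.update(pc, aliased=True)
print(predictor.may_bypass(pc))  # False: a single violation resets it
```

Resetting the counter to zero on a violation biases the model toward caution: a wrongly bypassed load costs a flush, while a load that waits unnecessarily only gives up some parallelism.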
Figure 9 below shows an example of how speculative memory disambiguation can increase performance compared to the conservative methods in other designs.
Figure 9 – The Benefits of Memory Disambiguation
This illustration is a little contrived, but it shows why speculative disambiguation matters. Imagine a program that goes through a list of numbers and adds 1 to each entry; since this is the year 2006, the program was written poorly and the store addresses are unknown. Merom can speculatively move the blue load instructions ahead of the red stores and make full use of its multiple execution units. Other x86 CPUs would wait for each store address to be determined before issuing the load instructions. In this hypothetical example, Merom would be nearly twice as fast (5 cycles versus 9, a speedup of 1.8x).
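To make the intuition concrete, here is a toy Python timing model of the add-one loop. It assumes, hypothetically, a 3-cycle load, 1-cycle add and store, a store whose address becomes known only when the store itself executes, and no true aliasing between iterations. The function `schedule` and all latencies are illustrative, so the cycle counts will not match Figure 9.

```python
# Hypothetical latencies, in cycles (not Merom's real values).
LOAD_LATENCY = 3
ADD_LATENCY = 1
STORE_LATENCY = 1


def schedule(iterations, speculative, load_ports=1):
    """Return the cycle at which the last store of the add-one loop completes.

    Each iteration is: load x; add 1; store x. A store's address is assumed
    to become known only when the store finishes executing. Without
    speculative disambiguation, a load may not issue until every older
    store has executed.
    """
    port_free = [0] * load_ports  # next cycle each pipelined load port can accept a load
    stores_resolved = 0           # cycle by which all older store addresses are known
    finish = 0
    for _ in range(iterations):
        earliest = 0 if speculative else stores_resolved
        port = min(range(load_ports), key=lambda p: port_free[p])
        start = max(earliest, port_free[port])
        port_free[port] = start + 1
        store_done = start + LOAD_LATENCY + ADD_LATENCY + STORE_LATENCY
        stores_resolved = max(stores_resolved, store_done)
        finish = max(finish, store_done)
    return finish


print(schedule(4, speculative=False))  # conservative, P6/P4-style: prints 20
print(schedule(4, speculative=True))   # speculative, Merom-style: prints 8
```

Even in this toy, the conservative schedule serializes each iteration behind the previous store, while the speculative one overlaps the iterations and is limited only by the load port.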
To really understand why disambiguation is important, it helps to consider a typical situation rather than a contrived example. For x86, my own performance analysis of common workstation benchmarks has shown that roughly 10-25% of instructions are stores and 30-45% are loads. Merom supports an out-of-order window of 96 instructions, which means that roughly 29-43 loads and 10-24 stores could be in flight at any time. Now imagine there were no memory disambiguation. All it takes is for one of those stores to have an unknown address, and every younger load, along with the instructions that depend on it, is forced to stall in the Reorder Buffer; that can be between a quarter and a half of the instructions in flight, all using up valuable resources. Memory disambiguation solves this problem by eliminating most of the stalls caused by false aliasing.
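The arithmetic and the stall argument above can be checked with a short Python sketch. `in_flight` simply scales the instruction-mix percentages by the 96-entry window, and `blocked_loads` is a hypothetical helper that models a core without disambiguation by holding back every load younger than the oldest store with an unknown address. It ignores dependent instructions, so it understates the total stall.

```python
WINDOW = 96  # Merom's out-of-order window, from the text


def in_flight(percent, window=WINDOW):
    """Average number of window slots taken by an instruction class."""
    return percent * window / 100


def blocked_loads(window_ops):
    """Count loads a core without memory disambiguation must hold back.

    window_ops lists in-flight ops in program order: 'L' is a load,
    'S' a store with a known address, 'S?' a store whose address is
    still unknown, and 'A' any other instruction.
    """
    if 'S?' not in window_ops:
        return 0
    oldest_unknown = window_ops.index('S?')
    # Loads older than the unknown-address store may proceed; younger ones wait.
    return sum(1 for op in window_ops[oldest_unknown + 1:] if op == 'L')


print(in_flight(30), in_flight(45))  # loads: 28.8 43.2
print(in_flight(10), in_flight(25))  # stores: 9.6 24.0
print(blocked_loads(['L', 'S?', 'L', 'A', 'L', 'S', 'L']))  # 3
```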
There is still room for improvement in memory disambiguation. Some of the original work on the subject was done at Digital Equipment Corporation (DEC) and was intended for the ambitious but ill-fated EV8. The techniques evaluated at DEC could speculatively reorder stores as well as loads.