The Out-of-Order Engine – Renaming and Scheduling
Merom’s out-of-order engine generally resembles Yonah’s, but with more resources. As Figure 5 below shows, they share many of the same structures, including the Register Alias Table, the Allocator, and the Reorder Buffer. However, all of these structures have been enlarged to support more in-flight instructions and better exploit ILP.
Figure 5 – Out-of-Order Engine Comparison
Both the P4 and Yonah have a maximum throughput of 3 uops/cycle, while Merom is designed to handle 4 uops/cycle, as Figure 5 shows. As a point of reference, the P4 ROB has 126 entries and Dothan (and probably Yonah as well) has more than 40, compared to 96 entries for Merom. In fact, Merom has more out-of-order resources, in relative terms, than any other recent design. The ratio of ROB entries to peak instructions in flight is around 1.7 for Merom, which compares favorably to the 90nm P4 (around 1.4) and to the corresponding figure for Yonah, which I have been asked not to share.
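The ratios quoted above can be roughly reproduced with a little arithmetic. The sketch below assumes that "peak instructions in flight" means pipeline depth multiplied by uops/cycle, and it assumes pipeline depths of 14 stages for Merom and 31 stages for the 90nm P4. Neither the definition nor the depths are stated in this section, so treat them as illustrative inputs rather than the exact method behind the figures; the Core class and its field names are likewise made up for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Core:
    name: str
    rob_entries: int      # from the text above
    uops_per_cycle: int   # from the text above
    pipeline_stages: int  # ASSUMPTION: not given in this section

    def peak_in_flight(self) -> int:
        # ASSUMPTION: peak instructions in flight = pipeline depth x width
        return self.pipeline_stages * self.uops_per_cycle

    def rob_ratio(self) -> float:
        # ROB entries available per instruction potentially in flight
        return self.rob_entries / self.peak_in_flight()


merom = Core("Merom", rob_entries=96, uops_per_cycle=4, pipeline_stages=14)
p4_90nm = Core("90nm P4", rob_entries=126, uops_per_cycle=3, pipeline_stages=31)

for core in (merom, p4_90nm):
    print(f"{core.name}: {core.rob_ratio():.1f} ROB entries per in-flight instruction")
```

Under these assumptions the results round to about 1.7 for Merom and 1.4 for the 90nm P4, in line with the figures in the text. Yonah is left out because its ROB size is not given here.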
The reservation stations are also considerably larger in Merom than in the Pentium M: 32 entries instead of 24. A comparison to the P4 is a little more difficult, since it uses distributed schedulers rather than reservation stations. The P4 has a total of 46 scheduler slots: 8 for the memory units and 38 for the ALUs and FPUs.