Out-of-Order Engine and Execution Units
The out-of-order engine for Nehalem was significantly augmented, both for general performance and specifically to accommodate SMT. This is probably one of the largest architectural changes to the out-of-order engine since it was first designed, because SMT requires that its resources be shared between threads.
Figure 3 – Out-of-Order Engine and Execution Unit Comparison
As with Core 2, the register alias table (RAT) points each architectural register into either the Re-Order Buffer (ROB) or the Retirement Register File (RRF). The RAT tracks the most recent speculative state, whereas the RRF holds the most recent non-speculative, committed state. The RAT can rename up to 4 uops each cycle, giving each one a destination register in the ROB. The renamed instructions then read their source operands and issue into the unified Reservation Station (RS), which is used by all instruction types.
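The renaming step described above can be illustrated with a small model. This is a minimal sketch, not Intel's implementation: the names `ToyRAT`, `Uop`, `rename` and `retire` are hypothetical, the register names are arbitrary, and the model ignores ROB-full stalls and value storage entirely. It shows only how each architectural register maps to either the RRF or a ROB entry, and how at most 4 uops are renamed per cycle.

```python
from dataclasses import dataclass

RENAME_WIDTH = 4   # uops renamed per cycle, per the text
ROB_SIZE = 128     # Nehalem ROB entries, per the text


@dataclass
class Uop:
    dest: str      # architectural destination register
    srcs: tuple    # architectural source registers


class ToyRAT:
    """Hypothetical model: maps each architectural register to RRF or ROB."""

    def __init__(self, arch_regs):
        # Initially every register's latest value is the committed one in the RRF.
        self.table = {r: ("RRF", r) for r in arch_regs}
        self.next_rob = 0  # simplification: no check for a full ROB

    def rename(self, uops):
        """Rename at most RENAME_WIDTH uops; return (renamed, leftover)."""
        renamed = []
        for uop in uops[:RENAME_WIDTH]:
            # Look up sources before remapping the destination, so a uop
            # that reads and writes the same register sees the older value.
            srcs = tuple(self.table[s] for s in uop.srcs)
            rob_idx = self.next_rob
            self.next_rob = (self.next_rob + 1) % ROB_SIZE
            self.table[uop.dest] = ("ROB", rob_idx)
            renamed.append((rob_idx, srcs))
        return renamed, uops[RENAME_WIDTH:]

    def retire(self, dest, rob_idx):
        # Once the result is committed, the RAT points back at the RRF,
        # unless a younger uop has already remapped the register.
        if self.table[dest] == ("ROB", rob_idx):
            self.table[dest] = ("RRF", dest)
```

For example, renaming a group of five uops in one call leaves the fifth for the next cycle, and a source written by an earlier uop in the same group resolves to that uop's ROB entry rather than to the RRF.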
The ROB was enlarged from 96 to 128 entries with Nehalem, and the RS grew from 32 to 36 entries. Both the ROB and RS are shared across the two threads, but using different policies. The ROB is statically partitioned between the two threads, which allows each thread to speculate equally far through the instruction stream. In contrast, the RS is competitively shared based on demand. A thread will often stall while waiting for an operand from memory and use relatively few RS entries as a result, in which case it is better for that thread to yield RS entries to the more active thread. Any instructions in the RS which have all their operands ready are dispatched to the execution units, which are largely unchanged from Core 2 and unaffected by SMT, except for an increase in utilization.
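The difference between the two sharing policies can be sketched in a few lines. This is an illustrative model only: the class names `PartitionedROB` and `SharedRS` and the helper `fill` are hypothetical, and the first-come-first-served arbitration in `SharedRS` is an assumption, since the text says only that the RS is shared based on demand. Entry counts are the Nehalem figures given above, with both threads active.

```python
ROB_SIZE = 128     # entries, per the text
RS_SIZE = 36       # entries, per the text
NUM_THREADS = 2


class PartitionedROB:
    """Static partitioning: each thread owns a fixed share of the entries."""

    def __init__(self, size=ROB_SIZE, threads=NUM_THREADS):
        self.limit = size // threads      # 64 entries per thread
        self.used = [0] * threads

    def allocate(self, tid):
        if self.used[tid] >= self.limit:
            return False                  # this thread's share is full
        self.used[tid] += 1
        return True


class SharedRS:
    """Competitive sharing: all threads draw from one common pool."""

    def __init__(self, size=RS_SIZE, threads=NUM_THREADS):
        self.size = size
        self.used = [0] * threads

    def allocate(self, tid):
        # Assumed policy: first come, first served, with no per-thread cap.
        if sum(self.used) >= self.size:
            return False                  # the whole pool is full
        self.used[tid] += 1
        return True


def fill(structure, tid, requests):
    """Request `requests` entries for thread `tid`; return how many were granted."""
    return sum(structure.allocate(tid) for _ in range(requests))
```

With a busy thread 0 and a mostly stalled thread 1 that holds only 4 RS entries, thread 0 can claim the remaining 32 RS entries, while in the ROB it is capped at 64 entries no matter how few thread 1 uses.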