Where Ever I Merom
Intel’s Merom is a dual core, 64 bit, 4 issue superscalar, moderately pipelined, out-of-order MPU, implemented in a 65nm high performance bulk process. The processor can address 36 bits of physical memory and 48 bits of virtual memory and supports all of Intel’s *Ts. Each core is fed by a 32KB L1I cache and a dual ported 32KB L1D cache, and the two cores share a 4MB L2 cache. Merom variants currently clock at up to 3.0GHz, but will probably scale to 3.33GHz. Each product family has a maximum TDP: 35W for Merom, 65W for Conroe and 80W for Woodcrest. However, lower power parts will be available based on customer demand and usage. For example, LV Woodcrest products are targeted at blade systems, and sacrifice clockspeed to achieve a 40W TDP. Figure 2 below shows the Merom microarchitecture.
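As a quick aside, the address widths quoted above translate into concrete capacities. The short Python sketch below only does that arithmetic; it assumes byte-addressable memory, and the constant and function names are illustrative rather than taken from any Intel documentation.

```python
# Address-space sizes implied by the address widths quoted in the text.
# Assumption: memory is byte addressable, so N address bits reach 2**N bytes.
PHYSICAL_BITS = 36  # physical address width, from the text
VIRTUAL_BITS = 48   # virtual address width, from the text

GIB = 1 << 30  # bytes in one gibibyte
TIB = 1 << 40  # bytes in one tebibyte


def addressable_bytes(bits: int) -> int:
    """Return the number of distinct byte addresses for a given width."""
    return 1 << bits


physical_gib = addressable_bytes(PHYSICAL_BITS) // GIB
virtual_tib = addressable_bytes(VIRTUAL_BITS) // TIB

print(f"physical: {physical_gib} GiB")  # physical: 64 GiB
print(f"virtual:  {virtual_tib} TiB")   # virtual:  256 TiB
```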
Figure 2 – Merom Microarchitecture
In Figure 2, the sub-blocks are color coded. Purple units are responsible for fetching instructions and predicting branches in the front end, while the orange blocks decode the x86 instructions into uops. The sections in tan are internal buffers for uops and the scheduling and out-of-order blocks. The functional units are all in blue, and the memory system is in green. Lastly, the actual x86-64 register state is in grey. For comparison, the Yonah and Pentium 4 microarchitectures are shown below in slightly less detail.
Figure 3 – Microarchitectures of Yonah and P4
Each cycle, Merom fetches 128 bits of instructions, decodes 4+1 x86 instructions, issues 7 uops, reorders and renames 4 uops, dispatches 6 uops to execution units and retires up to 4 uops. In every regard, this is much wider and more aggressive than the P4 and Yonah. In the next few pages, we will dive into the details and explore each section of Merom.
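To put the per-cycle figures above side by side, the following Python sketch does some back-of-the-envelope arithmetic. It rests on a deliberately simplified model that is our assumption, not Intel's description: sustained uop throughput is taken to be capped by the narrowest uop stage, and decode is counted as 4 instructions per cycle, ignoring the "+1". The names are illustrative only.

```python
# Per-cycle widths quoted in the text. The "narrowest stage wins" model
# below is a simplification used only for illustration.
FETCH_BITS = 128   # instruction bits fetched per cycle
DECODE_WIDTH = 4   # x86 instructions decoded per cycle (ignoring the "+1")

# Stages whose width the text gives in uops per cycle.
UOP_WIDTHS = {"issue": 7, "rename": 4, "dispatch": 6, "retire": 4}


def narrowest_stage(widths):
    """Return (name, width) of the first stage with the smallest width."""
    name = min(widths, key=widths.get)
    return name, widths[name]


fetch_bytes = FETCH_BITS // 8
# Fetch can only keep the decoders full if instructions average at most
# this many bytes each.
max_avg_instr_len = fetch_bytes / DECODE_WIDTH

stage, bound = narrowest_stage(UOP_WIDTHS)

print(f"fetch: {fetch_bytes} bytes/cycle")
print(f"average instruction length to sustain decode: {max_avg_instr_len} bytes")
print(f"sustained uop bound: {bound}/cycle (first limited at {stage})")
```

Under these assumptions, the wider issue and dispatch stages act as headroom for absorbing bursts, while the 4 uop rename and retire widths set the steady-state ceiling.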