Westmere Arrives


Uncore Changes

The uncore, especially the memory controller, in Westmere received a fair bit of attention. The memory controller has deeper buffering and more in-flight accesses to improve the utilization of the available memory bandwidth. For increased capacity and bandwidth, Westmere’s memory controller can drive two DIMMs per channel at the full 1.33GT/s. Current Nehalem systems (and most AMD systems) run two DIMMs per channel at reduced bandwidth; Nehalem’s memory controller operates two DIMMs per channel at 1.06GT/s, sacrificing about 20% of the bandwidth. At the physical layer, Westmere’s DDR3 PHYs can use low voltage DDR3 (1.35V vs. 1.5V) to reduce power consumption.
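The "about 20%" figure follows directly from the two transfer rates quoted above. The sketch below works through that arithmetic; the 8-byte transfer width is an assumption (a standard 64-bit DDR3 channel, not stated in the text), and the function names are purely illustrative.

```python
# Per-channel peak DDR3 bandwidth at the two transfer rates quoted in the text.
# Assumption: a standard 64-bit (8-byte) DDR3 channel; the article does not state this.
BYTES_PER_TRANSFER = 8


def peak_bandwidth_gb_s(transfer_rate_gt_s, bytes_per_transfer=BYTES_PER_TRANSFER):
    """Theoretical peak bandwidth of one channel in GB/s."""
    return transfer_rate_gt_s * bytes_per_transfer


def bandwidth_loss_fraction(full_rate_gt_s, reduced_rate_gt_s):
    """Fraction of peak bandwidth lost when running at the reduced rate."""
    return 1.0 - reduced_rate_gt_s / full_rate_gt_s


if __name__ == "__main__":
    full, reduced = 1.33, 1.06  # GT/s, two DIMMs per channel: Westmere vs. Nehalem
    print(f"Westmere: {peak_bandwidth_gb_s(full):.2f} GB/s per channel")
    print(f"Nehalem:  {peak_bandwidth_gb_s(reduced):.2f} GB/s per channel")
    print(f"Loss:     {bandwidth_loss_fraction(full, reduced):.1%}")
```

The loss ratio does not depend on the channel width, so the 20% figure holds regardless of that assumption.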

Westmere also moves the APIC timer into the uncore, so that it continues to operate consistently, even when the processor is in C3 or C6 power saving states. In previous CPUs, the APIC timer would not function when in a deep sleep state. This caused problems with Windows Server 2008 R2, which uses the APIC for timekeeping. The previous solution was to disable the C3 and C6 states in Nehalem. Since Westmere moves the APIC timer into a region of the chip which does not go to sleep, the C3 and C6 states can be safely used in Windows Server 2008 R2.

Last, the uncore is power gated more effectively in Westmere. Nehalem was productized with 4 power gates (one for each core). Early images from ISSCC showed a fifth power gate that controlled the uncore, but this was never supported on Nehalem. However, Lynnfield, with a much larger uncore (including a PCI-E controller) and simpler system architecture (only a single socket), used this fifth power gate. One of the problems with power gating the uncore on Nehalem is that all the architectural state from the 4 cores was kept in the L3 cache. To address this issue, Westmere has a small C6SRAM (with a dedicated voltage supply) that can hold the architectural state from the 6 cores, so that the entire uncore can be power gated off.

Productization

The productization of the Westmere family is shown below, and pricing can be found at Intel’s website. The image is somewhat confusing since it lacks frequencies for the new Westmere SKUs. In general, SKUs with the same last two digits run at the same frequency (e.g. the X5570 and X5670 both run at 2.93GHz).


Figure 2 – Xeon 56xx Products, Courtesy Intel

Westmere server products now range up to a 130W TDP; with Nehalem, only workstation products had a 130W TDP and servers were limited to 95W. The new X5680 is a direct analogue of the W5590 and runs at 3.33GHz. Another change from Nehalem is a new set of SKUs which disable the two additional cores in exchange for higher frequency. Since there are fewer cores, the L3 cache/core also jumps up to 3MB, reducing memory traffic correspondingly. The X5677 runs at 3.46GHz and the X5667 at 3.06GHz.
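The L3 cache per core figure is simple division, sketched below. The 12MB total L3 size is an assumption brought in for illustration; it is not stated on this page, and the function name is hypothetical.

```python
# L3 cache available per active core on a Westmere die.
# Assumption: 12MB of shared L3 in total (not stated in this section of the article).
L3_TOTAL_MB = 12


def l3_per_core_mb(active_cores, total_mb=L3_TOTAL_MB):
    """Shared L3 capacity divided evenly across the active cores, in MB."""
    if active_cores < 1:
        raise ValueError("at least one core must be active")
    return total_mb / active_cores


if __name__ == "__main__":
    print(f"6 cores enabled: {l3_per_core_mb(6):.0f}MB per core")
    print(f"4 cores enabled: {l3_per_core_mb(4):.0f}MB per core")
```

The L3 is shared rather than partitioned, so this is an average share and not a hard per-core allocation.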

Unfortunately, neither Intel’s website nor the above image has details on turbo boost frequencies; only the base clocks are given. However, the X5670 and X5680 can increase the frequency by one bin (133MHz) if 3-6 cores are active and two bins if 1-2 cores are active.
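The turbo rule for the X5670 and X5680 can be expressed as a small lookup. This is a rough sketch that assumes the bin counts quoted above and treats the result as an upper bound, since turbo boost only engages when power and thermal headroom allow. The function names are illustrative, and the base clocks are the rounded values given in the text.

```python
# Upper-bound turbo frequency for the X5670 / X5680, per the bin rule in the text.
BIN_MHZ = 133  # one turbo bin, as quoted in the article


def turbo_bins(active_cores):
    """Number of turbo bins available: two for 1-2 active cores, one for 3-6."""
    if not 1 <= active_cores <= 6:
        raise ValueError("active_cores must be between 1 and 6")
    return 2 if active_cores <= 2 else 1


def max_turbo_mhz(base_mhz, active_cores):
    """Base clock plus the available turbo bins, in MHz (an upper bound)."""
    return base_mhz + turbo_bins(active_cores) * BIN_MHZ


if __name__ == "__main__":
    # Base clocks rounded as in the text: 2.93GHz (X5670) and 3.33GHz (X5680).
    for name, base in (("X5670", 2930), ("X5680", 3330)):
        for cores in (1, 2, 3, 6):
            print(f"{name}, {cores} active core(s): up to {max_turbo_mhz(base, cores)}MHz")
```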

For sufficiently threaded workloads, it’s clear that Westmere will bring a substantial improvement; Intel is claiming 30-50%, which is plausible. Of course, the benefits will be much smaller for workloads that are not multi-threaded, or that contain substantial single-threaded elements. In the next few days, we will have a review of Westmere online complete with performance and power numbers, but for now we are still collecting and analyzing the data.
