Core Microarchitecture Performance: Woodcrest Preview

A Brief Overview of Bensley

We covered the Bensley platform in great detail in a previous article, which is an excellent refresher. For readers who are less concerned with the intimate details, we will provide a brief recap of the key points. Figure 1 below shows the Bensley system architecture.

Figure 1 – Bensley Platform, courtesy of Intel

The Blackford chipset will support three generations of MPUs: Dempsey, Woodcrest and, in the more distant future, Clovertown. Each MPU will be drop-in compatible with the prior generation(s), although a BIOS revision may be necessary. The crucial detail in the diagram is that each front side bus provides 8.5 or 10.5GB/s of bandwidth. Collectively, the two independent bus segments give the processors 17.1 or 21.1GB/s of bandwidth. Blackford uses four channels of FB-DIMMs, which provide 17.1 or 21.1GB/s of memory bandwidth, balancing the front side bus capabilities.
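For readers who want to check the balance between the buses and memory, the arithmetic can be sketched in a few lines of Python. The sketch assumes details that are not stated above: an 8-byte (64-bit) front side bus running at 1066 or 1333MT/s, and FB-DIMM channels that each deliver the peak read bandwidth of an 8-byte DDR2-533 or DDR2-667 DIMM. The function name and constants are purely illustrative. Under these assumptions the slower configuration matches the quoted 17.1GB/s, while the faster one works out slightly higher than the figures above, so treat the last digit as approximate.

```python
# Peak bandwidth sketch for the Bensley platform.
# ASSUMPTIONS (not from the article): 8-byte-wide front side bus at
# 1066 or 1333 MT/s; four FB-DIMM channels, each with the peak read
# bandwidth of an 8-byte DDR2-533 or DDR2-667 DIMM.

BUS_WIDTH_BYTES = 8      # assumed 64-bit data path
NUM_FSB_SEGMENTS = 2     # two independent front side buses
NUM_MEM_CHANNELS = 4     # four FB-DIMM channels


def peak_gb_per_s(mega_transfers_per_s, width_bytes=BUS_WIDTH_BYTES):
    """Peak bandwidth in GB/s (10^9 bytes/s) for one bus or channel."""
    return mega_transfers_per_s * width_bytes / 1000.0


for fsb_mts, mem_mts in [(1066, 533), (1333, 667)]:
    per_bus = peak_gb_per_s(fsb_mts)
    total_fsb = NUM_FSB_SEGMENTS * per_bus
    total_mem = NUM_MEM_CHANNELS * peak_gb_per_s(mem_mts)
    print(f"FSB {fsb_mts} MT/s: {per_bus:.1f} GB/s per bus, "
          f"{total_fsb:.1f} GB/s total; memory {total_mem:.1f} GB/s")
```

The point of the exercise is that the aggregate front side bus bandwidth and the aggregate memory bandwidth scale together, so neither side is an obvious bottleneck at peak rates.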

The main differences between the generations of Bensley are the MPUs and the bus speeds. Woodcrest is based on the new Core microarchitecture; it is a dual-core, modestly pipelined, out-of-order, 4-issue MPU implemented in Intel’s 65nm high performance process. Woodcrest provides substantially higher IPC and performance than Dempsey, which is based on the older P4 core. For those who wish to review the details of Woodcrest and the Pentium 4, I strongly recommend my coverage from IDF, which is exceptionally thorough. It walks through every portion of Woodcrest in comparison to the Pentium 4 and Pentium M cores.

From a system perspective, the big differences come down to cores, caches, buses and power. Each Dempsey has two separate 2MB L2 caches and two bus interfaces; it is essentially two MPUs on a single package. Woodcrest is a far more elegant and fully integrated solution: the two cores are on the same die, share 4MB of L2 cache and have a unified bus interface. Because there are fewer interfaces (and fewer discontinuities in the bus), the signal is cleaner for Woodcrest and the bus can be clocked up to 1.33GT/s. On top of that, because Woodcrest has a shared L2 cache, there is less coherency traffic. Together these two factors should substantially improve system scalability.

Perhaps most importantly, Woodcrest consumes far less power and dissipates much less heat than Dempsey because the microarchitecture was heavily optimized to reduce power consumption. Woodcrest comes in three varieties: the 40W TDP versions optimized for blades, the mainstream 65W TDP parts, and the 3GHz top bin part, which has an 80W TDP. All parts below 3GHz will fall into the 65W or 40W TDP range. In comparison, the top bin Dempsey parts had a 130W TDP and mid range parts were rated at 95W. The massive power requirements and thermal issues precluded ever using Dempsey in a blade. On top of this, Woodcrest also has improved sleep states and clock gating, which help to lower average power (recall that TDP and average power are quite distinct measures).
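To put the TDP figures in perspective, here is a rough sketch that compares the processor power budget of a two socket system for each part. It uses only the TDP numbers quoted above; the function names and the two socket assumption are illustrative, and since TDP is a design limit rather than measured draw, the results say nothing about average power.

```python
# Rough comparison of processor TDP budgets, using the figures quoted
# in the text. TDP is a thermal design limit, NOT average power.
# ASSUMPTION: a two socket system, as in a typical Bensley server.

WOODCREST_TDP_W = {"blade": 40, "mainstream": 65, "top bin (3GHz)": 80}
DEMPSEY_TDP_W = {"mid range": 95, "top bin": 130}


def socket_budget_w(tdp_w, sockets=2):
    """Total processor TDP budget in watts for a multi-socket system."""
    return tdp_w * sockets


def reduction_pct(old_w, new_w):
    """Percentage reduction going from old_w to new_w."""
    return 100.0 * (old_w - new_w) / old_w


# Compare like with like: top bin vs top bin, mid range vs mainstream.
pairs = [("top bin", "top bin (3GHz)"), ("mid range", "mainstream")]
for dempsey_bin, woodcrest_bin in pairs:
    old = DEMPSEY_TDP_W[dempsey_bin]
    new = WOODCREST_TDP_W[woodcrest_bin]
    print(f"Dempsey {dempsey_bin}: {socket_budget_w(old)}W for two sockets; "
          f"Woodcrest {woodcrest_bin}: {socket_budget_w(new)}W "
          f"({reduction_pct(old, new):.1f}% lower)")
```

Pairing the bins this way is a judgment call; comparing parts at equal performance rather than equal market position would favor Woodcrest even more, given its higher IPC.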

All these improvements make Woodcrest a much more compelling product than Dempsey. While Dempsey brought Intel to performance parity (or near enough) with AMD, power was still a substantial concern. With Woodcrest, Intel will be ahead of AMD in performance for almost all workloads, and will have roughly equivalent or better power and thermal characteristics.
