Westmere Performance



Our first article introduced Intel’s new 32nm Westmere-EP microprocessor. As a ‘tick’, Westmere is primarily about the benefits of a new process technology, not microarchitectural improvements. Thus the main advances should come from a combination of more cores, more cache, additional frequency headroom and thermal improvements.

System Configuration

Since Westmere is socket-compatible with Nehalem, we are reusing the Asus R12A 1U server from our Nehalem performance preview. This system hosts both the Westmere and Nehalem processors, eliminating any platform-level differences (e.g. memory subsystems, power supplies, fans, etc.).

Westmere is represented by two X5670 CPUs running at 2.93GHz; each contains 6 cores, 12 threads and a 12MB L3 cache, with a TDP of 95W. With the dynamic voltage and frequency scaling (DVFS) that Intel calls Turbo Boost, the X5670 can reach 3.2GHz with 3-6 cores active, and 3.33GHz with 1-2 cores active.

The Nehalem microprocessors are X5570s, each containing 4 cores, 8 threads and an 8MB L3 cache, running at 2.93GHz with a 95W TDP. The X5570 has Turbo Boost bins similar to those of the X5670: 3.2GHz with 3-4 cores active and 3.33GHz with 1-2 cores active.
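The Turbo Boost bins quoted for the two parts can be written down as a small lookup table. The Python sketch below is purely illustrative: the table layout and the function name max_turbo_ghz are our own assumptions, not an Intel interface, and the values are upper bounds that a processor only reaches when power and thermal headroom allow.

```python
# Turbo Boost bins as quoted in the text: (max active cores, frequency in GHz).
# The data layout and names here are hypothetical, chosen for illustration.
TURBO_BINS = {
    "X5670": {"base_ghz": 2.93, "cores": 6, "bins": [(2, 3.33), (6, 3.2)]},
    "X5570": {"base_ghz": 2.93, "cores": 4, "bins": [(2, 3.33), (4, 3.2)]},
}


def max_turbo_ghz(model: str, active_cores: int) -> float:
    """Return the highest quoted turbo frequency for a given active core count."""
    cpu = TURBO_BINS[model]
    if not 1 <= active_cores <= cpu["cores"]:
        raise ValueError(f"{model} has 1-{cpu['cores']} cores, got {active_cores}")
    # Bins are ordered from fewest to most active cores; take the first match.
    for max_cores, ghz in cpu["bins"]:
        if active_cores <= max_cores:
            return ghz
    raise AssertionError("unreachable: the last bin covers every core")
```

For example, max_turbo_ghz("X5670", 2) gives 3.33, while max_turbo_ghz("X5670", 6) and max_turbo_ghz("X5570", 4) both give 3.2.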

Chart 1 – System Configurations

The system was configured as shown above in Chart 1. Note that two storage devices were used: a standard hard disk and a solid state drive. Windows Server 2008 R2 was installed on the hard disk (due to capacity constraints), while the benchmarks resided on the SSD. The BIOS was flashed to enable Westmere support, and almost all BIOS options were left at their defaults. The only changes were enabling I/O virtualization and enabling PCI-Express power saving states. Hardware prefetching, Turbo Boost and CPU power management were enabled in the BIOS for all tests.

One wrinkle that emerged with Windows Server 2008 R2 was an interaction between the operating system’s power saving modes and the power saving features of Intel’s microprocessors. When Windows is configured in ‘balanced’ mode, the server idles at roughly 115W, but Turbo Boost is disabled by the OS. In ‘high performance’ mode, Turbo Boost is enabled, but the system idles at 125W. Ultimately, we decided to use the ‘balanced’ power profile, forgoing the additional frequency headroom in favor of lower idle power. We also modified the ‘balanced’ power profile to enable PCI-Express power management states. So for this review, Turbo Boost is enabled in the BIOS but disabled by the OS, and the processors run at their standard frequency (2.93GHz).

Performance for each benchmark is the average of three runs, except for SPECpower_ssj2008 which uses a single run. Power measurements are taken from a single run, rather than averaged across multiple runs.
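The averaging described above is a plain arithmetic mean over three runs. As a minimal Python sketch (the function names are hypothetical, and the scores in the usage note are placeholders rather than measured results):

```python
from statistics import mean


def benchmark_score(runs):
    """Average exactly three benchmark runs, as described in the text."""
    if len(runs) != 3:
        raise ValueError(f"expected 3 runs, got {len(runs)}")
    return mean(runs)


def relative_spread(runs):
    """Run-to-run spread, (max - min) / mean, as a quick sanity check."""
    return (max(runs) - min(runs)) / mean(runs)
```

For instance, benchmark_score([100.0, 102.0, 104.0]) returns 102.0. The spread helper is not part of our methodology; it is simply a convenient way to spot an outlier run before trusting the average.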

Power measurements (watts at the wall, taken with a Watts Up Pro meter) were logged at 1 second intervals over a USB connection by the system under test itself. We measured a negligible impact on performance from this logging, and discussions with some of the architects of SPECpower have confirmed this behavior. However, logging on the system under test does have a Heisenberg-like impact on power consumption. While it is logging power measurements, the system will never truly quiesce into a full idle state (where all CPUs can shut down). Rather, it will reach an ‘active idle’ state, which is roughly the same for Nehalem and Westmere. In a ‘true idle’ state, Westmere has lower power consumption due to the power gates on the uncore; however, in active idle the power gates do not engage and the power draw is identical.

Our benchmark collection has changed since the Nehalem preview. We updated POV-Ray 3.7 to beta 36 (from beta 30), as the old version expired. We removed SPECjbb2005 since SPECpower_ssj2008 tests both Java performance and power efficiency by default. Lastly, we omitted SPECcpu2006 due to time constraints. In the future, we would like to compile our own SPECcpu binaries to have greater flexibility over the tuning parameters, such as auto-parallelization and profile guided optimization. However, we had neither the tools nor the time for this preview. There are also a number of Linux-based benchmarks that we have been working on with the help of community members, which we hope to show in the future.

In the meantime, here are the benchmarks for this preview:

  • POV-Ray 3.7 beta 36 (64-bit)
  • Valve VRAD
  • Fire Spread Probability model
  • Euler3D
  • Myrimatch
  • SPECpower_ssj2008 v1.07 with Oracle JRockit 6 P28.0

Special thanks go out to a number of individuals for helping us put together this benchmark collection: Henrik Stahl of Oracle, Stuart Brittain at Systems for Environmental Management and Scott Wasson of Tech Report (who helped out with Myrimatch and Euler3D).

Our tests are accompanied by power measurements for the entire server at the wall. We are using a Watts Up Pro meter that was graciously provided to us by the manufacturer, Electronic Educational Devices. The meter samples power, voltage, current and several other quantities every second, all of which can easily be logged with the right software. In our case, we used the test harness included with SPECpower_ssj2008 to log measurements while the benchmarks were running.
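Because the meter reports one reading per second, a log of wall power converts directly into average power and total energy. A minimal Python sketch, assuming the log has already been parsed into a list of watt readings (the function name and dictionary keys are hypothetical, and this is not the SPECpower_ssj2008 harness format):

```python
def summarize_power(samples_w, interval_s=1.0):
    """Summarize evenly spaced wall-power readings given in watts.

    Each reading is treated as constant over its sampling interval,
    so energy in joules is the sum of readings times the interval.
    """
    if not samples_w:
        raise ValueError("no power samples")
    energy_j = sum(samples_w) * interval_s
    duration_s = len(samples_w) * interval_s
    return {
        "avg_w": energy_j / duration_s,   # mean power over the run
        "energy_j": energy_j,             # total energy in joules
        "energy_wh": energy_j / 3600.0,   # same energy in watt-hours
    }
```

As an example, an hour of readings at a constant 115W, i.e. summarize_power([115.0] * 3600), yields an average of 115W and 115 watt-hours. Treating each reading as constant over its interval is a simplification; anything that changes faster than the 1 second sampling rate is not captured.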
