Nehalem Performance Preview


SPECint_rate2006

SPECcpu2006 is another first-time benchmark for us, and quite a significant one. It is the single most prominent and influential benchmark for measuring general purpose microprocessor performance. While the main focus of SPECcpu is the microprocessor, it also measures the memory subsystem and compilers.
SPECcpu2006 is composed of two test suites, SPECint and SPECfp, which respectively contain a collection of integer and floating point compute intensive benchmarks. SPECint contains a dozen C and C++ benchmarks, ranging from the GCC compiler to compression, AI and traffic optimization (see http://www.spec.org/cpu2006/CINT2006/ for descriptions).
SPECcpu can be run in two different modes, with two different levels of tuning. The SPECcpu speed test measures the single threaded performance of a microprocessor (which is the most relevant metric for most client systems), while the SPEC_rate tests measure the multithreaded performance of a system by running multiple independent copies of the SPEC tests across the system. The base level of tuning for SPEC is supposed to reflect reasonable development practices: it requires that all sub-tests be compiled using the same flags and options, and forbids feedback directed optimization. The peak level of tuning focuses on what is achievable when every available trick is used, including feedback directed optimization and other relatively uncommon techniques.
We ran SPECint_rate2006 (base) on both systems, using binaries supplied by Intel that were compiled with ICC 11.0 and the Smart Heap library (we used Intel's binaries both due to time constraints and the unavailability of Visual Studio compilers for Windows). Unfortunately, due to the run time, we were only able to complete a single run of SPECint_rate2006 on the Harpertown server. Note that these runs are not compliant with SPEC submission rules, which require at least three runs of all benchmarks.
In the future, there are a variety of other interesting experiments to be run – for instance, running the speed test, which is single threaded, or experimenting with various compiler options.
Figure 21 – SPECint_rate2006 Performance
Figure 21 above shows performance for SPECint_rate2006, which is measured as a speedup ratio over a reference system (in this case a Sun Ultra Enterprise 2 workstation with a 296-MHz UltraSPARC II). The score for Nehalem on libquantum is 761.9, quite literally off the chart and a result of some rather serious optimizations (libquantum can be easily auto-parallelized). Generally speaking, the Nehalem system is about 1.8X faster than the Harpertown based system, although that falls to 1.7X when excluding the anomalous libquantum result. Either way it is a massive improvement that demonstrates the cumulative effect of all the features in Nehalem. The largest contributing factors are likely to be lower memory latency and improved branch mispredict recovery (the latter accounts for much of the improvement in mcf).
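To make the effect of a single outlier more concrete, here is a minimal sketch of how a rate-style score is aggregated. SPEC reports the overall metric as the geometric mean of the per-benchmark ratios, so one very large ratio lifts the total less than it would with an arithmetic mean. The benchmark names and values below are made-up placeholders, not our measured results, and the helper names are purely illustrative.

```python
import math

def rate_ratio(copies, ref_seconds, run_seconds):
    """Rate-style ratio: copies run concurrently, scaled by reference time over measured time."""
    return copies * ref_seconds / run_seconds

def geometric_mean(values):
    """Geometric mean computed via logarithms to avoid overflow on long lists."""
    values = list(values)
    return math.exp(sum(math.log(v) for v in values) / len(values))

# Hypothetical per-benchmark ratios (placeholders, NOT measured data).
scores = {"bench_a": 100.0, "bench_b": 120.0, "bench_c": 90.0, "outlier": 760.0}

overall = geometric_mean(scores.values())
without_outlier = geometric_mean(v for k, v in scores.items() if k != "outlier")

print(f"with outlier:    {overall:.1f}")
print(f"without outlier: {without_outlier:.1f}")
```

Comparing two systems with and without the anomalous benchmark is then just a matter of dividing the corresponding geometric means.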
Figure 22 – Nehalem SPECint_rate2006 Power Consumption I
Figure 22 above shows the system-level power consumption over time for half of the benchmarks (the remaining half is featured in Figure 23 below).
Figure 23 – Nehalem SPECint_rate2006 Power Consumption II
Data was collected for the Harpertown system as well, but it proved to be substantially less insightful and hence was omitted. The power gates in Nehalem make it very easy to identify when a benchmark has started (or stopped). Harpertown lacks comparable features, and even with a 30 second pause between benchmarks it is extremely difficult to pinpoint when a benchmark ends with any degree of precision.
While figures 22 and 23 are a little busy, they show some extremely interesting details. First of all, power consumption varies for many of the benchmarks, demonstrating phased behavior of all sorts. Most of the variations are small, although one section of xalancbmk consumes roughly 390W, which is 40W more than the rest of the benchmark, and is actually the highest power draw throughout the entire SPECcpu suite (including floating point).
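For readers who want to try this kind of analysis on their own power logs, the sketch below shows one simple way to find benchmark boundaries in a trace: treat any sample above an idle threshold as "active", and only close a segment after several consecutive idle samples. This is an illustration rather than the method we used. The function name, the threshold and the sample values are all hypothetical, and a trace without a clear idle floor (as on Harpertown) would not segment cleanly this way.

```python
def find_active_segments(samples, idle_threshold_watts, min_idle_samples=1):
    """Return (start, end) index pairs for runs of samples above the idle threshold.

    A dip below the threshold shorter than min_idle_samples does not split a segment.
    """
    segments = []
    start = None   # index where the current active segment began
    idle_run = 0   # consecutive idle samples seen since the last active sample
    for i, watts in enumerate(samples):
        if watts > idle_threshold_watts:
            if start is None:
                start = i
            idle_run = 0
        elif start is not None:
            idle_run += 1
            if idle_run >= min_idle_samples:
                # The segment ended at the last active sample before the idle run.
                segments.append((start, i - idle_run))
                start = None
                idle_run = 0
    if start is not None:
        # The trace ended while a segment was still open.
        segments.append((start, len(samples) - 1 - idle_run))
    return segments

# Hypothetical trace in watts, one sample per time step (placeholder values).
trace = [200, 200, 350, 360, 355, 200, 200, 200, 340, 390, 345, 200]
print(find_active_segments(trace, idle_threshold_watts=300, min_idle_samples=2))
```

Requiring a minimum idle run is a deliberate choice: it keeps a brief low-power phase inside a benchmark from being mistaken for the pause between two benchmarks.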
