Sandy Bridge-EP Review

SPECint_rate2006

SPECcpu2006 is composed of two test suites, SPECint and SPECfp, which respectively contain a collection of integer and floating point compute-intensive benchmarks. SPECint contains a dozen C and C++ benchmarks, ranging from the GCC compiler to compression, AI and traffic optimization (see http://www.spec.org/cpu2006/CINT2006/ for descriptions). SPECcpu can be run in two different modes, with two different levels of tuning. The SPECcpu speed test measures the single-threaded performance of a microprocessor (which is the most relevant metric for client systems), while the SPEC_rate tests measure the multithreaded performance of a system by running multiple independent copies of the SPEC tests across the system. The base level of tuning is supposed to reflect reasonable development practices: it requires that all sub-tests be compiled using the same flags and options, and forbids feedback-directed optimization. The peak level of tuning focuses on what is achievable when every available trick is used, including feedback-directed optimization and other relatively uncommon techniques.

We ran SPECint_rate2006 (base) on both systems using binaries supplied by Intel (primarily due to time constraints), which were compiled with ICC 12.1 and the Smart Heap library. Unfortunately, due to the long run time, we were only able to complete a single run of SPECint_rate2006 on each system. Note that these runs are not compliant with SPEC submission rules, which require at least three runs of all benchmarks. The optimization flags used were:

C benchmarks: [-QxAVX/-QxSSE4.2] -Qipo -O3 -Qprec-div- -Qopt-prefetch -Qopt-mem-layout-trans:3 /F512000000

C++ benchmarks: [-QxAVX/-QxSSE4.2] -Qipo -O3 -Qprec-div- -Qopt-prefetch -Qcxx-features -Qopt-mem-layout-trans:3 /F512000000 shlW32M.lib -link /FORCE:MULTIPLE

Figure 5 shows performance for SPECint_rate2006, which is measured as a speedup ratio over the reference system (a Sun Ultra Enterprise 2 workstation with a 296-MHz UltraSPARC II). libquantum is literally off the charts, because aggressive compiler optimizations have effectively broken this particular sub-test: Sandy Bridge-EP scores 4150, versus 2320 for Westmere-EP. The overall performance gain for SPECint is 77%, while performance per core increased by 27%.


Figure 5. SPECint_rate2006 Performance
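For readers who want to see how a rate score is formed, here is a minimal sketch. It assumes the standard SPEC CPU2006 definitions: each benchmark's rate ratio is the number of copies multiplied by the reference time and divided by the elapsed time, and the overall score is the geometric mean of those ratios. The function names are our own, and the only real data used is the pair of libquantum scores quoted above.

```python
from math import prod


def rate_ratio(copies, ref_seconds, elapsed_seconds):
    """Rate ratio for one benchmark: copies * reference time / elapsed time."""
    return copies * ref_seconds / elapsed_seconds


def geometric_mean(ratios):
    """Combine per-benchmark ratios into one score via the geometric mean."""
    ratios = list(ratios)
    return prod(ratios) ** (1.0 / len(ratios))


def gain_percent(new_score, old_score):
    """Percentage improvement of new_score over old_score."""
    return (new_score / old_score - 1.0) * 100.0


# libquantum scores quoted in the text: Sandy Bridge-EP vs. Westmere-EP.
libquantum_gain = gain_percent(4150, 2320)
print(f"libquantum gain: {libquantum_gain:.0f}%")
```

Because the overall score is a geometric mean, a single inflated sub-test such as libquantum raises the total less than it would under an arithmetic mean, but it still has an effect.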

The benefits across the benchmark suite vary from 55% for gobmk up to 122% for mcf. The latter is an outlier: mcf is notoriously memory sensitive and probably saw a disproportionate gain due to prefetching. Other surprises include hmmer and h264ref, which are highly frequency sensitive and improved by about 90%. The most likely explanation is that both benefited tremendously from the second load port and perhaps the changes in the front-end. Of the SPECcpu2006 benchmarks, gcc is generally the most reflective of real software. It has the most complex control flow and, unlike other tests, is unlikely to benefit from auto-parallelization or other compiler tricks. On this one sub-test, Sandy Bridge-EP increases performance by roughly 70%, or 27% on a per-core basis.
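The per-core figure is simply the throughput gain rescaled by the ratio of core counts. The sketch below assumes 6 cores per socket for Westmere-EP and 8 for Sandy Bridge-EP. Those counts are not stated on this page, so treat them as an assumption, and the helper name is our own.

```python
def per_core_gain(total_gain_pct, old_cores, new_cores):
    """Convert a whole-system gain (%) into a per-core gain (%)."""
    speedup = 1.0 + total_gain_pct / 100.0
    # Divide out the extra cores: per-core speedup = speedup * old / new.
    per_core_speedup = speedup * old_cores / new_cores
    return (per_core_speedup - 1.0) * 100.0


# Assumption: 6 cores (Westmere-EP) vs. 8 cores (Sandy Bridge-EP) per socket.
# The 70% input is the rough gcc gain quoted in the text.
gcc_per_core = per_core_gain(70, old_cores=6, new_cores=8)
print(f"gcc per-core gain: {gcc_per_core:.1f}%")
```

Under those assumed core counts, the rough 70% gain for gcc comes out close to the 27% per-core figure quoted above.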

Figure 6 shows the power efficiency for the individual benchmarks in SPECint_rate2006. Since SPECint scores reflect a relative speedup compared to a reference system, the most appropriate metric is performance/watt rather than energy. No overall performance/watt metric is reported, since the run times (and hence weightings) for the individual benchmarks vary wildly.


Figure 6. SPECint_rate2006 Power Efficiency

The power efficiency improvements demonstrated by Sandy Bridge-EP in SPECint_rate2006 are fairly modest. Power consumption is fairly consistent across the two systems, with Sandy Bridge-EP reliably drawing about 40-55% more power on each test (448-534W overall). Consequently, the changes in efficiency tend to reflect performance gains. The performance/watt gains range from 6% for gobmk to 46% for mcf; unsurprisingly, our two outliers reappear here. For our favorite test, gcc, the efficiency improvement was roughly 15%.
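As a sanity check, the performance and efficiency gains can be combined to back out the implied increase in power draw, since performance/watt is just performance divided by power. This is a small sketch using only the rounded percentages quoted in the text. The function name is our own, and the results are only as precise as those rounded inputs.

```python
def implied_power_increase(perf_gain_pct, eff_gain_pct):
    """Power increase (%) implied by a performance gain and a perf/watt gain."""
    perf_ratio = 1.0 + perf_gain_pct / 100.0
    eff_ratio = 1.0 + eff_gain_pct / 100.0
    # perf/watt = perf / power, so power ratio = perf ratio / efficiency ratio.
    return (perf_ratio / eff_ratio - 1.0) * 100.0


# (performance gain %, performance/watt gain %) as quoted in the text.
quoted = {"gobmk": (55, 6), "mcf": (122, 46), "gcc": (70, 15)}

for name, (perf, eff) in quoted.items():
    extra = implied_power_increase(perf, eff)
    print(f"{name}: ~{extra:.0f}% more power")
```

All three results fall inside the 40-55% range of additional power reported above, so the performance and efficiency figures are consistent with one another.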

