While SPECint includes a wide range of workloads, SPECfp focuses on floating point code, which tends to come from the scientific world. SPECfp includes code drawn from applications such as weather modeling, fluid dynamics, quantum chemistry and speech recognition. SPECfp consists of 17 benchmarks written in C, C++, Fortran and a mixture of C and Fortran.
Our testing methods are similar to those described previously for SPECint. We used binaries supplied by Intel, with close to optimal compiler settings. All performance results presented are base, rather than peak. The SPECfp tests were run only once on each system, although the scores were checked for reasonableness against existing valid submissions. The flags used for SPECfp_rate2006 were:
C benchmarks: [-QxAVX/-QxSSE4.2] -Qipo -O3 -Qprec-div- -Qansi-alias -Qopt-prefetch -Qauto-ilp32 -Qopt-mem-layout-trans:3 /F1000000000 -link /FORCE:MULTIPLE
C++ benchmarks: [-QxAVX/-QxSSE4.2] -Qipo -O3 -Qprec-div- -Qansi-alias -Qopt-prefetch -Qcxx-features -Qauto-ilp32 -Qopt-mem-layout-trans:3 /F1000000000 shlW64M.lib -link /FORCE:MULTIPLE
Fortran benchmarks: [-QxAVX/-QxSSE4.2] -Qipo -O3 -Qprec-div- -Qansi-alias -Qopt-prefetch /F1000000000 -link /FORCE:MULTIPLE
Benchmarks using Fortran and C: [-QxAVX/-QxSSE4.2] -Qipo -O3 -Qprec-div- -Qansi-alias -Qopt-prefetch -Qauto-ilp32 -Qopt-mem-layout-trans:3 /F1000000000 -link /FORCE:MULTIPLE
Figure 7 shows performance for SPECfp_rate2006, which is measured as a speedup ratio over the reference system. The architectural changes in Sandy Bridge-EP are relatively more beneficial for floating point and scientific benchmarks. For example, the 256-bit instructions in AVX are generally only available for floating point data types, rather than integers. Additionally, the benchmarks in SPECfp are much more bandwidth sensitive and will respond well to the prefetcher optimizations and the higher bandwidth memory controller. FP workloads are also more likely to effectively issue two load instructions per cycle. As a result, the performance gains in SPECfp should be a fair bit higher than those in SPECint.
Figure 7. SPECfp_rate2006 Performance
The overall gain for Sandy Bridge-EP on SPECfp_rate2006 is 95%, which translates into about 45% higher performance per core. At the low end of the spectrum is gromacs with a 66% gain, while dealii improved by an astounding 179%. Excluding these two outliers, the performance gains were fairly steady, with most tests falling within a range of 75% to 110%. No one benchmark within SPECfp is most closely correlated with real workloads, in part because SPECfp is generally reflective of scientific computing as a whole. Generally, these results are consistent with expectations relative to SPECint, which realized an overall gain of 77%, compared to 95% for SPECfp.
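The per-core figure follows from dividing the throughput ratio by the ratio of core counts. The sketch below is a rough illustration rather than the method used for the article: it assumes 6 cores per socket for Westmere-EP and 8 for Sandy Bridge-EP, which are not stated in this section, and the function name `per_core_gain` is purely illustrative.

```python
def per_core_gain(throughput_gain, old_cores, new_cores):
    """Return the fractional per-core gain implied by a throughput gain.

    throughput_gain is fractional, e.g. 0.95 for a 95% improvement.
    """
    throughput_ratio = 1.0 + throughput_gain
    core_ratio = new_cores / old_cores
    return throughput_ratio / core_ratio - 1.0


# Core counts are assumptions (6-core Westmere-EP, 8-core Sandy Bridge-EP).
specfp = per_core_gain(0.95, old_cores=6, new_cores=8)
specint = per_core_gain(0.77, old_cores=6, new_cores=8)

print(f"SPECfp per-core gain:  {specfp:.0%}")
print(f"SPECint per-core gain: {specint:.0%}")
```

Under those assumptions the SPECfp per-core gain works out to roughly 46%, in line with the "about 45%" quoted above, and the SPECint per-core gain to roughly a third.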
Figure 8. SPECfp_rate2006 Power Efficiency
Figure 8 shows the power efficiency for the individual benchmarks in SPECfp_rate2006. The efficiency gains for Sandy Bridge-EP are muted compared to the impressive performance improvements. Sandy Bridge-EP’s power consumption for the SPECfp tests seems to vary quite a bit more than it does for SPECint. The server draws between 432W and 564W across the various SPECfp benchmarks; the power consumption relative to Westmere-EP is between 35% and 70% higher.
This variation in power consumption means that the overall performance/watt improvements are determined by both performance gains and power optimizations in unpredictable ways. For example, gromacs saw the smallest performance gain (66%) and would be expected to also demonstrate the smallest change in efficiency. However, the performance/watt for gromacs increased by 21%, compared to 11% for lbm. Overall, efficiency improved by 10% to 35%, with dealii as a huge outlier at 77%. So the SPECfp performance/watt gains were consistent with those in SPECint, but slightly higher.
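Because performance/watt is performance divided by power, the quoted performance and efficiency gains together imply how much more power Sandy Bridge-EP drew on a given benchmark. The following is a minimal sketch of that arithmetic using only the percentages quoted in the text; the function name `implied_power_increase` is illustrative, and the results are back-calculated estimates rather than measured values.

```python
def implied_power_increase(perf_gain, efficiency_gain):
    """Return the fractional power increase implied by two fractional gains.

    perf/watt ratio = perf ratio / power ratio, so
    power ratio = perf ratio / (perf/watt ratio).
    """
    perf_ratio = 1.0 + perf_gain
    efficiency_ratio = 1.0 + efficiency_gain
    return perf_ratio / efficiency_ratio - 1.0


# Gains quoted in the text: (performance gain, performance/watt gain).
gromacs = implied_power_increase(0.66, 0.21)
dealii = implied_power_increase(1.79, 0.77)

print(f"gromacs implied power increase: {gromacs:.0%}")
print(f"dealii implied power increase:  {dealii:.0%}")
```

Both estimates (roughly 37% for gromacs and 58% for dealii) fall inside the 35% to 70% range of additional power consumption reported relative to Westmere-EP, which is a useful sanity check on the figures.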