Nehalem Performance Preview



This will probably be one of the last times we use SPECjbb2005. Not only is the benchmark getting a bit long in the tooth, but the newer SPECpower_ssj2008 also measures server-side Java performance and reports power consumption to boot. That being said, it’s a classic, and it was the first real commercial server benchmark we ran here.

SPECjbb2005 is heavily dependent on the JVM configuration, so we are using the latest Oracle JRockit 6 P28.0 JVM. The P indicates that it is a performance-optimized JVM, and that really makes quite a difference. We also carefully studied the JVM tuning options, since those can easily increase performance. One interesting observation is that the throughput of the Harpertown system increased by around 70K BOPS with just four simple changes: the operating system upgrade to Windows Server 2008, the JVM upgrade to P28.0, new command line options, and using two JVMs with processor socket affinity. The command line options used were:

start /AFFINITY [00ff, ff00] java -Xms3700m -Xmx3700m -Xns3100m -XXaggressive -Xlargepages -Xgc:genpar -XXcallprofiling -XXgcthreads=8 -XXtlasize:min=4k,preferred=1024k

The heap size is set to 3.7GB so that the JVM gets as much room as possible while still using compressed (32-bit) pointers. Two JVMs run in parallel, each bound to a single socket using the /AFFINITY mask (00ff for one JVM, ff00 for the other). This substantially increases performance by avoiding memory placement issues, so there is little or no NUMA latency penalty. Hardware prefetchers were enabled in the BIOS for both systems. In the past this has decreased performance due to contention with software prefetch, but supposedly this has been minimized or eliminated in Nehalem. Unlike the other benchmarks, SPECjbb2005 was only run a single time. The results tend to vary only slightly, so we are confident enough to skip repeated measurements (and the run time was also an issue).
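To make the affinity masks and the heap size in the command line concrete, here is a minimal Python sketch. It assumes that logical CPUs are numbered contiguously per socket and that each socket exposes 8 logical CPUs (which is what the masks 00ff and ff00 imply for a 16-thread system), and that the `m` suffix means MiB. The helper names are hypothetical and are not part of any JVM or Windows tool.

```python
def socket_affinity_mask(socket_index: int, cpus_per_socket: int) -> int:
    """Bitmask with one bit set per logical CPU on the given socket.

    Assumption: logical CPUs are numbered contiguously, socket by socket.
    """
    return ((1 << cpus_per_socket) - 1) << (socket_index * cpus_per_socket)


def cpus_in_mask(mask: int) -> list[int]:
    """List the logical CPU indices selected by an affinity mask."""
    return [bit for bit in range(mask.bit_length()) if (mask >> bit) & 1]


# Two sockets, 8 logical CPUs each (assumed): one mask per JVM.
masks = [socket_affinity_mask(s, 8) for s in range(2)]
for socket, mask in enumerate(masks):
    print(f"socket {socket}: /AFFINITY {mask:04x} -> CPUs {cpus_in_mask(mask)}")

# Heap size check: -Xmx3700m must stay below the 4 GiB that a
# 32-bit (compressed) pointer can address.
heap_bytes = 3700 * 1024 * 1024  # assumption: "m" means MiB
print(f"heap fits in 32-bit address range: {heap_bytes < 2**32}")
```

On a system with 4 logical CPUs per socket, the same helper would give masks 0f and f0 instead.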
Figure 16 – SPECjbb2005 Performance
Since our run of SPECjbb2005 uses two JVMs, the number of active threads is twice the number of warehouses shown above. If we were to score the two runs, the Nehalem system would score 527,788 BOPS and the Harpertown system would score 239,136 BOPS. For reference, the best existing Nehalem score is 600,414 and the best Harpertown score is 368,034. We could have tuned the JVM settings further to improve performance, but most of the changes we tried were insignificant (e.g. the number of GC threads: we tried 2 for the Harpertown system, but it made no difference).
Figure 17 – SPECjbb2005 Performance
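As a quick sanity check on the scores quoted above, the following sketch computes the relative throughput of the two systems and how close each run comes to the best published result. It uses only the four numbers from the text; the variable names are illustrative.

```python
# Scores from this article's runs and the best published results (BOPS).
scores = {"Nehalem": 527_788, "Harpertown": 239_136}
best_published = {"Nehalem": 600_414, "Harpertown": 368_034}

# Relative throughput of the two systems in our configuration.
speedup = scores["Nehalem"] / scores["Harpertown"]
print(f"Nehalem vs. Harpertown: {speedup:.2f}x")

# Fraction of the best published score that each run achieved.
fraction_of_best = {
    system: score / best_published[system] for system, score in scores.items()
}
for system, fraction in fraction_of_best.items():
    print(f"{system}: {fraction:.0%} of best published score")
```

By this arithmetic the Nehalem system delivers a bit more than twice the Harpertown throughput, and the Nehalem run is proportionally closer to its best published score than the Harpertown run is.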
The power consumption data captured here is not particularly granular, so we will omit a graph. The average power for the two systems in their measurement regions was 418W for the Harpertown system and 355W for the Nehalem system. Dividing the scores by these averages gives roughly 1,487 BOPS per watt for Nehalem and 572 BOPS per watt for Harpertown. Thus at peak load, Nehalem is roughly 2.6X as efficient as our Harpertown system, as shown in the figure below.
Figure 18 – SPECjbb2005 Power Efficiency
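The efficiency comparison can be reproduced from the scores and average power figures quoted in the text. This is a minimal sketch of that arithmetic, assuming efficiency is simply the score divided by average power over the measurement region; the variable names are illustrative.

```python
# Throughput scores (BOPS) and average power (watts) from the text.
bops = {"Nehalem": 527_788, "Harpertown": 239_136}
watts = {"Nehalem": 355, "Harpertown": 418}

# Efficiency as throughput per watt (assumed definition).
efficiency = {system: bops[system] / watts[system] for system in bops}
for system, value in efficiency.items():
    print(f"{system}: {value:,.0f} BOPS/W")

# Ratio of the two efficiencies.
ratio = efficiency["Nehalem"] / efficiency["Harpertown"]
print(f"Nehalem is {ratio:.1f}x as efficient as Harpertown")
```

Most of the gap comes from throughput rather than power: the Nehalem system draws about 15% less power while delivering more than twice the score.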
