For SPECjbb2005, we used the latest general availability release of BEA JRockit, 5.0 R27.4 (64-bit), which includes Harpertown optimizations. The benchmark was run in two different configurations to reflect the different levels of optimization seen in production use, according to Henrik Stahl of BEA. We named the two settings ‘base’ and ‘peak’, stealing our terminology from the ever popular SPEC CPU benchmark. The base configuration reflects a minimal amount of tuning: only the heap size is set. The peak configuration represents the best possible software flags for the JVM, based on BEA’s expertise. In both cases, hardware prefetch was enabled, which can lower performance due to conflicts with software prefetching. We feel that this more accurately represents real-world practices. A good Java developer will be able to give guidelines for which command line switches should be used; however, relatively few are familiar with BIOS optimizations. The two command lines are shown below:
Base: -Xms3650m -Xmx3650m
Peak: -Xms3650m -Xmx3650m -Xns3000m -XXaggressive -XXlazyunlocking -Xlargepages -Xgc:genpar -XXtlasize:min=4k,preferred=1024k -XXcallprofiling
Newer versions of JRockit will automatically use 32-bit pointers if the heap is limited to under 4GB; hence the maximum heap size is set to 3650MB and the -XXcompressedrefs flag is no longer needed. In all cases, we used a single JVM because it represents a more realistic situation. While multiple JVMs often deliver higher performance, that approach requires binding each JVM instance to a specific processor or pool of memory, which isn’t often done for smaller DP servers.
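As a quick sanity check on the heap limit, the sketch below compares the 3650MB heap against the 4GB range that a 32-bit pointer can address. It assumes the JVM's "m" suffix means 2^20 bytes; the variable names are ours and purely illustrative.

```python
# Assumption: the "m" suffix in -Xmx3650m means mebibytes (2**20 bytes).
MIB = 1024 ** 2

heap_mib = 3650                   # value used in -Xms/-Xmx
pointer_limit_bytes = 2 ** 32     # range addressable with a 32-bit pointer

heap_bytes = heap_mib * MIB
fits_in_32_bit = heap_bytes < pointer_limit_bytes
headroom_mib = (pointer_limit_bytes - heap_bytes) // MIB

print(f"heap fits under 4GB: {fits_in_32_bit}")
print(f"headroom below 4GB: {headroom_mib} MB")
```

Under that assumption the heap sits a few hundred megabytes below the 4GB boundary.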
Figure 9 – SPECjbb2005 Performance
Note that the base scores use a square marker, while the peak scores appropriately use a triangular marker.
SPECjbb2005 is the most recognizable and most commercially significant of all the benchmarks we use, but it is not particularly well understood. Unlike SPEC CPU, there has been very little public discussion of the impact of compiler/JIT options on performance. For example, the performance gained by moving from a 2.33GHz Clovertown to a 3GHz Harpertown is the same (25%) as that gained by simply changing from a basic out-of-the-box configuration to highly tuned JVM settings. The two changes combined yield a 55% improvement in performance.
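The combined figure is roughly what one would expect if the two gains compound multiplicatively rather than add. The sketch below works that through using the rounded 25% figures quoted above; the function name is ours, and the small gap to the measured 55% is plausibly just rounding in the inputs.

```python
def combine_gains(*gains_pct):
    """Compound independent percentage gains multiplicatively."""
    factor = 1.0
    for gain in gains_pct:
        factor *= 1.0 + gain / 100.0
    return (factor - 1.0) * 100.0

# 25% from the hardware change, 25% from JVM tuning (rounded figures).
combined = combine_gains(25, 25)
print(f"expected combined gain: {combined:.2f}%")
```

Simple addition would predict 50%, so the measured result is closer to the multiplicative model.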
Normalizing the scores to clock frequency is not particularly useful in this case. Our measured IPC for Harpertown is actually 5% lower than Clovertown’s, which is misleading: the additional cache and faster bus in Harpertown should improve IPC at the same frequency, since average memory latency decreases.
The discrepancy is almost assuredly because our Harpertown and Clovertown processors run at different frequencies. Comparing the best official Clovertown SPECjbb2005 submissions at 3GHz and 2.66GHz shows that increasing the frequency by 12% increases performance by only 5%, so IPC decreased by about 7% to make up the difference. We estimate that Clovertown at 3GHz has 12% higher performance than a 2.33GHz model, but 28% higher frequency. This implies that IPC decreases by about 14% for Clovertown as the frequency increases from 2.33GHz to 3GHz. Taken together, this suggests that at 3GHz, Harpertown’s IPC is roughly 10-15% higher than that of a 3GHz Clovertown.
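The estimate above can be reproduced approximately with a few ratios. The sketch below treats IPC change as the performance ratio divided by the frequency ratio, which is our simplification, and uses only the rounded percentages quoted in the text. The helper name is hypothetical, and the results differ slightly from the quoted figures because of rounding.

```python
def per_clock_change_pct(perf_ratio, freq_ratio):
    """Percent change in performance per clock (a rough proxy for IPC)."""
    return (perf_ratio / freq_ratio - 1.0) * 100.0

# Clovertown, 2.66GHz -> 3GHz: about 5% more performance.
clovertown_266_to_300 = per_clock_change_pct(1.05, 3.0 / 2.66)

# Clovertown, 2.33GHz -> 3GHz: estimated 12% more performance.
clovertown_233_to_300 = per_clock_change_pct(1.12, 3.0 / 2.33)

# Harpertown at 3GHz is 25% faster than Clovertown at 2.33GHz, so against
# an estimated 3GHz Clovertown (same clock) the advantage is 1.25 / 1.12.
harpertown_vs_clovertown_3ghz = per_clock_change_pct(1.25 / 1.12, 1.0)

print(f"Clovertown 2.66 -> 3GHz per-clock change: {clovertown_266_to_300:.1f}%")
print(f"Clovertown 2.33 -> 3GHz per-clock change: {clovertown_233_to_300:.1f}%")
print(f"Harpertown vs Clovertown at 3GHz: {harpertown_vs_clovertown_3ghz:+.1f}%")
```

With these inputs the per-clock losses come out near 7% and 13%, and the same-frequency Harpertown advantage lands inside the 10-15% range.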