XML Mark 1.1
XML Mark 1.1 is a familiar benchmark that we have used for our server previews in the past. Previously, we tested two parsing methods (SAX and DOM), with different document mixes, for a total of 9 different configurations. However, this time we have changed up the benchmark substantially, focusing only on the SAX parsing method with a given document mix, but varying the number of threads from 1 to 8. This test corresponds to the SAX1 sub-test that we reported in other previews.
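To make the shape of this test concrete, here is a minimal sketch of SAX parsing spread across 1 to 8 worker threads. It is not XML Mark's actual harness: the class name `SaxScalingSketch`, the tiny inline document, and the per-thread document count are all assumptions chosen for illustration, and it uses only the standard JAXP SAX API.

```java
import java.io.ByteArrayInputStream;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical illustration only; not the XML Mark 1.1 source.
public class SaxScalingSketch {

    // Parse one document with SAX and return the number of elements seen.
    static int countElements(String xml) throws Exception {
        // SAXParser instances are not thread-safe, so each call builds its own.
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        final int[] count = {0};
        parser.parse(
            new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)),
            new DefaultHandler() {
                @Override
                public void startElement(String uri, String localName,
                                         String qName, Attributes atts) {
                    count[0]++; // event callback: no document tree is built
                }
            });
        return count[0];
    }

    // Run `threads` workers, each parsing `xml` `docsPerThread` times.
    // Returns the total number of elements parsed across all workers.
    static long parseConcurrently(String xml, int threads, int docsPerThread)
            throws Exception {
        ExecutorService pool = Executors.newFixedThreadPool(threads);
        try {
            List<Future<Long>> results = new ArrayList<>();
            for (int t = 0; t < threads; t++) {
                results.add(pool.submit(() -> {
                    long total = 0;
                    for (int i = 0; i < docsPerThread; i++) {
                        total += countElements(xml);
                    }
                    return total;
                }));
            }
            long sum = 0;
            for (Future<Long> f : results) {
                sum += f.get(); // waits for each worker to finish
            }
            return sum;
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample document; the real benchmark uses its own document mix.
        String xml = "<orders><order id=\"1\"><item/><item/></order></orders>";
        for (int threads = 1; threads <= 8; threads++) {
            long start = System.nanoTime();
            long elements = parseConcurrently(xml, threads, 2000);
            double ms = (System.nanoTime() - start) / 1e6;
            System.out.printf("threads=%d elements=%d time=%.1f ms%n",
                              threads, elements, ms);
        }
    }
}
```

Each worker creates its own parser because JAXP parsers cannot be shared across threads. A real harness would also need warm-up runs and much larger documents before its timings meant anything.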
For XML Mark 1.1, we are using the latest general availability release of BEA JRockit 5.0 R27.4 (64 bit), which includes Harpertown optimizations. The benchmark was run in two different configurations to reflect different levels of optimization that are seen in production uses, according to Henrik Stahl of BEA. We named the two settings ‘base’ and ‘peak’, stealing our terminology from the ever popular SPEC CPU benchmark. The base configuration reflects a minimal amount of tuning: it sets the heap size and garbage collection style (due to performance issues with normal GC on this benchmark), but eschews more advanced optimizations. The peak configuration represents the best possible software flags for the JVM, based on BEA’s expertise. Note that in both cases, hardware prefetch was enabled. The two command lines are shown below:
Base: -Xms3650m -Xmx3650m -Xgc:parallel
Peak: -Xms3650m -Xmx3650m -XXaggressive -XXlazyunlocking -Xlargepages -XXtlasize:min=4k,preferred=1024k -XXcallprofiling -Xgc:parallel
Newer versions of JRockit will automatically use 32-bit pointers if the heap is limited to under 4GB, hence the maximum heap size is set to 3650MB (the -Xmx3650m setting above) and the -XXcompressedrefs flag is no longer needed.
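As a rough sanity check on that threshold, the sketch below reports the maximum heap the running JVM was given and tests whether it falls under 4GB. The class and method names are hypothetical, and it only checks the size condition described above; it cannot tell whether a particular JVM has actually enabled 32-bit pointers.

```java
// Hypothetical helper; it checks only the "heap under 4GB" condition.
public class HeapCheck {

    static final long FOUR_GB = 4L * 1024 * 1024 * 1024;

    // True when a heap of `maxHeapBytes` is strictly below 4GB.
    static boolean fitsIn32BitPointers(long maxHeapBytes) {
        return maxHeapBytes < FOUR_GB;
    }

    public static void main(String[] args) {
        // Runtime.maxMemory() reflects the -Xmx setting (approximately).
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.printf("max heap = %d MB, under 4GB: %b%n",
                          maxHeap / (1024 * 1024),
                          fitsIn32BitPointers(maxHeap));
    }
}
```

Run with the same -Xms and -Xmx values as the command lines above, this should report a heap under the 4GB limit.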
Figure 8 – XML Mark 1.1 Performance
Note that the base scores use a square marker, while the peak scores appropriately use a triangular marker.
Oddly enough, the IPC for XML Mark is unchanged for Harpertown. Again, the frequency difference crops up. Unfortunately, with XML Mark we don’t have any good comparisons between two identical processors at different clock speeds to help us infer the IPC advantage for Harpertown.
The tuning options made a modest difference in performance: the peak results are ~10% higher for both Clovertown and Harpertown at all load levels. 10% isn’t a trivial boost, but it’s a bit less than expected. Typically the compiler or JIT flags can improve performance by 20-25%, or even more.