Sun’s Surprising Spike SPARCs SPECulation
Sun recently introduced a new member of its UltraSPARC-III family. This new 900 MHz device differs from earlier US-III parts in its use of copper interconnect instead of aluminum. Although Sun submitted official SPEC scores in late 2000 for a 900 MHz Sun Blade 1000 Model 1900 using an aluminum US-III, yield was apparently poor and this speed grade wasn’t generally available. A rarely occurring bug related to a prefetch buffer inside the US-III was discovered, and as a workaround this feature was disabled in firmware. Unfortunately for Sun Microsystems, this caused the Model 1900’s SPECfp_base2k score to drop from an already lackluster 427 to a lamentable 369 in a second SPEC submission in the spring of 2001. So it comes as no small surprise that the Sun Blade 1000 Model 900 Cu workstation, based on the new copper processor, turned in a SPECfp_base2k score of 629 in a recent submission. Both the Model 1900 and Model 900 Cu versions of the Blade 1000 feature 8 MB of L2 cache.
It is possible that the copper US-III incorporates improvements beyond a fix for the prefetch buffer bug, and that system-level hardware also improved between the Model 1900 and Model 900 Cu. However, it appears that much of the improvement can be attributed to the use of the Sun Forte 7 EA compiler instead of the earlier Forte 6 update 1 compiler used to generate the 427 and 369 scores. The reason I can say that with confidence is readily apparent in the graph in Figure 3.
Figure 3 SPECfp_base2k Component Scores for US-III and Competitors
The SPECfp_base2k scores for the 14 component programs from the pre-bug-fix Sun Blade 1000 Model 1900 submission using the Forte 6 compiler are compared with those from the recent Sun Blade 1000 Model 900 Cu submission using the Forte 7 compiler. In addition, scores for the Itanium (4 MB, 800 MHz version in an HP i2000), Alpha EV68C (1000 MHz version in an ES45/1000), and POWER4 (1300 MHz version in a pSeries 690 Turbo) are provided for reference. It is the new compiler’s score on the 179.art program that quite literally stands out from the rest. Although several other programs see appreciable improvement (the 183.equake score nearly triples), the new compiler increases the score of 179.art by more than 800%. In absolute terms this score, 8176, is more than four times that achieved by the Alpha EV68 and POWER4, MPUs that easily beat the copper US-III on nearly every other SPECfp2k program. The 179.art score achieved with the Forte 7 compiler is vital to the new machine’s pumped-up SPECfp_base2k score: leaving 179.art out of the geometric mean would drop the score by nearly 18%, from 629 to 516.
This remarkable improvement on 179.art is unusual in the field of compiler engineering, where single-digit percentage performance increases are often considered major victories. So it is no surprise that Sun’s achievement immediately raised suspicions among industry observers and competitors about the nature of the optimization employed by the Forte 7 compiler. It is hard not to think of Intel’s infamous eqntott compiler bug, which erroneously increased the SPECint92 score of its processors by about 10% until it was caught and fixed. That bug was an illegal optimization that allowed the output of 023.eqntott to pass result checking with the test data used, even though the transformation was invalid in the general case.
Although the exact nature of the new Sun optimization isn’t known, suspicion has fallen on several inner loops within the 179.art program. Speculation is that this code was originally written in FORTRAN and converted to C. Because FORTRAN stores two-dimensional arrays in column-major order while C stores them in row-major order, it is presumed that 179.art indexes its arrays in the wrong order in the innermost loop, causing poor cache locality. It is possible that the new Sun compiler recognizes this situation, turns the nested loops that step through the array accesses “inside out” (a transformation known as loop interchange), and achieves much lower cache miss rates. Whatever the exact nature of the Sun optimization turns out to be, there is the question of whether it violates one of the SPEC rules, namely “Optimizations must improve performance for a class of programs where the class of programs must be larger than a single SPEC benchmark or benchmark suite”.
Without knowing the nature of the new Sun optimization, it is impossible to say whether Sun should be praised or scolded. But here are the words of Sun engineer John Henning, who made the following comments in a November 27 post to the comp.arch Usenet newsgroup:
“Our compiler team believes that what Sun has done with art is (1) the result of perfrectly [sic] legitimate optimizations (2) compliant with SPEC’s rules and (3) not appropriate for further discussion – if you want to figure out to make art faster, go work on it yourself, don’t ask Sun how we did it!”
With the widespread attention this incident has engendered within the industry, we can presume that compiler and benchmarking experts working for Sun’s competitors have closely scrutinized the code Forte 7 generates for 179.art. The fact that Sun’s new scores haven’t yet been withdrawn from SPEC’s official web site suggests that Mr. Henning is correct. No doubt we can expect competitors’ processors to score much higher on 179.art in the months and years to come as the Sun optimization migrates to other compilers. The depreciation of a benchmark’s value is seldom as spectacular as in the case of 179.art, but it occurs naturally over time and provides an incentive to accelerate development of a successor to the SPEC CPU 2000 benchmark suite (which no doubt will not include 179.art). A message soliciting programs for this new suite, tentatively named SPEC 2004, was posted on comp.arch on December 28. Ironically, the author of this message, the secretary of the SPEC CPU subcommittee, is none other than the previously mentioned John Henning.