Intel’s compiler is the performance leader on most of the platforms it supports. This isn’t all that surprising, since Intel designs the microarchitecture of the processors it targets, which gives Intel’s software engineers a head start. The same is true of the compiler teams associated with other chipmakers. But unlike the others, Intel deals with x86, the biggest market, so other compiler vendors have good reason to try to keep up. It is also worth keeping in mind that when x86 was designed, ease of assembly programming was a key attribute of new ISAs. In comparison, RISC processors were designed to be easy compiler targets, and IA64 is supposed to be a very flexible and powerful compilation target. As a result, creating good compilers for x86 is harder than for many other ISAs.
Also worth noting is the discussion of PGO and different code types. Gary described the two types of code as branch-oriented and loop-oriented. In-order processors clearly have some difficulty with branch-oriented code, because they cannot rearrange instructions to fill all potential execution slots. Moreover, branch mispredicts are much more common and harder to hide on an in-order MPU. On loop-oriented code such as matrix scaling, however, in-order scheduling is fine, since the processor can simply pipeline the multiplications in a natural way. Loop-oriented code also tends to have far fewer branch mispredicts (typically these occur at the beginning and end of a loop). Gary explained that PGO has a negligible effect on loop-oriented code, but can significantly benefit branch-oriented code. In this sense, PGO is a perfect complement to an in-order design: it helps most on exactly the kind of code where an in-order processor is weakest. This explains why PGO seems to be so effective on IA64. Now all that is needed is a way to make PGO painless for software and compiler developers alike.
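To make the distinction concrete, here is a minimal sketch of the two code types in C. The function names (`scale_matrix`, `classify`) and the `token_kind` enum are hypothetical illustrations, not code from Intel or from the interview. The first function is loop-oriented: a single, highly predictable loop branch around independent multiplications. The second is branch-oriented: which path executes depends entirely on the input data, which is the sort of information a profile can supply to the compiler.

```c
#include <assert.h>
#include <stddef.h>

/* Loop-oriented (hypothetical example): the only branch is the loop
 * condition, and each multiplication is independent of the others,
 * so the work pipelines naturally even without reordering. */
void scale_matrix(double *m, size_t rows, size_t cols, double factor)
{
    assert(m != NULL);
    for (size_t i = 0; i < rows * cols; i++) {
        m[i] *= factor;
    }
}

/* Branch-oriented (hypothetical example): control flow depends on the
 * data. Without a profile, the compiler has to guess which of these
 * tests usually succeeds. */
enum token_kind { TOKEN_DIGIT, TOKEN_LETTER, TOKEN_SPACE, TOKEN_OTHER };

enum token_kind classify(char c)
{
    if (c >= '0' && c <= '9') {
        return TOKEN_DIGIT;
    }
    if ((c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')) {
        return TOKEN_LETTER;
    }
    if (c == ' ' || c == '\t' || c == '\n') {
        return TOKEN_SPACE;
    }
    return TOKEN_OTHER;
}
```

If a training run showed that, say, most input characters were letters, a profile-guided build could in principle arrange the code so that the common case falls straight through. For `scale_matrix`, a profile has little to add, since the loop's behavior is already obvious from the source.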
It is unsurprising that Intel has implemented profile-guided optimization; it’s not terribly difficult, though it involves a lot of code. The interesting thing is that Intel has done such a good job on low-level optimizations that implementing PGO has become important.
There are many individuals at Intel to thank for this piece; unfortunately, we know only a few by name. We would like to thank George Alfs for setting up the initial interview, and Gary Carleton for bravely volunteering to answer our questions. Mary-Ellin Brooks helped us with the final stages of reviewing the article and helped us get some last-minute questions answered by Gary. Lastly, I’d like to thank all three Intel compiler teams (ARM, IA32 and IA64) for putting the tools out there for us to use.