High-level optimizations II
Copy propagation deals with copies to temporary variables, a = b. Compilers generate many such copies themselves in intermediate form. Copy propagation is the process of removing them and replacing uses of the copy with references to the original. It often reveals dead code.
Inlining is a conceptually simple optimization: when a call to a short subroutine is found, it is replaced by the body of the subroutine. It is nevertheless a complex optimization to perform, for several reasons. Firstly, the same function body must be placed into different calling subroutines, and each of those assigns different variables to registers. This means the only way to inline subroutines efficiently is to expand the body afresh in each context where the call appears (using different registers each time, for example). The second, related problem is an instance of the “phase ordering problem”: performing one optimization before another sometimes produces inferior code overall. Specifically, code should be inlined early in the compilation process so that later passes can optimize the inlined version, yet whether inlining is a good idea depends on the final length of the inlined code, which is not known until close to the end of the process. There are several partial solutions to this problem:
- Advance compilation through the other passes and find out how much code is generated.
- Estimate from available information how much code will be generated.
- Use several inlining passes at different stages of compilation.
Using these methods reasonable inlining can be performed most of the time, at the cost of quite a lot of complexity. Note also that in C, where functions have external linkage by default and may be declared in another file, a non-inlined version is always needed in case this happens. It may subsequently be removed by the linker if no such declarations occur.
The form of generated code
The forms of compiler-generated code vary, but most are similar. Specifically they:
- Have a small number of registers for commonly used values.
- Perform arithmetic operations (e.g. +, -, *, /) on integers and floating point numbers.
- Apply Boolean operations, such as `or’ or `and’, to registers.
- Load and store data to addresses in memory.
- Jump to a piece of code at some address in memory.
- Make decisions, normally by jumping or not jumping on the basis of the value of a register or flag.
As far as I am aware, all modern compilers assume roughly this model of a processor throughout most of their code. Extra instructions that the architecture supports are either ignored, used only in hand-written library code, or produced by recognizing groups of simpler operations just before assembly language is generated.
In order to store global variables, compilers simply use static memory space. To store the return addresses of subroutines, modern compilers use a stack. (A stack is like one of those spring-loaded plate-warmers; if you’re reading this you probably know what I mean.) Local variables, parameters to be passed, temporaries that won’t fit in registers, and return values are also stored on the stack. At the beginning of a subroutine, code is inserted to allocate space for them: the `activation record’. At least the parameter-passing part of the activation record must be standardized; if it weren’t, object code and libraries compiled with different compilers wouldn’t be compatible.

Compilers use registers to store values, and these registers must be saved somehow, so they are put on the stack. There are generally two methods: either the function being called saves them, by storing them in its own stack frame and restoring them before returning (`callee save’), or the function doing the calling saves them (`caller save’). There are trade-offs with both. Doing too much work in callees makes the lowest-level functions inefficient, but there are a lot of callers too. The answer is a compromise: a group of registers is set aside as caller saved, and a subroutine short enough to get by using only those need do no saving itself. The rest are callee saved; this normally includes infrequently used registers such as floating point and vector registers. The code that does the saving and restoring at the beginning and end of a subroutine is called the prologue and epilogue respectively. All this must be specified to allow code to inter-operate; such a specification is an ABI – Application Binary Interface.
Whole-program optimization can improve this further. The `linker’ is normally just a means of joining object files together to form an executable; whole-program optimization involves burying a whole or partial compiler in it. This allows all the calls internal to a program to be optimized in any way the compiler sees fit. To make this possible, the object files are sometimes not object files at all, but a binary representation of the compiler’s internal representation.
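This is exactly how link-time optimization works in GCC and Clang today: with `-flto` the emitted `.o` files carry the compiler's intermediate representation, and the final optimization runs when the linker calls back into the compiler. A minimal build sketch (the file names a.c and b.c are hypothetical):

```shell
# Link-time optimization with GCC (Clang accepts the same flag).
# With -flto the ".o" files contain the compiler's intermediate
# representation rather than plain machine code.
gcc -O2 -flto -c a.c
gcc -O2 -flto -c b.c
gcc -O2 -flto -o prog a.o b.o   # cross-module inlining happens here
```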