Compilers – The Basics


The Compiler

Since the Pentium 4 was launched, its poor performance on unoptimized code has triggered interest in compiler technology. This article is an introduction to the form and function of compilers.

What is a compiler?

Programmers think of a compiler as a program that takes input in the form of a high-level language and produces output in the form of assembly language (or machine code) for some processor. This definition is too narrow: in the purest sense, a compiler takes a string and outputs another string. That covers all manner of software. Text formatters such as TeX and troff convert an input language into a printable output such as PostScript. Programs that convert between file formats or between programming languages are compilers. Many interpreted languages, such as Python, Forth and Smalltalk, compile to an internal bytecode and then interpret it. Many other programs use compiler-like structures to interpret configuration files. Pretty printers, the indentation and coloring in code editors, and static type checkers (e.g. lint) also use techniques similar to those found in compilers. Have I mentioned web browsers yet? In this article, however, I will discuss what is normally regarded as a compiler, as these are the most sophisticated and interesting.

When high-level languages were first invented in the forties and fifties, no compilers had been written. Early compilers were complex and took huge amounts of time and manpower to write. Since then, work on methods and tools has made it possible for a single programmer to write quite an advanced compiler. One of the main lessons learnt is how to split a compiler into parts. At the highest level there are three parts: the front end, which understands the syntax of the source language; the mid-end, which performs high-level optimizations; and the back end, which produces assembly language.
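To make the three-part split concrete, here is a minimal sketch of a toy compiler for arithmetic expressions. Everything in it is an illustrative assumption rather than anything from a real compiler: the function names (tokenize, parse, fold, emit, compile_expr), the tuple-based tree, and the PUSH/LOAD/ADD/SUB/MUL mnemonics of an imaginary stack machine. The front end parses the source into a tree, the mid-end folds constant subexpressions, and the back end emits the assembly text.

```python
import re

# --- Front end: understands the syntax of the source language ---
def tokenize(src):
    # Numbers, identifiers, and the operators + - * ( ); whitespace is skipped.
    return re.findall(r"\d+|[A-Za-z_]\w*|[-+*()]", src)

def parse(tokens):
    pos = 0

    def peek():
        return tokens[pos] if pos < len(tokens) else None

    def take():
        nonlocal pos
        tok = tokens[pos]
        pos += 1
        return tok

    def atom():
        tok = take()
        if tok == "(":
            node = expr()
            take()  # consume the closing ")"
            return node
        return ("num", int(tok)) if tok.isdigit() else ("var", tok)

    def term():
        node = atom()
        while peek() == "*":
            op = take()
            node = (op, node, atom())
        return node

    def expr():
        node = term()
        while peek() in ("+", "-"):
            op = take()
            node = (op, node, term())
        return node

    return expr()

# --- Mid-end: a high-level optimization (constant folding) ---
def fold(node):
    if node[0] in ("num", "var"):
        return node
    op, left, right = node[0], fold(node[1]), fold(node[2])
    if left[0] == "num" and right[0] == "num":
        a, b = left[1], right[1]
        return ("num", {"+": a + b, "-": a - b, "*": a * b}[op])
    return (op, left, right)

# --- Back end: produces assembly for a hypothetical stack machine ---
def emit(node):
    if node[0] == "num":
        return [f"PUSH {node[1]}"]
    if node[0] == "var":
        return [f"LOAD {node[1]}"]
    mnemonic = {"+": "ADD", "-": "SUB", "*": "MUL"}[node[0]]
    return emit(node[1]) + emit(node[2]) + [mnemonic]

def compile_expr(src):
    # A string goes in, another string comes out.
    return "\n".join(emit(fold(parse(tokenize(src)))))

print(compile_expr("x * (2 + 3)"))
```

For the input x * (2 + 3), the mid-end replaces 2 + 3 with 5 before the back end runs, so the output is LOAD x, PUSH 5, MUL. The sketch does no error handling, and the point is only that each part can be written and replaced independently, as long as the tree passed between them keeps the same shape.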

At a slightly lower level there are seven stages, each of which will be explained here. Not all compilers have all of these stages, but most have at least five or six of them. Information is passed between the stages in the data structures marked between the blocks.


