The Tree
Tree (actually abstract syntax tree) is the data structure used to represent the meaning of the code in the next few stages of the compiler. When a parser matches the left-hand side of a production it adds something to the tree. Here is a little bit of tree.

This tree is created in memory. Every part of the left-hand side of a production produces a pointer. The right hand side gets some memory, stores the instruction, instruction type (eg. operation, variable, function call), and the pointers it recieves. It then passes a pointer to this structure to its left-hand side. In the predictive parser this roughly means that each subroutine gets pointers from every subroutine it calls, collects them in memory, adds the instructions to be performed and returns a pointer to them.

Trees can be easily written using brackets. Each left bracket represents a new node and each right bracket the end of that node. The tree above could be written as (+ a (* b 2)). This implies that the entire process of parsing can be omitted and all that is needed is a language where every concept is de-marked with brackets. A language has been written like this: Lisp. Lisp is an acronym for LiSt Processing Language, but it is more accurately called TRee Processing language. However, John McCarthy invented in the 50’s where he would never have got away with calling it TRIP. In Lisp, linked-lists are a fundamental data type, which means that data has the same form as code. In fact in Lisp there is a function call `eval’ that passes a list to the interpreter for execution, so it is possible to construct small sections of code and execute them, or allow the user to input code. It also makes it easy to recognize or generate Lisp.Once the tree is generated it is then ‘decorated’ or ‘annotated’. This means adding type information to it and quite possibly other information useful to optimization. This stage finds type mismatches. Some languages stop at this stage and interpret the resulting tree without compiling. Perl does this as do some lisp interpreters. Also, languages that generate images rather than code, such as HTML and XML, sometimes build those images from trees.
A common Linux kernel problem is: “I keep getting internal compiler errors when building the kernel with make -j”. The reason for this, as every F.A.Q. replies, is that system memory is deliberately overclocked or of low-quality, and this is the most common symptom. This demonstrates something of the nature of compiling; huge complex data structures being read, annotated and replaced. Also several simultaneously running compilers is a good test of system memory.
Pages: « Prev 1 2 3 4 5 6 Next »
Discuss (15 comments)