The General Structure of the Compiler

Our compiler has the "classic" structure, and comprises the following parts.

The source character stream (which is often called the concrete program) is read and broken up into tokens by the scanner.
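
As a rough illustration of what the scanner does, here is a minimal sketch in Python (not in Refal Plus, and not the scanner of our compiler). The token classes NUM, ID and OP, and the function name scan, are assumptions made for this example only; the actual tokens depend on the source language.

```python
import re

# Hypothetical token classes, chosen for illustration only:
# numbers, identifiers, and a handful of operator/punctuation symbols.
TOKEN_RE = re.compile(
    r"\s*(?:(?P<NUM>\d+)|(?P<ID>[A-Za-z_]\w*)|(?P<OP>:=|[-+*/();]))"
)

def scan(source):
    """Break a character stream into a list of (kind, text) tokens."""
    tokens = []
    pos = 0
    source = source.rstrip()  # trailing white space carries no token
    while pos < len(source):
        m = TOKEN_RE.match(source, pos)
        if m is None:
            raise SyntaxError(f"unexpected character at position {pos}")
        # lastgroup names the alternative (NUM, ID or OP) that matched.
        tokens.append((m.lastgroup, m.group(m.lastgroup)))
        pos = m.end()
    return tokens
```

With these assumed token classes, scan("x := 12 + y") yields an identifier, an operator, a number, an operator and an identifier, in that order.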

Then the token sequence is analyzed by the parser to produce an abstract syntax tree (which is often called the abstract program).

The abstract program is then translated by the code generator into a program in assembly language. An assembly-language program is very close to the target program, except that it contains labels in place of concrete cell addresses, each label standing for an address that is not yet known.

The program in assembly language is then processed by the assembler, which replaces all the labels with concrete addresses, thereby producing the target machine code program.
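
The replacement of labels by addresses can be sketched as a two-pass procedure. The following Python fragment is only an illustration under stated assumptions: the representation of the program as a list of tuples, the marker "LABEL", the opcode names, and the rule that every instruction occupies exactly one cell are all hypothetical and are not taken from our compiler.

```python
def assemble(program):
    """Replace labels with concrete addresses (two-pass sketch).

    Assumed input: a list whose items are either ("LABEL", name) or
    (opcode, operand), where the operand is an int, a label name (str),
    or None. Assumption: every instruction occupies exactly one cell.
    """
    # Pass 1: record the address at which each label is defined.
    addresses = {}
    address = 0
    for item in program:
        if item[0] == "LABEL":
            addresses[item[1]] = address
        else:
            address += 1

    # Pass 2: emit the instructions, substituting addresses for labels.
    code = []
    for item in program:
        if item[0] == "LABEL":
            continue  # labels occupy no cell in the target program
        opcode, operand = item
        if isinstance(operand, str):
            operand = addresses[operand]
        code.append((opcode, operand))
    return code
```

Two passes are used because a label may be referenced before it is defined, so its address is unknown when the reference is first seen.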

The information about the correspondence between the variable names and labels is kept in the dictionary of variables. Thus the compiler contains a module dealing with the dictionary, which is used by the code generator as well as by the assembler.
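
One possible shape for such a dictionary module is sketched below in Python. The class name VariableDictionary, its methods, and the label naming scheme V0, V1, ... are assumptions made for illustration; they do not describe the module of our compiler.

```python
class VariableDictionary:
    """Maps variable names to labels (illustrative sketch only)."""

    def __init__(self):
        self._labels = {}  # variable name -> label, in order of first use

    def label_for(self, name):
        """Return the label of a variable, allocating one on first use.

        A code generator could call this for every variable it meets.
        """
        if name not in self._labels:
            self._labels[name] = f"V{len(self._labels)}"
        return self._labels[name]

    def labels(self):
        """Return all allocated labels, in order of allocation.

        An assembler could use this list to reserve one cell per variable.
        """
        return list(self._labels.values())
```

Sharing one dictionary object between the two phases keeps the code generator and the assembler in agreement about which label belongs to which variable.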

Given the simplicity of the source language, the structure of our compiler may well seem rather complicated. Indeed, the compiler could have been simplified by merging several of its components, for example the scanner, the parser, and the code generator.

It should be kept in mind, however, that if the source language were more complicated, such "unionism" would make the compiler messy, unreliable, and difficult to understand. The purpose of our compiler, after all, is to illustrate, in the framework of Refal Plus, the traditional compiler writing techniques applicable to "real-size" compilers.

Taking our example compiler as the starting point, the reader may try to improve it in two respects. First, the source language can be made more complex and more realistic. Second, the compiler can be simplified at the expense of making it less "scientific" and less general.