This is the second chapter of series “Let’s Build a C Compiler”, In this chapter we will have an overview of the structures of our compiler.
As we said in previous chapter, we promise a “Compiler”, but it is actually an “Interpreter”. That means we can run a C source file just like a script, we did this mainly for two reasons:
- Interpreter is different with Compiler only in code generation phase. They are the same in lexical analysis and parsing.
- We will build our own virtual machine and assembly instructions, that would help us better understanding the how computer works.
Normally, there are three phases of compiler parsing a source file:
- lexical analysis: that converts source strings into internal token stream.
- parsing: that consumes token stream and constructs a syntax tree.
- code generation: walk through the synatx tree and convert it to target code.
Compiler Construction had been so mature that part 1 & 2 can be done by automation tools. For example, flex can be used for lexical analysis, bison is for parsing. They are powerful but do thousands of things behind the scene. In order to fully understand how to build a compiler, we are going to build all of them from scratch.
- Build our own virtual machine and instruction set. This is the target code that will be using in our code generation phase.
- Build our own lexer for C compiler.
- Write a recusion descent parser on our own.
Taken from c4, our compiler includes 4 main functions:
next()for lexical analysis; get the next token; will ignore spaces tabs etc.
program()entry for parser.
levelwill be explained in later chapter.
eval()the entry of our virtual machine; used to interpret target instructions.
expression exist when we have
program for parser? It is because
the parser for expressions is relatively independent and complex, so we put it
into a single mmodule(function).
Then our code looks like:
The code might seems to be a lot for the very first chapter. But if you are familiar with C, you’ll find it simple enough. The code above reads a source file, character by character, and output them.
The most important thing here is to understand the meaning of these functions, they represents the whole framework of a interpreter. We’ll implement all of them step by step in the following chapters and finally a whole compiler.
The code for this chapter can be downloaded from Github, or clone by:
Note that I might fix bugs later, and if there is any incosistance between the
artical and the code branches, follow the article. I would only update code in
With simple coding, we have the simplest compiler: a do-nothing compiler. In
next chapter, we
will implement the
eval function, i.e. our own virtual machine