Implementation language
In principle, any programming language can be used to write a compiler. Many compiled languages, after "bootstrapping" their compiler from another language, then rewrite the compiler in the language being compiled (this is called "self-hosting"). Self-hosting serves as a test of the power of the language (any language that is powerful enough to write its own compiler has to be pretty powerful) and allows the language developers to only use the new language in later development (often called "eating one's own dog food", or "dogfooding" for short).
In this course, we will use the OCaml programming language exclusively. Out of all the possible choices of programming language, why did we choose this comparatively obscure language for writing our compilers?
Part of the reason is subjective: we like OCaml as a language, we are very familiar with it, and we find that it occupies a useful "sweet spot" between lower-level languages like C, Java and Python and very high-level languages like Haskell. Its functional programming support is useful in reducing code duplication, and its impurity means that you can also use imperative code when it's desirable to do so.
The main reason, though, is that OCaml is a language which is superbly well-adapted to the task of writing a compiler.1 Features that are particularly helpful include:
-
Very strong static type checking. Compilers are large, complicated and subtle programs, and we need all the help we can get in preventing and tracking down errors. OCaml's type checking is extremely stringent, to the point that if a program compiles at all, there is a very good chance that it will run correctly. More importantly, type checking rules out almost all trivial errors.
-
Algebraic datatypes. These map almost perfectly to the structure of a compiler's various intermediate languages, and provide an extremely lightweight way to specify new datatypes.
-
Garbage collection. Compilers are complicated enough without having to worry about memory leaks from manual memory management.
-
A fast compiler. Waiting for your compiler to compile isn't fun, and our compilers will compile almost instantaneously.
-
An excellent build system. The dune build system makes compiling OCaml projects nearly effortless.
-
An excellent package manager. The opam package manager makes installing new packages completely painless.
-
Excellent serialization. OCaml has the facility (not built-in, but provided by libraries) to convert any complex datatype to an "S-expression" (a simple and readable serialization format). Since compilers consist of a series of code transformations, being able to easily view the output of each transformation is incredibly useful for debugging.
There are other useful features of OCaml we will encounter as we proceed.
One indication of how useful OCaml is as a language for writing compilers is to consider which languages' compilers and other language-related tools were originally written in OCaml.2 These include:
-
the Rust language compiler
-
the Hack language compiler (a statically-typed PHP dialect from Facebook)
-
the Flow static analyzer for Javascript from Facebook
-
the Infer static analyzer for Java/C/C++/Objective-C from Facebook
-
the pyre-check static analyzer for Python from Facebook
-
ReasonML, an alternative syntax for OCaml which compiles to Javascript (also from Facebook!)
-
Frama-C, a static analysis framework for the C language
-
The WebAssembly language reference implementation (WebAssembly is a low-level language for running code in a web browser) 3
-
the Coq proof assistant (take CS 128 to learn about Coq)
and no doubt many more. OCaml is an excellent tool for building language-related tools, including compilers.
-
Jason Hickey, the last person to teach compilers at Caltech before this course, described OCaml as "a near-perfect language for writing a compiler". ↩
-
In some of these cases, the compiler was eventually rewritten in the language being compiled, so that the language could be self-hosted. ↩
-
Technically, the WebAssembly implementation is a reference interpreter, which is a language evaluator/runtime instead of a compiler, but it still illustrates the usefulness of OCaml for language implementations. ↩