Assignment 1: The Var language
Overview
This assignment involves writing a compiler for the Var language, which is a simple language of variables and arithmetic described in chapter 2 of the textbook and in the lectures.
The end result will be a program called compile which will be able to compile a source program (with the extension .src) to assembly language.
In addition, the compiler can stop after any pass, including after the parser,
so you can inspect the generated code at each stage. The compiler also has the
ability to run a single pass given the appropriate inputs.
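As a rough illustration of what "stopping after any pass" means, here is a minimal OCaml sketch. It is not the course's driver code: the pass record, the run_until function, and the pass names are all hypothetical, and each pass is modeled as a simple string-to-string function, whereas the real passes translate between different intermediate languages.

```ocaml
(* Hypothetical sketch; the real driver and pass names may differ. Each pass
   is modeled as a string-to-string function purely for illustration. *)
type pass = { name : string; run : string -> string }

(* Apply the passes in order and stop right after the pass named [last].
   If no pass has that name, every pass is applied. *)
let run_until (last : string) (passes : pass list) (input : string) : string =
  let rec go acc = function
    | [] -> acc
    | p :: rest ->
      let acc' = p.run acc in
      if p.name = last then acc' else go acc' rest
  in
  go input passes

(* Three made-up passes that just record that they ran. *)
let demo_passes = [
  { name = "first";  run = (fun s -> s ^ " -> first") };
  { name = "second"; run = (fun s -> s ^ " -> second") };
  { name = "third";  run = (fun s -> s ^ " -> third") };
]

let () = print_endline (run_until "second" demo_passes "src")
```

Stopping after "second" means the output shows the first two passes only, which is the kind of intermediate result you can inspect when debugging a single stage.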
Textbook coverage
This assignment is based on chapter 2 of Essentials of Compilation.
Due date
This assignment is due on Friday, October 18th at 6 PM.
Before you begin
Make sure that all the action items from assignment 0 have been completed:
- choosing a partner
- installing OCaml and all the necessary OCaml libraries
- setting up a GitHub repository and adding the instructor (Mike) and your grader (one of the TAs) as collaborators [1]
Division of labor
These assignments are very much a team effort, and both students will be credited for all work done. We don't want one student to do all the work, though. Learning how to divide the labor is an interesting challenge, and there is more than one right way to do it. Some teams may want to work on the code together ("pair programming"), which can be highly effective. Alternatively, you can divide up the passes. Note that each pass can be tested individually.
If one team member is more experienced than the other, they may want to tackle the harder passes (which will be identified below). The hardest passes (of which there are none in this assignment!) are the best candidates for collective work.
README or README.md file
Create a README or README.md file in your ch2 directory, and in it, identify which person wrote which passes. (If both partners worked on a pass, indicate that too.) Also, if you used any late days on the assignment, indicate how many late days you used.
"Submitting" your assignment
Unlike most courses, there is nothing to "hand in" in this course.
Instead, you need to inform the instructor and your TA/grader
when an assignment is ready to be graded
(hopefully, before or on the due date).
The instructor/TA will check out your code, run the tests, and leave comments
in a file called GRADES in your ch2 directory.
After the grading comments have been checked in to your repository, you have one week to make changes. Work redone during this period will be re-evaluated without penalty.
In addition, you can use one of your 10 late days to submit your initial work late. See the late policies in the syllabus. If you do use late days, please indicate this in your README file (see the README section of this assignment).
Code reviews, office hours, and feedback
We will be setting up code review times for each team. Make sure you choose a time when both members of the team, as well as the instructor (and/or a TA), can meet.
Code reviews do not require that all of the code be written, or that all of the code is working perfectly, but there is no point in doing a code review unless most of the code has been written. If you are having trouble at an earlier stage, we will have office hour times you can come to.
Textbook
This assignment is based on chapter 2 of the course textbook (Essentials of Compilation by Jeremy Siek). Please read this chapter in its entirety before doing this assignment.
One obvious and pervasive difference between the textbook and this course is that we are using OCaml to write the compiler, whereas the book uses Racket. Another is that we aren't asking you to write your own test cases, though you are encouraged to do that in addition to the ones we supply if you find it helpful.
Also, you'll notice little differences in the way languages are represented. The Racket code has a Prim constructor that we don't use (yet), and an info field that we only use in certain languages. However, for the most part there is close to a 1-to-1 correspondence between the Racket datatypes and the datatypes we define in OCaml. When in doubt, trust the OCaml code.
Finally, in the book, there is partial "skeleton" code for some of the passes. While the OCaml equivalent is broadly similar, you don't need to have an error case for unhandled cases, because the OCaml compiler checks pattern matches for exhaustiveness and warns you if a case is missing! When skeleton code in Racket is included in the book, you are welcome to use the Racket code as a basis for writing your own OCaml code. (Of course, you'll have to translate the Racket code to OCaml!) Learning Racket is not a requirement for this course, but you should be able to pick up enough of it to understand most of the code examples in the book.
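To see why no catch-all error case is needed, consider the following minimal sketch. The exp type and eval function here are hypothetical and much smaller than the course's actual language definitions; they only illustrate how a match that covers every constructor needs no fallback branch.

```ocaml
(* Hypothetical mini-language for illustration; not the course's Lvar type. *)
type exp =
  | Int of int
  | Negate of exp
  | Add of exp * exp

(* Every constructor of [exp] has a case, so no catch-all error branch is
   needed. If a case were missing, the OCaml compiler would report a
   non-exhaustive match at compile time. *)
let rec eval (e : exp) : int =
  match e with
  | Int n -> n
  | Negate e1 -> -(eval e1)
  | Add (e1, e2) -> eval e1 + eval e2

let () =
  Printf.printf "%d\n" (eval (Add (Int 40, Negate (Int (-2)))))
```

If you later add a constructor to a type like this, the compiler will point at every match that has not been updated, which is usually more helpful than a runtime error branch.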
Starting code base
The starting code base is in two zip files, src.zip and ch2.zip, which are posted on the course Canvas site in the "Assignment code" module.
src.zip is the starting code base for the entire course. You should unzip this file in your GitHub repo. When unzipped, it will create a src/ directory which will eventually contain all of your compiler code for the course. Initially, it just contains the file dune-project (which has to be at the base of any OCaml project that uses the dune compilation manager, as we will do), the .gitignore file (which tells git which files don't need to be under version control), and the support/ directory. This directory contains a variety of modules which contain useful functions and data structures. We'll give you suggestions on which functions you should consider using, but you can use any of them at any time.
ch2.zip is the code base for assignment 1 (the Var language compiler). You should move it into the src/ directory and unzip it in that directory. This will create the ch2/ directory. Inside this directory will be the .ml and .mli files of the compiler, a few other files (Makefile, dune, utop_init, etc.) whose purposes will be described below, and three subdirectories:
- tests/: this contains all the test code in the source language of the compiler, e.g. var_test_1.src;
- scripts/: this contains the Python test scripts that you will use to test your compiler;
- reference/: this contains the output of the reference compiler (the instructor's compiler) for each test file and each pass; this is used for testing as described later in the assignment.
The name ch2 refers to the fact that this code corresponds to the language in chapter 2 of the textbook. In future assignments, we will be giving you a zip file containing only the code which is specific to the new compiler, in a directory called e.g. ch3, ch4, etc. The support library should not change (unless there are bugs which need to be fixed).
You should check in the entire src directory and all of its subdirectories, including the .gitignore files. However, don't check in the zip files! (You should probably remove them once you don't need them anymore.)
Sanity checking the code base
If you've installed the code base correctly, you should be able to do the following:
- cd into the src/ch2 subdirectory.
- Type make. This will compile the compiler (an executable file called compile). You should see a number of warnings when you compile the compiler; that's expected. (As you fill in the code for the compiler passes, these warnings will go away.)
- You can use the compiler as-is to convert source files in the tests/ subdirectory to their "Lvar" AST equivalents. ("Lvar" is the name of the abstract syntax tree (AST) language for the Var compiler.) The output of the compiler is printed to the terminal. Should you want to save the output to a file, you can redirect it, for instance:

      $ ./compile tests/var_test_1.src -pass lvar > var_test_1.lvar
      $ cat var_test_1.lvar
      (Program (Int 42))

  You can also run the "Lvar" evaluator. However, if you try to compile to a pass beyond the AST, it will fail with a TODO error message, which indicates that some part of the compiler needs to be written.
- You can also print out the compiler options (running compile with no arguments does this too). This prints out a usage message.
If everything works as we've described, you are ready to start work on the assignment.
Note that each assignment's compiler will be different, and there may be different or additional command-line options for each compiler.
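To make output like (Program (Int 42)) more concrete, here is a minimal sketch of how an AST could be defined in OCaml and printed in that parenthesized form. The constructor names and the string_of_exp and string_of_program functions are assumptions made for illustration; the real Lvar definitions and printing code are in the ch2 code base and may differ.

```ocaml
(* Hypothetical AST shape for illustration; the real Lvar definitions live in
   the ch2 code and may differ. *)
type exp =
  | Int of int
  | Var of string
  | Add of exp * exp
  | Let of string * exp * exp

type program = Program of exp

(* Render an expression as a parenthesized, constructor-first string. *)
let rec string_of_exp (e : exp) : string =
  match e with
  | Int n -> Printf.sprintf "(Int %d)" n
  | Var x -> Printf.sprintf "(Var %s)" x
  | Add (e1, e2) ->
    Printf.sprintf "(Add %s %s)" (string_of_exp e1) (string_of_exp e2)
  | Let (x, e1, e2) ->
    Printf.sprintf "(Let %s %s %s)" x (string_of_exp e1) (string_of_exp e2)

let string_of_program (Program e) : string =
  Printf.sprintf "(Program %s)" (string_of_exp e)

let () = print_endline (string_of_program (Program (Int 42)))
```

With this sketch, the program consisting of the single integer 42 prints as (Program (Int 42)), matching the shape of the output shown in the sanity check.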
[1] This assumes that a grader has been assigned to your team. If not, add the grader to your GitHub repo as soon as you know who they are.