Assignment 1: The Var language

Overview

This assignment involves writing a compiler for the Var language, which is a simple language of variables and arithmetic described in chapter 2 of the textbook and in the lectures.

The end result will be a program called compile which will be able to compile a source program (with the extension .src) to assembly language. In addition, the compiler can stop after any pass, including after the parser, so you can inspect the generated code at each stage. The compiler also has the ability to run a single pass given the appropriate inputs.
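As a rough illustration of what "stopping after any pass" means, the sketch below models a compiler as an ordered list of named passes and runs them until a requested pass has been applied. Everything here is hypothetical: the pass names, the use of strings as the intermediate representation, and the run_until function are assumptions made for illustration, not the structure of the actual compile program.

```ocaml
(* Hypothetical model: each pass has a name and transforms a program.
   Programs are plain strings here purely to keep the sketch small. *)
type pass = { name : string; run : string -> string }

(* Made-up passes that just wrap their input so the order is visible. *)
let passes = [
  { name = "a"; run = (fun p -> "a(" ^ p ^ ")") };
  { name = "b"; run = (fun p -> "b(" ^ p ^ ")") };
  { name = "c"; run = (fun p -> "c(" ^ p ^ ")") };
]

(* Apply passes in order, stopping once the pass named [last] has run.
   Returns None if no pass has that name. *)
let rec run_until (last : string) (ps : pass list) (prog : string)
  : string option =
  match ps with
  | [] -> None
  | p :: rest ->
    let prog' = p.run prog in
    if p.name = last then Some prog' else run_until last rest prog'

let () =
  match run_until "b" passes "src" with
  | Some out -> print_endline out   (* prints b(a(src)) *)
  | None -> print_endline "unknown pass"
```

In this model, stopping after a pass simply means returning the intermediate result early instead of feeding it to the next pass.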

Textbook coverage

This assignment is based on chapter 2 of Essentials of Compilation.

Due date

This assignment is due on Friday, October 18th at 6 PM.

Before you begin

Make sure that all the action items from assignment 0 have been completed:

  • choosing a partner
  • installing OCaml and all the necessary OCaml libraries
  • setting up a GitHub repository and adding the instructor (Mike) and your grader (one of the TAs) as collaborators [1]

Division of labor

These assignments are very much a team effort, and both students will be credited for all work done. We don't want one student to do all the work, though. Learning how to divide the labor is an interesting challenge, and there is more than one right way to do it. Some teams may want to work on the code together ("pair programming"), which can be highly effective. Alternatively, you can divide up the passes. Note that each pass can be tested individually.

If one team member is more experienced than the other, they may want to tackle the harder passes (which will be identified below). The hardest passes (of which there are none in this assignment!) are the best candidates for collective work.

README or README.md file

Create a README or README.md file in your ch2 directory, and in it, identify which person wrote which passes. (If both partners worked on a pass, indicate that too.) Also, if you used any late days on the assignment, indicate how many late days you used.

"Submitting" your assignment

Unlike most courses, there is nothing to "hand in" in this course. Instead, you need to inform the instructor and your TA/grader when an assignment is ready to be graded (hopefully, before or on the due date). The instructor/TA will check out your code, run the tests, and leave comments in a file called GRADES in your ch2 directory.

After the grading comments have been checked in to your repository, you have one week to make changes. Work redone during this period will be re-evaluated without penalty.

In addition, you can use one of your 10 late days to submit your initial work late. See the late policies in the syllabus. If you do use late days, please indicate this in your README file (see the "README or README.md file" section).

Code reviews, office hours, and feedback

We will be setting up code review times for each team. Make sure you choose a time when both members of the team, as well as the instructor (and/or a TA), can meet.

Code reviews do not require that all of the code be written, or that all of the code is working perfectly, but there is no point in doing a code review unless most of the code has been written. If you are having trouble at an earlier stage, we will have office hour times you can come to.

Textbook

This assignment is based on chapter 2 of the course textbook (Essentials of Compilation by Jeremy Siek). Please read this chapter in its entirety before doing this assignment.

One obvious and pervasive difference between the textbook and this course is that we are using OCaml to write the compiler, whereas the book uses Racket. Another is that we aren't asking you to write your own test cases, though you are encouraged to write some in addition to the ones we supply if you find it helpful.

Also, you'll notice little differences in the way languages are represented. The Racket code has a Prim constructor that we don't use (yet), and an info field that we only use in certain languages. However, for the most part there is a close to 1-to-1 correspondence between the Racket datatypes and the datatypes we define in OCaml. When in doubt, trust the OCaml code.

Finally, in the book, there is partial "skeleton" code for some of the passes. While the OCaml equivalent is broadly similar, you don't need to have an error case for unhandled cases, because the OCaml compiler checks pattern matches for exhaustiveness and warns you about any missing cases! When skeleton code in Racket is included in the book, you are welcome to use the Racket code as a basis for writing your own OCaml code. (Of course, you'll have to translate the Racket code to OCaml!) Learning Racket is not a requirement for this course, but you should be able to pick up enough of it to understand most of the code examples in the book.
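To illustrate the point about unhandled cases, here is a minimal sketch using a hypothetical type (the atom type and string_of_atom function are made up for illustration and are not taken from the course code base). Because the match handles every constructor, no catch-all error branch is needed.

```ocaml
(* Hypothetical type, for illustration only. *)
type atom =
  | Int of int
  | Var of string

(* Both constructors are handled, so no catch-all error case is needed.
   If the Var case were deleted, the compiler would warn that the match
   is not exhaustive (warning 8). *)
let string_of_atom (a : atom) : string =
  match a with
  | Int n -> string_of_int n
  | Var x -> x

let () = print_endline (string_of_atom (Var "x"))
```

A Racket version of the same function would typically end with an else clause that raises an error; here the type checker plays that role at compile time.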

Starting code base

The starting code base is in two zip files:

  • src.zip
  • ch2.zip

which are posted on the course Canvas site in the "Assignment code" module.

src.zip is the starting code base for the entire course. You should unzip this file in your GitHub repo. When unzipped, it will create a src/ directory which will eventually contain all of your compiler code for the course. Initially, it just contains the file dune-project (which has to be at the base of any OCaml project that uses the dune compilation manager, as we will do), the .gitignore file (which tells git which files don't need to be under version control), and the support/ directory. This directory contains a variety of modules which contain useful functions and data structures. We'll give you suggestions on which functions you should consider using, but you can use any of them at any time.

ch2.zip is the code base for assignment 1 (the Var language compiler). You should move it into the src/ directory and unzip it in that directory. This will create the ch2/ directory. Inside this directory will be the .ml and .mli files of the compiler, a few other files (Makefile, dune, utop_init, etc.) whose purposes will be described below, and three subdirectories:

  • tests/ — this contains all the test code in the source language of the compiler, e.g. var_test_1.src;

  • scripts/ — this contains the Python test scripts that you will use to test your compiler;

  • reference/ — this contains the output of the reference compiler (the instructor's compiler) for each test file and each pass; this is used for testing as described later in the assignment.

The name ch2 refers to the fact that this code corresponds to the language in chapter 2 of the textbook.

In future assignments, we will be giving you a zip file containing only the code which is specific to the new compiler, in a directory called e.g. ch3, ch4, etc. The support library should not change (unless there are bugs which need to be fixed).

You should check in the entire src directory and all of its subdirectories, including the .gitignore files. However, don't check in the zip files! (You should probably remove them once you don't need them anymore.)

Sanity checking the code base

If you've installed the code base correctly, you should be able to do the following:

  1. cd into the src/ch2 subdirectory.

  2. Type make. This will compile the compiler (an executable file called compile). You should see a number of warnings when you compile the compiler; that's expected. (As you fill in the code for the compiler passes, these warnings will go away.)

  3. You can use the compiler as-is to convert source files in the tests/ subdirectory to their "Lvar" AST equivalents. ("Lvar" is the name of the abstract syntax tree (AST) language for the Var compiler.) For instance:

    $ ./compile tests/var_test_1.src -pass lvar
    (Program (Int 42))
    

    Note that the output of the compiler is printed to the terminal. Should you want to save the output to a file, you can redirect it:

    $ ./compile tests/var_test_1.src -pass lvar > var_test_1.lvar
    $ cat var_test_1.lvar
    (Program (Int 42))
    

    You can also run the "Lvar" evaluator:

    $ ./compile tests/var_test_1.src -pass lvar -eval
    42
    

    However, if you try to compile to a pass beyond the AST, it will fail:

    $ ./compile tests/var_test_1.src -pass un
    TODO
    

    The TODO is the error message that indicates that some part of the compiler needs to be written.

  4. You can also print out the compiler options:

    $ ./compile --help
    

    (or just run ./compile with no arguments). This prints out a usage message.

If everything works as we've described, you are ready to start work on the assignment.

Note that each assignment's compiler will be different, and there may be different or additional command-line options for each compiler.
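To make the -pass lvar and -eval outputs from the sanity check more concrete, here is a minimal sketch of an AST with an S-expression printer and an evaluator. The type and function names are hypothetical, and only integer literals, negation, and addition are modeled. The real Lvar definition in the code base has more forms (variables, for example) and should be treated as authoritative.

```ocaml
(* Hypothetical miniature AST; not the course's actual Lvar type. *)
type exp =
  | Int of int
  | Negate of exp
  | Add of exp * exp

type program = Program of exp

(* Render an expression in S-expression form, e.g. "(Int 42)". *)
let rec string_of_exp (e : exp) : string =
  match e with
  | Int n -> Printf.sprintf "(Int %d)" n
  | Negate e1 -> Printf.sprintf "(Negate %s)" (string_of_exp e1)
  | Add (e1, e2) ->
    Printf.sprintf "(Add %s %s)" (string_of_exp e1) (string_of_exp e2)

let string_of_program (Program e) =
  Printf.sprintf "(Program %s)" (string_of_exp e)

(* Evaluate an expression to an integer. *)
let rec eval_exp (e : exp) : int =
  match e with
  | Int n -> n
  | Negate e1 -> -(eval_exp e1)
  | Add (e1, e2) -> eval_exp e1 + eval_exp e2

let eval_program (Program e) = eval_exp e

let () =
  let p = Program (Int 42) in
  print_endline (string_of_program p);   (* prints (Program (Int 42)) *)
  Printf.printf "%d\n" (eval_program p)  (* prints 42 *)
```

The printer corresponds to inspecting the output of a pass, and the evaluator corresponds to running that output directly instead of compiling it further.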


  1. This assumes that a grader has been assigned to your team. If not, add the grader to your GitHub repo as soon as you know who they are.