Assignment 2: Compiler passes

The compiler passes are described in chapter 3 of the textbook, but here they are again for completeness. We will only include passes that you have to implement. As before, the only files you should modify are the .ml files corresponding to these compiler passes.

In addition, when we provide a function stub in a .ml file that you need to complete, that means (unless otherwise specified) that we expect that you will use that function (with those arguments and types (if supplied)) as written (filling in the TODO parts, of course). In particular, you're not allowed to change the number of arguments to the function, or their types (if supplied). If a function is completely implemented (no TODOs), you should leave it as-is. On the other hand, you can write as many extra functions as you like. (If we don't like your choices, we'll let you know during code reviews!)

Note

The one exception to this rule is in the graph_coloring.ml file. We provide function stubs with TODOs as usual, as well as an implementation of the color function that uses those functions, but if you don't like this, you are allowed to completely rewrite the entire module, as long as you end up with a color function with the correct type signature which works correctly.

Passes that are unchanged from assignment 1

Since the focus of this assignment is on register allocation, those passes that come before register allocation do not have to be modified from assignment 1 (in your ch2/ directory). This includes:

uniquify.ml
remove_complex.ml
explicate_control.ml

Note that you are free to edit your old code, in case you think you have a better way of doing things than what you wrote previously, or in case you found bugs in those passes that you need to fix.

1. Select instructions (`select_instructions.ml`)

The "select instructions" pass has no fundamental changes, but there are some trivial changes because the X86Program type in x86_var.ml[i] has changed. The new version has different info types for the program and also has block info types. For this pass, the block info is a placeholder type (containing no information) called binfo1. (This will change in later passes.) Some of the type annotations will need to be changed slightly, and the blocks will all contain a Binfo1 constructor. The type checker should point out exactly where your old code needs to be changed.

2. Uncover live (`uncover_live.ml`)

The "uncover live" pass has to implement the function uncover_live with this signature:

val uncover_live :
  (X86_var.info1, X86_var.binfo1) X86_var.program ->
  (X86_var.info1, X86_var.binfo2) X86_var.program

The X86_var.program type is parameterized on two kinds of "info" types: one for the program as a whole (with variants info1, info2, and info3), and one for blocks (with variants binfo1 and binfo2). The info1 and binfo1 types are placeholder types, containing no data. All that this pass does is compute the binfo2 value, which contains the liveness information described in the textbook (section 3.2) and in the lectures. The instructions should not be changed. The binfo2 type has this definition:

type live =
  {
    initial : LocSet.t;       (* initial live variables/registers *)
    afters  : LocSet.t list;  (* live vars/regs after each instruction *)
  }

type binfo2 = Binfo2 of live

The liveness data is represented as a list of sets of locations, where a "location" is a variable, a register, or a stack location. Sets of locations are represented by the type LocSet.t. Since liveness sets apply between instructions, we use the initial field to represent the initial set of live locations, and the afters field represents, for each instruction, the set of live locations after that instruction.

Note

In principle, we could have added the liveness data directly to the instructions, but we prefer to keep it separate, partly because it's only used in this pass. It's important to be aware that there is a one-to-one correspondence between the instructions in the program and the after sets of the live record: the Nth item in the after sets is the after set for the Nth instruction in the program.

We provide an implementation for the uncover_live function, which calls the function uncover_live_in_block to compute the liveness sets. You have to implement that function. The inputs to that function are the list of instructions in the block, and live_before_labels, which is a map between block names and the live locations before that block executes. Here, there is only one such block: conclusion, with live locations (registers) %rax and %rsp.

Note

Recall that we compute liveness backwards, from the last instruction back towards the first. The live locations before the conclusion block constitute the live-after set of the last instruction in the code you compiled.

You can implement uncover_live_in_block as you like (as long as it's functional — don't use mutation!) but we recommend that you define a helper function to compute the liveness set for a single instruction given the instruction and its live-after set.

Also, be aware that callq instructions require special handling. First, executing a callq instruction can result in any of the caller-saved registers being overwritten. There is a list of caller-saved registers in the Types module as caller_saved_regs; don't hard-code the caller-saved registers! Second, callq instructions with arity (number of arguments) greater than 0 have to read from registers for the first 6 arguments. These registers are given in the Types module as arg_passing_regs; these arguments are (in order) %rdi, %rsi, %rdx, %rcx, %r8, and %r9 (all of which are caller-saved registers, which should make sense). We recommend that you write a helper function to determine the change in the liveness set just for callq instructions, incorporating all of this information.

Note

Basically, for a callq instruction, all the caller-saved registers get added to the write set, and the arg-passing registers needed (but only those that are needed) get added to the read set.

The algorithm for computing liveness sets is described in detail in section 3.2 of the textbook and in the lectures.

3. Build interference graph (`build_interference.ml`)

The "build interference" pass has to implement the function build_interference with this signature:

val build_interference :
  (X86_var.info1, X86_var.binfo2) X86_var.program ->
  (X86_var.info2, X86_var.binfo1) X86_var.program

From this, we can see that the block info binfo2 value (containing the liveness data) goes back to the placeholder binfo1 type, but the program info info1 value (containing nothing but the types of variables) gets extended into the info2 type, which has this definition:

type info2 = Info2 of
  {
    locals_types : (var * ty) list;
    conflicts    : LocUgraph.t
  }

What this does is add the interference graph information (as the conflicts field) to the program.

Note

You might wonder why the liveness data is stored in a block info type while the interference graph is stored in a program info type. The liveness data is stored in a block-by-block manner, while there is only one interference graph per function. (Here, our programs are a single main function.) Note that most functions will have multiple blocks. Ours also do (the start block plus the prelude and conclusion) but we don't have to compute the liveness data for the prelude and conclusion because they are so simple. In later compilers, our programs will have many more blocks to handle conditionals, loops and user-defined functions.

We supplied an implementation of the build_interference function which calls the make_graph function to compute the interference graph. You need to implement make_graph, following the algorithm given in section 3.3 of the textbook and in the lectures. You'll almost certainly want to split this function up into multiple helper functions. Pay particular attention to movq and callq instructions; remember that for callq you need to create an edge between all the locations in the live-after set and all caller-save registers.

Note

This is unrelated to the handling of caller-save registers in liveness analysis, which normally does nothing but is included for completeness. Here, creating the edges between call-live variables and caller-save registers is extremely important, because it prevents call-live variables from being placed into caller-save registers, where they would have to be saved and restored before and after the call. Thus, they will normally be put into callee-save registers, unless there are none available, in which case they will be placed onto the stack. The register allocation algorithm handles all of this automatically!

You might wonder if instead of spilling a call-live variable to the stack, it might be better to just put it in a caller-save register (assuming there is one) and pay the price of saving and restoring. This is a fine question, and one that preoccupies people writing optimizing compilers, but our goals are less lofty. We just want to generate reasonably good code, not the maximally optimal code.

Please review section 3.3 of the textbook before attempting this pass, since it has detailed rules for which instructions give rise to which edges in the interference graph. Also, recall that the way we are building the interference graph is to focus on writes, because the writes performed by an instruction must not overwrite something in a live location. As the book says:

So for each instruction, we create an edge between the locations being written to and the live locations. (However, a location never interferes with itself.)

One consequence of this is that instructions that don't write to a live location don't change the interference graph.

4. Graph coloring (`graph_coloring.ml`)

This module is not technically a compiler pass, but it's an integral part of the "allocate registers" pass. However, the algorithm is more general, and the implementation is not specialized to locations (registers, variables and stack locations) until the GraphColor functor (defined in this module) is specialized to locations in allocate_registers.ml. The algorithm is described in some detail in section 3.4 of the textbook as well as in the lectures.

The GraphColor functor is parameterized on several modules:

a module defining the graph element type (which will be locations in allocate_registers.ml)
a module defining the element map type (made from the MapS functor specialized on element keys); MapS is just an extension of the normal OCaml Map functor and is defined in support/functors.ml[i]
a priority queue module (defined in support/priority_queue.ml[i])
an undirected graph module (the interference graph; defined in support/ugraph.ml[i])

The functor is instantiated in allocate_registers.ml, so you don't have to worry about how to do that. Inside the functor, a type node is defined that contains information about the elements (locations) which isn't stored in the interference graph, and that changes as the algorithm progresses. The two components of a node are the color of the element (an int option, since the element is initially uncolored), and the saturation set (a set of integers, representing the colors that the element is not allowed to have). As the algorithm progresses, the colors (which mostly start as None) get filled in, and the saturation sets increase in size, which constrains the color choices.

We define a nodemap type which is a map between elements and nodes. We also provide a debugging function called _print_nodemap which can print out the nodemap at any point in the algorithm to help debugging.

Note

The name is _print_nodemap instead of print_nodemap because it isn't exported from the functor, and OCaml issues a warning about unused functions if their names don't start with an underscore.

The only function that needs to be implemented is the color function, which takes an interference graph and a map of precolored elements (an element-to-int map) as its arguments. Recall that the only precolored elements we are using are the reserved registers, which for this compiler are %rax (color -1), and %rsp (color -2). The return value of the function is the full element-to-int map, containing color assignments for all variables.

Because this is a somewhat tricky algorithm, we've provided you with a starting point. We've given our version of the color function, which calls a number of helper functions. You may choose to simply implement all the helper functions in graph_coloring.ml. Or, if you would prefer to implement the color function from scratch, you are free to delete any or all of our helper functions and rewrite the color function accordingly. (Don't change its type, though!)

Note

The only restriction we'll insist on is that your code for this module should be purely functional. Don't use ref cells or any other imperative features. In a real-world setting, you might choose to use imperative code for efficiency or convenience, but it's definitely not necessary. Also, functional code is easier to get right.

The total amount of code you need to write is less than 100 extra lines.

Testing the graph coloring algorithm

There is a very trivial test script for the graph coloring algorithm only in the file test_graph_coloring.ml. You can run it in the interactive interpreter as follows:

$ make repl
# #use "test_graph_coloring.ml";;

This will run the test and print TEST PASSED! if it passes or TEST FAILED! if not. More importantly, since this is done interactively, you can inspect the results of graph coloring to see how they differed from the expected results. The example given is the same one given in the book and in the lectures.

Note

In future, we plan to have much more comprehensive tests of this algorithm. If you are looking for a topic for a CS 81 project, this course is a gold mine! In addition to more tests, there are lots of features we want to add.

5. Allocate registers (`allocate_registers.ml`)

Once you've implemented the graph coloring algorithm, the "allocate registers" pass is simple. You need to do these things:

Run the graph coloring algorithm and assign registers (or, if there are no available registers, stack locations) to all variables. This is done in the functions get_variable_location_map, which is supplied to you, and varmap_of_colormap, which you need to fill in. Your implementation of varmap_of_colormap should use the helper function location_of_color, which you also need to fill in.

Note

The location_of_color function should make use of the last_register_color variable. This gives the index of the last register that can be used for register allocation. It's set in the set_register_color_list function, and will correctly track the -regs argument of the compiler. Don't use the size of register_color_list or of the maps computed by it, since they also contain the reserved registers (with negative color numbers).

Also note that last_register_color is an int ref, not an int, so you will need to dereference it to get its value. Please don't set it, though!
Use the mapping between variable bindings and locations (registers or stack locations) to determine the number of stack slots needed, if any. This is done in the function get_num_spilled, which you need to fill in. Be careful that you don't count a stack location more than once! Hint: sets, specifically LocSets, are your friend!
Use the same mapping between variable bindings and locations to collect a set of all the callee-save registers used in the program. This is done in the function get_used_callee, which you need to fill in.
Convert all the instructions that use variables to use the corresponding registers or stack locations. This is done in the function convert_block, which you need to fill in. You'll probably want to define some helper functions in your implementation of this function.

All the other code for this pass has been supplied. In particular, code for selecting which registers are available (based ultimately on the command-line arguments to the compile program) is supplied and should not be altered.

6. Patch instructions (`patch_instructions.ml`)

Aside from trivial type changes, the only significant change in this pass is that instructions of the form Movq (x, x) (moving a value from a location to the same location) are removed.

7. Prelude and conclusion (`prelude_conclusion.ml`)

This pass has a few changes from the corresponding pass in the last assignment.

The info record type attached to the program type in x86_int.mli has these fields:

locals_types : (var * ty) list;
num_spilled  : int;
used_callee  : RegSet.t

We won't be concerned with locals_types, but the other two are relevant to generating a correct prelude and conclusion. num_spilled is the number of stack slots needed for variables that were not placed in registers. used_callee are the names of the callee-save registers that are used in the program.

Recall that the program is one big function (the main function), and so the values of any callee-saved variables that are used inside this function must be saved in the prelude and restored in the conclusion. They will be saved to the stack and restored from the stack. Therefore, the total number of stack slots needed can be more than num_spilled; it's actually num_spilled plus the number of callee-save registers used. And to make it even more confusing, the size of the stack frame (the total amount of stack space reserved for the function) must be a multiple of 16 bytes, so it can be even bigger!

Fortunately, we have defined a function called align_16 in the Utils module of the support directory. This will take a non-negative integer and round it up to the nearest multiple of 16. So the total stack space required is actually (in pseudocode):

stack_space = align_16(8 * (num_spilled + size(used_callees)))

However, it's still a bit more complicated than this. We will be saving the callee-save registers by pushing them to the stack with the pushq instruction, and restoring them with the popq instruction. The pushq instruction subtracts 8 bytes from the stack pointer, and the popq adds it back. The effect of this is that, after all the callee-save registers have been pushed to the stack using pushq instructions, we've already allocated a big chunk of the total stack space required. The additional stack space we need to allocate can be computed from these equations (in pseudocode):

n_callees = size(used_callees)
extra_stack_space = align_16 (8 * (num_spilled + n_callees)) - 8 * n_callees

This is already computed for you in the prelude_conclusion function.

From here, you have to do two things:

Compute the prelude and conclusion (collectively called the "epilog" in the code because we place it after the code in the start block). This is done in the epilog function which takes the set of callee-save registers and the amount of extra stack space needed as arguments.
Adjust all instructions in the start block that access stack locations, because the callee-save registers on the stack use up space at the beginning of the stack frame, which means that any stack location access which is relative to the base pointer in %rbp has to be changed to account for this extra space. This is done in the asm_of_lb function, which takes a labelled block (the start block, in this case) and the amount of space that stack locations need to be shifted by (called deref_adjust) as its arguments. You'll probably want to define helper functions to write this function, as (we hope) you did in the last assignment.

The prelude_conclusion function appends the epilog to the start block code to generate the final program.

Here are a couple of notes about the functions you need to write.

For asm_of_lb, all you need to do is go through all instructions, changing any stack location accesses by subtracting the deref_adjust value.
For epilog, after you've established the base pointer in the prelude, you need to push the callee-save register values onto the stack, and then adjust the stack pointer by subtracting the extra space needed for the stack-resident variables before jumping to the start block. (Recall that the stack grows downwards, so subtracting from the stack pointer reserves stack space.) In the conclusion, you have to add the extra space for the stack-resident variables back to the stack (reclaiming the stack space), and then pop the callee-save register values back to their registers before resetting the base pointer and returning. Make sure that you pop the callee-save registers in reverse order to the order in which you pushed them! (Remember, the stack is a last-in, first out (LIFO) data structure.) Also, only push and pop the callee-save registers you actually used; if your program didn't use any, you don't have to do any pushing and popping.

This sounds much more complicated than it is! This pass is really just a little bit of (somewhat annoying) bookkeeping.

Assignment 2: Compiler passes

Passes that are unchanged from assignment 1

1. Select instructions (select_instructions.ml)

2. Uncover live (uncover_live.ml)

3. Build interference graph (build_interference.ml)

4. Graph coloring (graph_coloring.ml)

Testing the graph coloring algorithm

5. Allocate registers (allocate_registers.ml)

6. Patch instructions (patch_instructions.ml)

7. Prelude and conclusion (prelude_conclusion.ml)

1. Select instructions (`select_instructions.ml`)

2. Uncover live (`uncover_live.ml`)

3. Build interference graph (`build_interference.ml`)

4. Graph coloring (`graph_coloring.ml`)

5. Allocate registers (`allocate_registers.ml`)

6. Patch instructions (`patch_instructions.ml`)

7. Prelude and conclusion (`prelude_conclusion.ml`)