Assignment 3: Compiler passes
The compiler passes are described in chapter 4 of the textbook
and in the lectures,
but here they are again for completeness.
(You should still read the book for a more in-depth explanation!)
We will only include passes that you have to implement.
As before, the only files you should modify are the .ml files
corresponding to these compiler passes.
1. Shrink (shrink.ml)
This is a new pass.
Its purpose is to remove the and and or forms
by translating them to if expressions using this translation:
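The standard translation (a sketch in the source-language concrete syntax) is:

```
(and e1 e2)  =>  (if e1 e2 #f)
(or  e1 e2)  =>  (if e1 #t e2)
```

Note that this preserves short-circuit evaluation: e2 is only evaluated when its value is actually needed.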
This pass converts from the Lif language to the Lif_shrink language.
The only difference between them is that in Lif_shrink,
the And and Or constructors have been removed.
2. Uniquify (uniquify.ml)
This is essentially the same as the "uniquify" pass for the Var language
compiler, except that you have to add a case for the If constructor,
and all the operators have been combined into the Prim constructor.
3. Remove complex operands (remove_complex.ml)
The "remove complex operands" pass is essentially the same
as in the Var compiler,
except for the necessary changes to accommodate the new language forms
and the folding of operator expressions into the Prim constructor.
Note that all three operands of an if expression can be complex,
so don't convert them to atoms.
4. Explicate control (explicate_control.ml)
The "explicate control" pass has significant changes from the previous compiler. This pass converts code in the "Lif_mon" language to the "Cif" language. The "Cif" language is like "Cvar" from the previous compiler, but with these changes/additions:
- a Bool constructor for boolean literals
- operator expressions using the Prim constructor used in the "Lif" languages
- extra tail constructors: Goto for unconditional jumps and IfStmt for conditional jumps
The most significant change is the addition of conditional jumps,
represented by the IfStmt constructor of the tail datatype,
because the code can then go in one of two directions.
The IfStmt constructor looks like this:
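We don't reproduce the exact definition here; check cif.ml for the real one. A plausible sketch (the constructor and field names below are our assumption) is:

```ocaml
(* Hypothetical sketch of the tail datatype; see cif.ml for the
   actual definition. *)
type tail =
  | Return of exp
  | Seq of stmt * tail
  | Goto of label
  | IfStmt of { op : cmp_op; arg1 : atm; arg2 : atm;
                jump_then : label; jump_else : label }
```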
This corresponds to a comparison operator which jumps to one label
if the comparison returns "true" (#t)
or to the other label if the comparison returns "false" (#f).
The labels correspond to blocks of instructions,
so the entire code is represented as a set of labelled blocks,
which we refer to as "basic blocks".
Note
A "basic block" is a sequence of instructions that has no
internal jumps; i.e., if there are any jump instructions
(like Goto or IfStmt in the "Cif" language),
they only appear at the end of the block.
With code represented as a set of basic blocks connected by jumps, the code representation is abstractly a graph, not a tree. In particular, code in the "Cif" language corresponds to a directed acyclic graph (DAG), which we will discuss below.
The "explicate control" pass from the previous (Var) compiler has these functions:
- a function to convert atoms (convert_atom)
- a function to convert expressions (convert_exp)
- a function to convert assignments (explicate_assign)
- a function to convert expressions in "tail position" (explicate_tail)
All of these are straightforward functions to implement.
For this compiler, we need to extend these functions
to deal with the new language constructs,
and we need to define an additional one for conditional expressions
(explicate_pred).
The convert_atom function needs to be extended to deal with the new
Bool constructor, which is trivial.
The convert_exp function needs to handle operator expressions
differently to reflect the new Prim constructor.
Neither the Let nor the If constructor of the "Lif_mon" language
should be handled here,
so signal an error if you get one of these
as an input.
The explicate_assign function has to deal with if expressions
in let bindings. This is essentially this transformation:
except that the tails (tl) are not duplicated. This can be achieved
by converting the tail to a block using the (provided) create_block
function and then using a Goto to that block for each tail.
To convert the if expression itself
requires the explicate_pred function, described below.
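In our notation (a sketch, not the literal figure from the book), the transformation for an if expression on the right-hand side of a let looks like:

```
(let (x (if cnd thn els)) tl)
=>
(if cnd
    (let (x thn) (goto block_tl))
    (let (x els) (goto block_tl)))

block_tl:
    tl
```

Here block_tl is the label produced by create_block for the converted tail tl, so tl appears only once in the output.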
Note
If the tail is already a Goto,
the create_block function returns it unaltered,
as you would expect.
The explicate_tail function needs to be extended
to handle if expressions.
This requires recursive calls to itself
and also requires a call to explicate_pred.
Now we get to explicate_pred,
which is the most complicated function of the pass.
It's the function that is called to convert if expressions,
as we've seen.
The arguments are:
- a boolean-valued test expression (an "Lif_mon" expression)
- converted ("Cif") tails for the "then" and "else" clauses of the if expression
The most straightforward case is where
the test expression is an operator expression using a comparison operation.
In that case, an IfStmt tail can be generated immediately.
Pattern matching
There is a trick you can use in pattern-matching so that you only match operators which are comparison operators. It takes advantage of the fact that the operators are represented as polymorphic variants. The pattern match case looks like this:
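As a self-contained illustration (the stand-in types below only mimic core_op and cmp_op from types.ml, and the expression type is a toy; the real pattern in your pass is on the "Lif_mon" expression type):

```ocaml
(* Stand-ins for the real core_op / cmp_op types in types.ml. *)
type cmp_op = [ `Eq | `Lt | `Le | `Gt | `Ge ]
type core_op = [ cmp_op | `Add | `Sub | `Not ]

type arg = Var of string | Imm of int

type exp =
  | Atm of arg
  | Prim of core_op * arg list

let name_of_cmp (op : cmp_op) : string =
  match op with
  | `Eq -> "eq" | `Lt -> "lt" | `Le -> "le" | `Gt -> "gt" | `Ge -> "ge"

(* The #cmp_op pattern matches only the comparison subset of core_op,
   and `as op` binds the matched operator with the narrowed type. *)
let describe (e : exp) : string =
  match e with
  | Prim ((#cmp_op as op), _args) -> "comparison " ^ name_of_cmp op
  | Prim (_, _) -> "other primitive"
  | Atm _ -> "atom"
```

The key is the pattern #cmp_op as op: it matches any of the cmp_op tags and binds the operator, narrowed to the cmp_op type, as op.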
This will match a Prim constructor
whose operator is a comparison operator,
and give it the local name op.
The #cmp_op syntax says to take the subset of the operators
(type core_op) which match the cmp_op polymorphic variant datatype.
(Both the core_op and the cmp_op types are defined in types.ml.)
Another simple case is when the test expression is a literal boolean. In that case, either the "then" tail or the "else" tail can be returned immediately (depending on the boolean's value). Negated literal booleans can be handled similarly. (This is an example of partial evaluation.)
If the test expression is a boolean-valued variable,
this can be converted to an equality comparison: v becomes
(eq v #t).
This leaves two trickier cases, both of which can be described in terms of the transformations needed to compile them.
- The test expression is a let, which means that the original if expression is a let inside an if. This is compiled by lifting the let binding out of the if. Of course, the let output ends up becoming an Assign statement in the "Cif" language.
- The test expression is another if expression, which means that the original if expression is an if inside an if. This is compiled by pushing the "then" and "else" tails into both branches of the inner if.
It's important not to duplicate the tails, so convert them
to labels using the create_block function.
This will also require recursive calls to explicate_pred.
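Sketches of the two transformations described above (our notation; thn_block and els_block are the labels created for the two tails):

```
(if (let (x rhs) body) thn els)
=>
(let (x rhs) (if body thn els))

(if (if a b c) thn els)
=>
(if a
    (if b (goto thn_block) (goto els_block))
    (if c (goto thn_block) (goto els_block)))
```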
One curious aspect of this pass is that we use imperative programming. We have a global reference to a map between labels and blocks to keep track of the basic blocks. We could have done this using extra arguments, but we felt that it was simpler with this small amount of imperative programming.1
5. Remove unused blocks (remove_unused.ml)
The "remove unused blocks" pass takes in a program in the "Cif" intermediate language and also returns a program in this language. Its job is to remove any blocks that can never be reached.
Where do unused blocks come from?
Here is an example of how unused blocks can arise.
Let's say that the test operand of an if expression is #t,
and the tail (the "else" case) is a Goto.
Although the compilation of the if in explicate_pred
will discard the "else" tail,
the block corresponding to the tail will have already been compiled,
and it will become part of the output of the "explicate control" pass.
If no other code jumps to that block,
it's an unused block.
There is no point in having unused blocks in the code, so this pass removes them.2
The basic algorithm is as follows:
- Start with a list of (label, tail) pairs. (This is the way programs are represented in the "Cif" language.)
- Convert it to a label-to-tail map (a tail LabelMap.t in OCaml).
- Compute the set of all labels that are reachable from the "start" label. You probably will want to write one or more helper functions to do this. The get_jump_labels function in the Cif module will be useful.
- Compute a list of (label, tail) pairs which only includes the reachable labels; this is used to generate the output program.
To figure out which labels are reachable from the "start" label,
use get_jump_labels to find the successors of "start",
then find the successors of each of these, etc. until every reachable
label has been found.
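The reachability computation can be sketched as a simple worklist traversal. The self-contained version below uses a toy successor map in place of Cif.get_jump_labels:

```ocaml
module LabelSet = Set.Make (String)
module LabelMap = Map.Make (String)

(* Toy stand-in for Cif.get_jump_labels: the successors of a block. *)
let successors map lbl =
  match LabelMap.find_opt lbl map with
  | Some ls -> ls
  | None -> []

(* All labels reachable from "start", via a worklist traversal. *)
let reachable map =
  let rec go seen = function
    | [] -> seen
    | l :: rest ->
      if LabelSet.mem l seen then go seen rest
      else go (LabelSet.add l seen) (successors map l @ rest)
  in
  go LabelSet.empty [ "start" ]
```

Filtering the original (label, tail) list against this set then yields the output program.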
This pass is short and should be fairly straightforward to implement.
Label order
In the label-to-tail list output from the pass,
make sure that the block labels are in alphabetical order,
since that is the order in the reference files.
If you convert a label map to a list using the bindings function,
this will happen automatically.
This label order will be used in all subsequent passes.
(In "explicate control", the "start" block label is output first, which is just for convenience. That code is supplied to you. In the rest of the passes, the "start" block label is output in its alphabetical order with respect to the rest of the block labels. The order doesn't matter except to make testing easier.)
6. Select instructions (select_instructions.ml)
The "select instructions" pass converts programs from the "Cif" intermediate language to the "X86_var_if" intermediate language. Overall, it is structured in the same way as it is in the Var compiler, but with a number of additions to deal with the new forms of the "Cif" language.
The new atom case for booleans is converted to one of the integers 0
(false) or 1 (true).
These integers will be stored in a one-byte segment of a register.
(We never put more than one boolean in a register.)
There are a number of new assembly language instructions that are generated:
- xorq <arg1>, <arg2> (XOR: used for logical negation)
- cmpq <arg1>, <arg2> (used for comparisons)
- setCC <arg> (sets a byte based on the contents of the %rflags register)
- movzbq <arg1>, <arg2> (move from a one-byte register to a full register)
- jCC <label> (conditionally jump to a label)
This needs a bit more explanation.
To negate a boolean variable (which is represented as 0 for false or
1 for true), compute xorq $1, <var> (where $1 is just the number 1
in assembly syntax, i.e. an "immediate" value).
The cmpq instruction takes two arguments, compares them, and uses the
comparison to set a special "flags" register called %rflags.3
It takes its arguments in backwards order, so to test if x < y, do
cmpq y, x. The %rflags register can't be read directly,
but the setCC family of instructions can set a byte based on its value.
Specifically, the CC (condition code) in setCC is one of:
- e (for "equal")
- l (for "less than")
- le (for "less than or equal")
- g (for "greater than")
- ge (for "greater than or equal")
So if the instruction cmpq y, x has been done, and x was less than y,
the subsequent instruction setl <var> would set <var> to 1,
since setl means
"set <var> to 1
if the %rflags register indicates that the last comparison
yielded a less-than result".
As if this wasn't weird enough,
the <var> argument of setCC instructions
can only be a byte register (a one-byte segment of a full register).
We only use the al byte register, which is part of the %rax register,
though in principle many more byte registers could be used.
(In the code, byte registers are a new constructor of the arg type
called ByteReg.)
Since the setCC instructions can only set byte registers,
but we normally work with full registers,
we need to be able to "move" a byte register into a full register,
which is what the movzbq instruction does.
Specifically, we will move from the al byte register
(which is the only such register we use)
to any full register we want.
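Putting these instructions together, compiling a comparison like x < y into a full register might produce a sequence such as this (a sketch in AT&T syntax; the actual operand locations are decided in later passes):

```
cmpq   y, x          # sets %rflags; note the reversed argument order
setl   %al           # %al := 1 if x < y, 0 otherwise
movzbq %al, %rax     # zero-extend the byte into the full register
```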
The jCC family of instructions represents conditional jumps;
the possible CC values are the same as for the setCC instructions.
To give an example, the jle <label> instruction jumps to the label
<label> if the value of the %rflags register indicates that
the last cmpq instruction yielded a less-than-or-equal result.
Note
The jCC family of instructions are represented by the
JmpIf instruction in the intermediate languages.
The textbook (section 4.9) describes the translations between Cif
instructions and X86_var_if instructions.
7. Uncover live (uncover_live.ml)
This pass has very significant changes, because the Lif language can yield code with multiple blocks (not counting the prelude and conclusion), and we have to know how to do the liveness analysis with these blocks.
In the previous compiler, we only had one block to deal with,
and we computed the liveness sets for each instruction
starting from the end of the block
and working back towards the beginning.
The liveness set at the end was set to be the registers %rax and %rsp,
which are used in the conclusion block.
For this compiler, we use a similar strategy per block, but since there are multiple blocks, we have to know in which order to handle them. In addition, we have to know what the liveness set is at the end of each block so that we can use that to work backwards towards the beginning.
Because we only have conditional expressions and not loops,
and because conditional expressions can only jump forwards in the code,
the directed graph formed by the blocks (the graph vertices or nodes)
and the jumps between blocks (the graph edges)
is a directed acyclic graph or "DAG".
Lecture 12 has some pictures of DAGs formed from code blocks,
which we won't repeat here.
The endpoint of the DAG is the conclusion block,
since this block has only incoming and no outgoing edges.
Note
In general, DAGs can have more than one "leaf" node
(node with no outgoing edges)
but in our case, the conclusion block is the only one.
Control flow will always flow into the conclusion block eventually,
as the code progresses towards the end of the main function.
It stands to reason that, since we already know what the liveness set
at the beginning of the conclusion block is
(just the registers %rax and %rsp),
we should start liveness analysis in a block
that ends in a jump to that block. But what about all the other blocks?
We need to do two things:
- Compute the order of the block labels so that we know in what order to compute the liveness sets.
- After computing the liveness sets for a block, record the liveness set at the beginning of the block in a data structure; this will become the final liveness set of any block that jumps to this block.
The way we deal with the first task is to compute a directed graph of all the block labels by:
- collecting the jump labels from each block,
- using them to compute the graph edges (label to label pairs),
- and constructing the graph.
This graph is (potentially) a "multigraph"
(meaning that there can be multiple edges between the same blocks),
although if compilation has proceeded correctly,
there should only be at most one edge between any two blocks.4
This graph will be a DAG.
If you have a list of (block label to block label) edges,
you can construct the graph using the LabelMgraph.of_list function.
Note
Before constructing the graph,
filter out the edges that contain the conclusion label,
since we won't be doing liveness analysis on the conclusion block!
Once we have the DAG, getting the right order of blocks
(actually block labels) is simple.
The DAG will point "forward" (towards the conclusion block),
but we want to compute liveness "backwards"
(starting from the blocks which jump directly to the conclusion block).
To do this, we "flip the arrows" on the DAG,
which is known as "transposing" the graph.
This is not hard to do manually,
but the LabelMgraph module has a transpose function
that will do it for you.
After that, we need to order the labels so that
no label appears before all the labels that have edges to it
have already appeared.
This is known as a "topological sort", and (fortunately!)
there is a function called topological_sort in the LabelMgraph module.
So once you have the DAG, just transpose it and topologically sort it,
and you have the correct order of block labels!
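You don't need to implement any of this yourself, since LabelMgraph provides of_list, transpose, and topological_sort. The self-contained toy version below, which works over (source, destination) edge pairs, is only meant to illustrate what those two steps do:

```ocaml
module SMap = Map.Make (String)

(* Transpose: flip every (src, dst) edge. *)
let transpose edges = List.map (fun (s, d) -> (d, s)) edges

(* Kahn's algorithm over (src, dst) edge pairs: repeatedly emit a
   label with no remaining incoming edges. *)
let topological_sort edges =
  let bump l n m =
    SMap.update l (function Some k -> Some (k + n) | None -> Some n) m
  in
  (* In-degree of every label mentioned in the edge list. *)
  let init =
    List.fold_left (fun m (s, d) -> bump d 1 (bump s 0 m)) SMap.empty edges
  in
  let rec go degrees acc =
    match SMap.min_binding_opt (SMap.filter (fun _ k -> k = 0) degrees) with
    | None -> List.rev acc
    | Some (l, _) ->
      let degrees =
        List.fold_left
          (fun m (s, d) -> if s = l then bump d (-1) m else m)
          (SMap.remove l degrees) edges
      in
      go degrees (l :: acc)
  in
  go init []
```

For a forward DAG start -> block_1/block_2 -> block_3, transposing and then sorting yields block_3 first and start last, which is exactly the order liveness analysis wants.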
Note
Topological sorts are in general not unique, but if you use the library function the order of labels you generate will be the same as the label order in the instructor's compiler, which will make testing easier.
Once we have the order of the labels,
we create a map between labels and their live-before sets
called live_before_labels.
We initialize this with the live-before set of the conclusion block.
Then we process each block in the order of their labels,
and when we compute the live-before set for each block,
we add it to the live_before_labels map.
That way, each block will have the live-before set of all blocks
that the block can jump to precomputed
when the block's liveness sets are computed.
Note
We don't use imperative programming
to add to the live_before_labels map,
since it's used locally
and standard functional programming methods
(specifically a fold_left)
are sufficient here.
Computing liveness sets for individual blocks is done the same way as in the previous compiler, but with additional cases for the new instructions. Most of the changes are obvious, but some are not.
- movzbq instructions act like movq instructions in terms of liveness calculations.
- The cmpq instruction reads from both arguments and writes to the %rflags register.
- The setCC family reads from the %rflags register and writes to its argument.
We keep track of the %rflags register
in addition to the %rax and %rsp registers
in liveness analysis from now on.
jmp and jmpIf instructions require special treatment.
They only occur at the end of blocks, and there are only two cases
we will encounter.
- The block ends in a jmpIf instruction followed by a jmp instruction.
- The block ends in a jmp instruction without a jmpIf preceding it.
The second case is easy; we look up the jump target label in the
live_before_labels map, and the live-before set we find
is also the live-before set of the jmp instruction.
The first case is more subtle.
For the jmp instruction which follows the jmpIf instruction,
we again just make the live-before set of the jump target
the live-before set of the instruction.
For the jmpIf instruction, we have two jump targets:
the one from the following jmp instruction
and the one from the label in the jmpIf instruction itself.
We need to compute the union of the live-before sets from both jump targets,
since we have no way of knowing which block the instruction will jump to.
So we union the live-before set of the jmpIf instruction's target
with the live-before set from the jmp instruction to compute
the live-before set of the jmpIf instruction.
Then we compute the liveness set of each instruction in the block as usual.
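The two jump cases can be sketched as follows (assuming a live_before_labels map from labels to location sets, with strings standing in for locations):

```ocaml
module LocSet = Set.Make (String)
module LabelMap = Map.Make (String)

(* live-before of `jmp l`: the live-before set recorded for block l. *)
let live_before_jmp live_before_labels l =
  LabelMap.find l live_before_labels

(* live-before of `jmpIf cc l` when the jmp that follows it has
   live-before set live_after: the union of both possible targets. *)
let live_before_jmpif live_before_labels l live_after =
  LocSet.union (LabelMap.find l live_before_labels) live_after
```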
That's all for this pass! Fortunately, the rest of the passes are much simpler.
8. Build interference graph (build_interference.ml)
The main changes needed here are to accommodate the new assembly instructions.
The movzbq instruction should be handled like a movq instruction.
Byte registers need to be converted to their corresponding full registers;
there is a function reg_of_bytereg in the Types module
that may be useful. You should consider the destination of the cmpq
instruction to be the %rflags register.
Note
This could possibly have been left out, since %rflags isn't
a register used for register allocation, but we put it in for
completeness. It doesn't cause problems, since we always read
from %rflags immediately after writing to it.
Everything else is pretty much as you'd expect.
9. Graph coloring (graph_coloring.ml)
The graph coloring code is unchanged from the previous assignment.5
10. Allocate registers (allocate_registers.ml)
We've added a couple of features to the template code for the
"allocate registers" pass.
The register color list includes the %rflags register, with color -6.
Aside from this, the changes you need to make are in the convert_instr
function to handle the new assembly instructions.
The changes are very straightforward,
but you should check that the first argument of movzbq
and the argument of setCC instructions are byte registers.
11. Remove jumps (remove_jumps.ml)
This is an optimization pass that the textbook treats as optional.
It's fairly simple to implement, so we're making it mandatory.
The idea is that if you have a block which is jumped to
from only one other block, you can merge the two blocks
and eliminate the jump entirely.
This case can happen if you have removed unused blocks in a previous pass.
The pseudocode for the algorithm is given in the file remove_jumps.ml,
as well as stubs for the functions we used.
(You can replace them with your functions if you like,
as long as you don't change the module interface.)
Note that we define a get_jump_labels function
which has the same name as a function in the Cif module,
but is different because it takes an X86_var_if block as its argument,
whereas the other took a Cif tail.
This code also uses directed graphs (not multigraphs),
which are implemented in the support/dgraph.ml[i] files.
(We could have used multigraphs,
but the actual graphs only have at most one edge between blocks.)
We start the algorithm by converting the (label, block) list into a directed
graph.
Note
When building the label graph, do not include the conclusion
label, because we don't allow merging with the conclusion block.
In the merge_blocks function, we used two utility functions from the
Utils module: last and butlast to get the last element in a list,
and all but the last element, respectively.
The algorithm consists of finding two labels
which correspond to blocks that can be merged,
and merging them using the merge_blocks function
and the merge_vertices function of the LabelDgraph module.
(The latter is needed because when you merge two blocks,
the label of the second block goes away.)
We find mergeable labels using the get_mergeable_labels function,
which returns a pair of mergeable labels if there is one.
We use the neighbors_in and neighbors_out functions
of the LabelDgraph module
to find the candidate pairs.
Essentially, we collect all labels with only one out neighbor
and look in the target of those labels for one that has only one in neighbor.
In that case, the corresponding blocks can be merged.
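This search can be sketched with a toy edge-list graph (the real code uses the neighbors_out and neighbors_in functions of LabelDgraph instead of these list scans):

```ocaml
(* Out- and in-neighbors of a label in a (src, dst) edge list. *)
let neighbors_out edges l =
  List.filter_map (fun (s, d) -> if s = l then Some d else None) edges

let neighbors_in edges l =
  List.filter_map (fun (s, d) -> if d = l then Some s else None) edges

(* Find a (predecessor, successor) pair of mergeable blocks: the
   first has exactly one out-neighbor, and that neighbor has exactly
   one in-neighbor. *)
let get_mergeable_labels edges labels =
  List.find_map
    (fun l ->
       match neighbors_out edges l with
       | [ l' ] when List.length (neighbors_in edges l') = 1 -> Some (l, l')
       | _ -> None)
    labels
```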
Note
It might be more efficient to find all pairs of mergeable labels at the start, but since merging two labels changes the graph, it's safer to do it one pair at a time. We're not trying for ultimate efficiency here.
The algorithm is repeated until there are no more mergeable blocks.
12. Patch instructions (patch_instructions.ml)
There are a few small additions that need to be made to the "patch instructions" pass to handle the new assembly instructions.
setCC
The argument of any of the setCC family of instructions
(setl, setle, sete, setg, setge)
must be a byte register, or it's an error.
xorq
As with other operators, only one of the two operands of xorq
can be a stack location (i.e. a memory reference).
Therefore, if you encounter that, you should patch the instruction
as is done for other instructions.
movzbq
The first argument to movzbq should be a byte register.6
It should already be a byte register, so if it isn't, that's an error.
The second argument must be a regular register. If it isn't (e.g. if it's a
stack location), replace the second argument with %rax
and then add a movq instruction to put the contents of %rax
into the second argument.
cmpq
As with other operators, only one of the two operands of cmpq
can be a stack location.
Again, if you encounter that, you should patch the instruction
as is done for other instructions.
In addition, the second argument of cmpq can't be an immediate value,
so if it is, move it to %rax and use %rax instead.
13. Prelude and conclusion (prelude_conclusion.ml)
The only changes to this pass from the previous compiler are the obvious changes to handle the new assembly instructions and byte registers.
"Submitting" your assignment
The assignment will be graded in your Github repository as usual,
in the ch4 directory.
1. Of course, one nice thing about OCaml is that you can choose to use imperative programming whenever it suits you, and avoid it when it's unnecessary.
2. This is one example of what is called dead code elimination in the compiler literature, which means removing code which cannot have any effect on the final result of a program. Dead code comes in many forms besides this.
3. The textbook calls this EFLAGS, which is an older name that is appropriate for 32-bit x86 systems only.
4. The textbook insists on using multigraphs, but in fact, regular directed graphs could have been used instead. Nevertheless, this doesn't change the algorithm.
5. This is the benefit of programming an algorithm as a functor that works on abstract values! Go OCaml!
6. Technically, it could also be an immediate value, but we won't be using it that way.