Assignment 5: Compiler passes

The compiler passes are described in chapter 6 of the textbook and in the lectures, but here they are again for completeness. (You should still read the book and lectures for a more in-depth explanation!) We will only include passes that you have to implement. As before, the only files you should modify are the .ml files corresponding to these compiler passes.

A. Passes that are unchanged from the previous assignment

These passes are identical except possibly for the names of imported modules (e.g. Cloop changing to Ctup).

remove unused blocks
graph coloring
remove jumps

B. Passes that have minimal modifications from the previous assignment

These passes have to be extended to handle the new forms, but the extensions should be straightforward.

shrink
uniquify
uncover get!

C. Passes that are new

Expose allocation (`expose_allocation.ml`)

This pass is discussed in lecture 16 and in the textbook. The input language is lfun_shrink and the output language is lalloc.

The goal of this pass is to replace the vector form (the vector constructor) with three lower-level forms:

collect which invokes the garbage collector,
allocate which allocates memory on the heap,
global_value which accesses global variables which are used by the garbage collector.

The transformation is done in the function expose_allocation_exp. The function recurses through all subexpressions of the program expression, converting vector forms as they are encountered. That's all it does.

The transformation

Here is the transformation, in pseudocode:

(vector (e_0 ... e_{n-1}) (type <type>))

-->

(let (x_0 e_0')
...
(let (x_{n–1} e_{n–1}')
  (let (_
         (if (< (+ (global-value free_ptr) <nbytes>)
                   (global-value fromspace_end))
             (void)
             (collect <nbytes>)))
     (let (v (allocate <len> <type>))
       (let (_ (vector-set! v 0 x_0))
            ;; or (vector-set! v 0 e_0) if e_0 is atomic
       ...
       (let (_ (vector-set! v (- n 1) x_{n–1}))
          v) )))))

where:

<type>: the type of the vector
<len>: the length of the vector
<nbytes>: the number of bytes needed for a vector of length `len`
x_0 ... x_{n-1}: fresh names
e_0' ... e_{n-1}': (recursively) transformed versions
  of non-atomic e_0 ... e{n-1}

Note

In the OCaml code, global_value is GlobalVal, collect is Collect, and allocate is Allocate.

This transformation is quite complicated, so we're going to discuss it in some detail.

The names x_0 to x_{n-1} in the initial let expression are fresh names. The expressions e_0' to e_{n-1} are the arguments of the vector form, recursively transformed by the expose_allocation_exp function. You only need to let-bind a new fresh name for non-atomic subexpressions of the vector form, so for a vector of length n you generally will not have n nested let expressions, since some of the subexpressions will usually be atomic (variables or literals). We've supplied a function in the template code called new_var to generate fresh names for use here.

In contrast, in the nested let expressions at the end with the vector-set! forms, there do need to be n such expressions, since these initialize the vector with its initial values for each of the n slots. If the original expression for a slot was complex, use the variable bound to (the transformed version of) that expression (one of the x_is); if it was atomic, just use that expression directly. For these let expressions, bind a dummy name to the vector-set! expressions; these names will never be used. (We call them _ here, but you should generate an actual fresh name. We used a function called new_void_var to do this, which we've supplied in the template code.)

Note

You might wonder why we are using nested let expressions at the end instead of a begin expression. Both can work, but the nested lets, somewhat paradoxically, generate simpler code.

The number of bytes needed for a vector of length n is just 8 * (n + 1) i.e. 8 bytes per slot plus 8 bytes for the vector tag. (Note that we aren't including the size of the contents of the vector in this calculation, just the size of the vector itself, since that is what we will need to allocate heap memory for.)

Coding notes

In our code, we defined a simple helper function called is_atom which returns true if an expression is atomic. We used this to separate out atomic expressions (which don't need to be let-bound to fresh names) from non-atomic expressions (which do). One useful trick is to convert (map) the expressions into polymorphic variants which are either

`Atom exp for atomic expressions
`VarExp (fresh_name, exp) for non-atomic expressions

and then use the resulting list of expressions to generate the nested let expressions at the beginning of the form.

Note

This kind of "temporary label" is a very common use case for polymorphic variants in real-world OCaml code.

Sequence of the transformation

We want your code to match the reference code exactly. The challenge is that this is an imperative pass, like "uniquify", because you are generating new names. It's not hard to match the reference code for a simple pass like "uniquify", but it's considerably harder here. To help you with this, here is a description of the order in which we implemented the transformation of expressions of the form Vec (es, t):

Recurse on all the subexpressions es.
Generate all the fresh variable names for non-atomic expressions in es.
Generate a fresh name for the result vector.
Generate the if expression.
Generate all the vector-set! expressions.
Generate all the nested (let (_ ...) (vector-set! ...)) expressions which wrap the vector-set! expressions.
Generate the outermost (let (_ ...) ...) expression.
Put everything together.

D. Passes with significant modifications from the previous assignment

1. Remove complex operands (`remove_complex.ml`)

In this pass, the input language is lalloc_get and the output language is lalloc_mon. The Collect, Allocate, and GlobalVal forms are all considered to be complex operands. Other than that, the code is similar to the code from the previous assignment. In rco_atom, bind Collect to the dummy name $_, since it returns Void. That's all there is for this pass!

2. Explicate control (`explicate_control.ml`)

In this pass, the input language is lalloc_mon and the output language is ctup.

Note that Collect forms were bound to dummy variables of the form $_ in the previous pass; therefore, Collect forms can be handled in explicate_assign as well as in explicate_effect. It should never be in tail position, for obvious reasons. The Allocate and GlobalVal forms are only handled in the convert_exp function. Neither form is ever done for effects alone.

The VecRef case in explicate_pred is handled by implementing this conversion:

(if (vector-ref v i) ...)

; -->

(let (vecref_N (vector-ref v i))  ; `vecref_N` is a new name
  (if vecref_N ...))

3. Select instructions (`select_instructions.ml`)

In this pass, the input language is ctup and the output language is x86_var_global.

Compared to the previous assignment's x86_var_loop language, the x86_var_global language has these additions:

arg type: a new GlobalArg constructor (called simply Global in the textbook)
instr type: new instructions for Andq (bitwise AND) and Sarq (arithmetic shift right)
info3 type: a new field num_spilled_root which contains the number of root stack slots needed to store vector references

A number of new translations of expressions and statements need to be implemented, which we describe next. Please also refer to the textbook (section 6.6).

`Collect`

Collect is a statement in ctup, not an expression. It has one argument, which is the number of bytes needed. It compiles to code to call the collect function of the garbage collector. This function (written in C in runtime.c) takes two arguments: a pointer to the end of the "root stack" (the extra stack for storing vector pointers) and the number of bytes needed. The first (the "root stack pointer") is always stored in register %r15; the second is the contents of the Collect constructor.

Recall that function arguments are placed in the registers %rdi, %rsi, %rdx, %rcx, %r8, %r9in order; any additional arguments are passed on the stack. For the collect function, we only need %rdi and %rsi. Use movq instructions to move the arguments to the required registers. We compile to a callq instruction (called Callq in x86_var_global), which takes a label and an arity (argument count; here, 2). The label is just collect (the name of the function), but you must call fix_label on it, since it's an externally visible label.

`Allocate`

Allocate is where vectors actually get allocated. In ctup, it's always an expression, not a statement. There are two forms of this instruction. One form allocates a vector and assigns the vector to a variable. The other returns an allocated vector. The second form is equivalent to assigning the vector to the %rax register, and is used when allocating a vector in tail position. We'll only describe the first form, since the second form is identical other than assigning to %rax instead of assigning to a variable.

Note

The translations of GlobalVal, VecLen, VecSet, and VecRef also have two different forms that are handled in a similar way. One form assigns the expression to a variable, while the other is an expression in tail position, where the result is assigned to the %rax register.

The translation is (in effect):

<lhs> = (allocate <len> (Vector <type> ...));
-->
movq free_ptr(%rip), %r11
addq 8(<len> + 1), free_ptr(%rip)
movq $<tag>, 0(%r11)
movq %r11, <lhs'>

where:

<lhs>: the `ctup` variable being assigned to
<lhs'>: the `x86_var_global` variable being assigned to
        (the `arg` equivalent of the `ctup` `atm` form)
<len>: the length of the vector being allocated
(Vector <type> ...): the type of the vector
<tag>: the vector tag (AKA tuple tag)

To do this translation, we have to compute the vector tag corresponding to the vector type (given as (Vector <type> ...)). We recommend that you define a helper function called make_tuple_tag with this signature:

let make_tuple_tag (len : int) (vec : ty array) : int = ...

This function should:

check that the length len is between 0 and 50, and raise an exception otherwise
create a string representation of the binary number corresponding to the tag (OCaml literal binary numbers start with 0b)
convert the string to an integer using the function int_of_string; this is the tuple tag

The binary number itself contains (in order from left to right):

some unused bits, which should all be 0s
0/1 values corresponding to (up to) 50 slots in the vector; if the type of that slot is itself a vector, use a 1, else use a 0
6 bits corresponding to the length of the vector
1 bit which is always 1 (the forwarding bit)

Note that the rightmost bits of the pointer bits correspond to the lowest indexed slots of the vector, so the absolute rightmost pointer bit corresponds to slot 0. Unused slots are 0s; you don't have to have them in the bit string since trailing 0s don't change an integer's value. See figure 6.8 in the textbook for a pictorial representation of tuple tags.

One odd thing about the translation given above are the free_ptr(%rip) parts. What we really want is the address of free_ptr so we can dereference it, store into it, etc. But in x86-64 assembly language, we can't directly refer to the address of a global variable. Instead, we have to refer to it as an offset from the contents of the instruction pointer register %rip. As the textbook explains, when the assembler sees free_ptr(%rip), it computes the difference d between the instruction pointer location and the address of the global variable free_ptr, and converts the instruction to d(%rip), which accesses the memory d bytes away from the current location of the instruction pointer. This gives you the address of the free_ptr global variable, albeit in a somewhat indirect way. This addressing mode is called instruction-pointer-relative addressing.

Don't forget to use fix_label on the free_ptr label, since it's the label of a globally visible value (in this case, a global variable).

`GlobalVal`

GlobalVal (in ctup) is converted to GlobalArg of a label. Again, don't forget to use fix_label on the label since the label is of a global variable. When used in an assignment, use movq to move the global variable contents to the destination variable; when used as an expression in a tail context, move it to the %rax register.

`VecRef`, `VecSet` and `VecLen`

These are described well in the textbook (section 6.6). Note that VecLen needs to extract the length field from the vector tag; the andq and sarq instructions are used here to accomplish that. For all three forms, put the vector pointer into the reserved register %r11. The %r11 register thus serves as temporary scratch space for vector pointers.

Note

If your compare tests are failing on the place where you use andq and sarq instructions, you may need to reverse-engineer our algorithm for computing vector lengths from the tuple tag. Hint: we used one andq instruction followed by one sarq instruction.

Note that VecSet (the ctup expression) also exists as VecSetS (the ctup statement). VecSet always returns Void, which is just the immediate integer value 0 in assembly language. VecSetS doesn't return anything, but the translation is otherwise the same.

4. Uncover live (`uncover_live.ml`)

There are only two modifications you need to make to the "uncover live" pass.

The GlobalArg constructor has to be handled.
The new instructions andq and sarq have to be handled.

The GlobalArg constructor doesn't actually do anything in this pass since its argument is a label. It should be handled where the other arg values are handled.

The andq instruction is handled like addq and subq.

The sarq has two arguments (source and destination). The way we use it, the source argument is always an immediate value, so you can ignore it in liveness analysis. The sarq instruction reads from and writes to the destination argument.

5. Build interference (`build_interference.ml`)

This pass requires three changes/additions:

The main change in this pass is that the make_graph function has to deal with vector-valued variables, which are handled specially. This only affects the handling of callq instructions when calling the collect function ("collect" label). In addition to what was done previously, you need to add an edge between all callee-save registers and all locations in the after set that correspond to call-live vector-valued variables.

Note

Edges between caller-save registers and vector-valued variables will already be set up from the code from previous assignments. Given that we are not allowing pointers to call-live vector-valued variables to be stored in any registers or in the regular stack, the only place they can be stored in is in the root stack, which we will deal with in the next section.
The andq and sarq instructions need to be processed. This can be done much like the other instructions.
The destination of an instruction can now be a GlobalArg value. These values do not affect the interference graph, so they can be ignored (just return the graph unchanged).

6. Allocate registers (`allocate_registers.ml`)

This pass has a number of changes, some of them supplied to you in the template code.

Spilled stack locations on the regular stack are handled differently than previously. In addition, we need to manage the root stack, which is where vector-valued variables go. We have two counters for the two stacks, called next_stack_index and next_root_stack_index. These are both integer references. Before allocating registers, both of these counters are initialized to 1.

Note

The reason for the 1-indexing is that two registers are used for the two stacks: %rbp for the regular stack and %r15 for the root stack.

In the regular stack, the caller's %rbp value is stored at 0(%rbp), so the first available slot is at -8(%rbp) (the regular stack grows downwards).

In the root stack, the %r15 register contains the top of the root stack, and the root stack grows upwards. Despite this, we start assigning slots from -8(%r15), since we don't keep a pointer to the base of the root stack in a register. (In a sense, we are assigning slots in the "wrong order", but it doesn't cause problems.)
We have a new hashtable called spilled_location_dict which is cleared before register allocation. In it, we map integers ("colors" assigned in graph coloring) to spilled locations (regular stack and root stack).
The location_of_color function is the main thing that needs to be changed for this pass.
- As before, if the color (represented as an integer) is in the color register map, return the register that the color maps to. Otherwise, the color represents a spilled variable.
- This function can spill variables to either the regular stack or to the root stack. The vector_typed argument is used to select the stack; if it's true, spill to the root stack.
- Check to see if the color is in the spilled_location_dict; if it is, return that location. Otherwise, continue.
- Note that the colors of variables that represent vectors can be interleaved with the colors of variables that don't. Therefore, you can't just take the difference between the color and the last register color to determine where on the stack to spill the variable to. Instead, use the next_stack_index variable (for non-vector-typed variables) and next_root_stack_index variable (for vector-typed variables) to determine the stack slot for the respective slots. (Increment the index you pick after you use it, so the next variable will get a different slot.) The offset is (-8) times the index, and the base register is Rbp for the regular stack or R15 for the root stack. This gives you the location you need.
- Once you've done this, add the color/location pair to the spilled_location_dict in case the location of the same color is requested later.
The varmap_of_colormap function has to be modified to call location_of_color with the new vector_typed argument. It does this by seeing if a variable is in the vvs (vector-valued variables) set.
The get_variable_location_map function (supplied to you) now re-initializes the stack index variables and the spilled location dictionary.
The convert_instr function needs to be extended to handle the andq and sarq instructions.
The get_num_spilled function is now supplied to you. This function is trivial because we can read off the number of spilled variables on both the regular and root stacks from the next_stack_index and next_root_stack_index variables.
The new vector_vars function (supplied to you) returns a set of all the vector-valued variables.
The allocate_registers function now computes both the number of variables spilled to the regular stack and the number of variables spilled to the root stack. This information is stored in the info3 datatype.

7. Patch instructions (`patch_instructions.ml`)

Only a couple of modifications need to be made for this pass:

When patching instructions, many instructions can't have both arguments as stack locations. This also applies to global variable locations (GlobalArg arguments), as well as any combination of those and stack locations.
The new andq and sarq instructions also need to be patched.

8. Prelude and conclusion (`prelude_conclusion.ml`)

There are a few additions needed in this pass.

When converting global labels (GlobalArg arguments), use instruction-pointer-relative addressing (discussed above), so GlobalArg lbl becomes GlobalArg (Rip, lbl).
Handle the new andq and sarq instructions.
In the prelude, do the following extra tasks:
1. The garbage collector needs to be initialized by a call to the initialize function (don't forget to use fix_label on the function labels!). This function takes two arguments: the root stack size and the heap size. Set them both to be the value of the init_heap_size variable. Recall that the first two arguments to a function are passed in registers %rdi and %rsi, after which a callq to the function label should be added.
2. In the prelude, the address of the rootstack_begin global variable (again, use fix_label for global variable labels) should be copied to register %r15. Use instruction-pointer-relative addressing.
3. In the prelude, all the root stack slots should be initialized to 0. This should use the %r15 register, which currently points to the beginning of the root stack.
4. The address of the end of the root stack should now be copied into the %r15 register.
In the conclusion, reset the %r15 register to the beginning of the root stack space. Do this by subtracting the root stack space used from %r15.

Figure 6.16 in the textbook gives you an idea of how this should look for a program which needs a single root stack slot and no regular stack slots.

Assignment 5: Compiler passes

A. Passes that are unchanged from the previous assignment

B. Passes that have minimal modifications from the previous assignment

C. Passes that are new

Expose allocation (expose_allocation.ml)

The transformation

Coding notes

Sequence of the transformation

D. Passes with significant modifications from the previous assignment

1. Remove complex operands (remove_complex.ml)

2. Explicate control (explicate_control.ml)

3. Select instructions (select_instructions.ml)

Collect

Allocate

GlobalVal

VecRef, VecSet and VecLen

4. Uncover live (uncover_live.ml)

5. Build interference (build_interference.ml)

6. Allocate registers (allocate_registers.ml)

7. Patch instructions (patch_instructions.ml)

8. Prelude and conclusion (prelude_conclusion.ml)

Expose allocation (`expose_allocation.ml`)

1. Remove complex operands (`remove_complex.ml`)

2. Explicate control (`explicate_control.ml`)

3. Select instructions (`select_instructions.ml`)

`Collect`

`Allocate`

`GlobalVal`

`VecRef`, `VecSet` and `VecLen`

4. Uncover live (`uncover_live.ml`)

5. Build interference (`build_interference.ml`)

6. Allocate registers (`allocate_registers.ml`)

7. Patch instructions (`patch_instructions.ml`)

8. Prelude and conclusion (`prelude_conclusion.ml`)