Assignment 5: Compiler passes
The compiler passes are described in chapter 6 of the textbook
and in the lectures,
but here they are again for completeness.
(You should still read the book and lectures for a more in-depth explanation!)
We will only include passes that you have to implement.
As before, the only files you should modify are the .ml files
corresponding to these compiler passes.
A. Passes that are unchanged from the previous assignment
These passes are identical except possibly for the names of imported
modules (e.g. Cloop changing to Ctup).
- remove unused blocks
- graph coloring
- remove jumps
B. Passes that have minimal modifications from the previous assignment
These passes have to be extended to handle the new forms, but the extensions should be straightforward.
- shrink
- uniquify
- uncover get!
C. Passes that are new
Expose allocation (expose_allocation.ml)
This pass is discussed in lecture 16 and in the textbook.
The input language is lfun_shrink
and the output language is lalloc.
The goal of this pass is to replace the vector form
(the vector constructor) with three lower-level forms:
collectwhich invokes the garbage collector,allocatewhich allocates memory on the heap,global_valuewhich accesses global variables which are used by the garbage collector.
The transformation is done in the function expose_allocation_exp.
The function recurses through all subexpressions of the program expression,
converting vector forms as they are encountered.
That's all it does.
The transformation
Here is the transformation, in pseudocode:
(vector (e_0 ... e_{n-1}) (type <type>))
-->
(let (x_0 e_0')
...
(let (x_{n–1} e_{n–1}')
(let (_
(if (< (+ (global-value free_ptr) <nbytes>)
(global-value fromspace_end))
(void)
(collect <nbytes>)))
(let (v (allocate <len> <type>))
(let (_ (vector-set! v 0 x_0))
;; or (vector-set! v 0 e_0) if e_0 is atomic
...
(let (_ (vector-set! v (- n 1) x_{n–1}))
v) )))))
where:
<type>: the type of the vector
<len>: the length of the vector
<nbytes>: the number of bytes needed for a vector of length `len`
x_0 ... x_{n-1}: fresh names
e_0' ... e_{n-1}': (recursively) transformed versions
of non-atomic e_0 ... e{n-1}
Note
In the OCaml code, global_value is GlobalVal,
collect is Collect, and allocate is Allocate.
This transformation is quite complicated, so we're going to discuss it in some detail.
The names x_0 to x_{n-1} in the initial let expression are fresh names.
The expressions e_0' to e_{n-1} are the arguments of the vector form,
recursively transformed by the expose_allocation_exp function.
You only need to let-bind a new fresh name
for non-atomic subexpressions of the vector form,
so for a vector of length n
you generally will not have n nested let expressions,
since some of the subexpressions will usually be atomic
(variables or literals).
We've supplied a function in the template code called new_var
to generate fresh names for use here.
In contrast,
in the nested let expressions at the end with the vector-set! forms,
there do need to be n such expressions,
since these initialize the vector with its initial values
for each of the n slots.
If the original expression for a slot was complex,
use the variable bound to (the transformed version of) that expression
(one of the x_is);
if it was atomic, just use that expression directly.
For these let expressions,
bind a dummy name to the vector-set! expressions;
these names will never be used.
(We call them _ here, but you should generate an actual fresh name.
We used a function called new_void_var to do this,
which we've supplied in the template code.)
Note
You might wonder why we are using nested let expressions
at the end instead of a begin expression.
Both can work, but the nested lets, somewhat paradoxically,
generate simpler code.
The number of bytes needed for a vector of length n is just
8 * (n + 1) i.e. 8 bytes per slot plus 8 bytes for the vector tag.
(Note that we aren't including the size of the contents of the vector
in this calculation, just the size of the vector itself,
since that is what we will need to allocate heap memory for.)
Coding notes
In our code,
we defined a simple helper function called is_atom
which returns true if an expression is atomic.
We used this to separate out atomic expressions
(which don't need to be let-bound to fresh names)
from non-atomic expressions (which do).
One useful trick is to convert (map) the expressions
into polymorphic variants which are either
`Atom expfor atomic expressions`VarExp (fresh_name, exp)for non-atomic expressions
and then use the resulting list of expressions to generate
the nested let expressions at the beginning of the form.
Note
This kind of "temporary label" is a very common use case for polymorphic variants in real-world OCaml code.
Sequence of the transformation
We want your code to match the reference code exactly.
The challenge is that this is an imperative pass, like "uniquify",
because you are generating new names.
It's not hard to match the reference code
for a simple pass like "uniquify",
but it's considerably harder here.
To help you with this,
here is a description of the order
in which we implemented the transformation
of expressions of the form Vec (es, t):
- Recurse on all the subexpressions
es. - Generate all the fresh variable names
for non-atomic expressions in
es. - Generate a fresh name for the result vector.
- Generate the
ifexpression. - Generate all the
vector-set!expressions. - Generate all the nested
(let (_ ...) (vector-set! ...))expressions which wrap thevector-set!expressions. - Generate the outermost
(let (_ ...) ...)expression. - Put everything together.
D. Passes with significant modifications from the previous assignment
1. Remove complex operands (remove_complex.ml)
In this pass, the input language is lalloc_get
and the output language is lalloc_mon.
The Collect, Allocate, and GlobalVal forms
are all considered to be complex operands.
Other than that, the code is similar to
the code from the previous assignment.
In rco_atom, bind Collect to the dummy name $_,
since it returns Void.
That's all there is for this pass!
2. Explicate control (explicate_control.ml)
In this pass, the input language is lalloc_mon
and the output language is ctup.
Note that Collect forms were bound to dummy variables
of the form $_ in the previous pass;
therefore, Collect forms can be handled in
explicate_assign as well as in explicate_effect.
It should never be in tail position, for obvious reasons.
The Allocate and GlobalVal forms
are only handled in the convert_exp function.
Neither form is ever done for effects alone.
The VecRef case in explicate_pred is handled
by implementing this conversion:
(if (vector-ref v i) ...)
; -->
(let ($vecref.N (vector-ref v i)) ; `$vecref.N` is a new name
(if $vecref.N ...))
3. Select instructions (select_instructions.ml)
In this pass, the input language is ctup
and the output language is x86_var_global.
Compared to the previous assignment's x86_var_loop language,
the x86_var_global language has these additions:
-
argtype: a newGlobalArgconstructor (called simplyGlobalin the textbook) -
instrtype: new instructions forAndq(bitwise AND) andSarq(arithmetic shift right) -
info3type: a new fieldnum_spilled_rootwhich contains the number of root stack slots needed to store vector references
A number of new translations of expressions and statements need to be implemented, which we describe next. Please also refer to the textbook (section 6.6).
Collect
Collect is a statement in ctup, not an expression.
It has one argument, which is the number of bytes needed.
It compiles to code to call the collect function of the garbage collector.
This function (written in C in runtime.c)
takes two arguments: a pointer to the end of the "root stack"
(the extra stack for storing vector pointers)
and the number of bytes needed.
The first (the "root stack pointer") is always stored in register %r15;
the second is the contents of the Collect constructor.
Recall that function arguments are placed in the registers
%rdi, %rsi, %rdx, %rcx, %r8, %r9in order.
For the collect function, we only need %rdi and %rsi.
Use movq instructions to move the arguments to the required registers.
We compile to a callq instruction (called Callq in x86_var_global),
which takes a label and an arity (argument count; here, 2).
The label is just collect (the name of the function),
but you must call fix_label on it,
since it's an externally visible label.
Allocate
Allocate is where vectors actually get allocated.
In ctup, it's always an expression, not a statement.
There are two forms of this instruction.
One form allocates a vector and assigns the vector to a variable.
The other returns an allocated vector.
The second form is equivalent to assigning the vector to the %rax register,
and is used when allocating a vector in tail position.
We'll only describe the first form,
since the second form is identical other than assigning to %rax
instead of assigning to a variable.
Note
The translations of GlobalVal, VecLen, VecSet, and VecRef
also have two different forms that are handled in a similar way.
One form assigns the expression to a variable,
while the other is an expression in tail position, where the result
is assigned to the %rax register.
The translation is equivalent to this code:
<lhs> = (allocate <len> (Vector <type> ...));
-->
movq free_ptr(%rip), %r11
addq 8(<len> + 1), free_ptr(%rip)
movq $<tag>, 0(%r11)
movq %r11, <lhs'>
where:
<lhs>: the `ctup` variable being assigned to
<lhs'>: the `x86_var_global` variable being assigned to
(the `arg` equivalent of the `ctup` `atm` form)
<len>: the length of the vector being allocated
(Vector <type> ...): the type of the vector
<tag>: the vector tag (AKA tuple tag)
To do this translation, we have to compute the vector tag
corresponding to the vector type
(given as (Vector <type> ...)).
We recommend that you define a helper function called make_tuple_tag
with this signature:
This function should:
- check that the length
lenis between 0 and 50, and raise an exception otherwise - create a string representation of the binary number
corresponding to the tag
(OCaml literal binary numbers start with
0b) - convert the string to an integer using the function
int_of_string; this is the tuple tag
The binary number itself contains (in order from left to right):
- some unused bits, which should all be 0s
- the pointer bits: 0/1 values corresponding to (up to) 50 slots in the vector; if the type of that slot is itself a vector, use a 1, else use a 0
- the length bits: 6 bits corresponding to the length of the vector
- the forwarding bit: 1 bit which is always 1
Note that the rightmost bits of the pointer bits correspond to the lowest indexed slots of the vector, so the absolute rightmost pointer bit corresponds to slot 0. Unused slots are 0s; you don't have to have them in the bit string since trailing 0s don't change an integer's value. See figure 6.8 in the textbook for a pictorial representation of tuple tags.
One odd thing about the translation given above are the
free_ptr(%rip) parts.
What we really want is the address of free_ptr
so we can dereference it, store into it, etc.
But in x86-64 assembly language,
we can't directly refer to the address of a global variable.
Instead, we have to refer to it as an offset
from the contents of the instruction pointer register %rip.
As the textbook explains,
when the assembler sees free_ptr(%rip),
it computes the difference d between the instruction pointer location
and the address of the global variable free_ptr,
and converts the instruction to d(%rip),
which accesses the memory d bytes away from the current location
of the instruction pointer.
This gives you the address of the free_ptr global variable,
albeit in a somewhat indirect way.
This addressing mode is called instruction-pointer-relative addressing.
Don't worry about this for this pass;
pointer-relative addressing is added in later passes.
In this pass, the address of free_ptr is just GlobalArg (Label free_ptr).
We also need to use fix_label on the free_ptr label,
since it's the label of a globally visible value
(in this case, a global variable);
that would make the address GlobalArg (Label (fix_label free_ptr)).
GlobalVal
GlobalVal (in ctup) is converted to GlobalArg of a label.
Again, don't forget to use fix_label on the label
since it's the label of a global variable.
When used in an assignment, use movq to move the global variable contents
to the destination variable;
when used as an expression in a tail context, move it to the %rax register.
VecRef, VecSet and VecLen
These translations are described well in the textbook (section 6.6).
Note that VecLen needs to extract the length field from the vector tag;
the andq and sarq instructions are used to accomplish that.
For all three forms, put the vector pointer into the reserved register %r11.
The %r11 register thus serves as temporary scratch space
for vector pointers.
Note
If your compare tests are failing on the place where you use
andq and sarq instructions,
you may need to reverse-engineer our algorithm for computing
vector lengths from the tuple tag.
Hint: we used one andq instruction
followed by one sarq instruction.
Note that VecSet (the ctup expression)
also exists as VecSetS (the ctup statement).
VecSet always returns Void,
which is just the immediate integer value 0 in assembly language.
VecSetS doesn't return anything, but the translation is otherwise the same.
4. Uncover live (uncover_live.ml)
There are only two modifications you need to make to the "uncover live" pass.
- The
GlobalArgconstructor has to be handled. - The new instructions
andqandsarqhave to be handled.
The GlobalArg constructor doesn't actually do anything in this pass
since its argument is a label.
It should be handled where the other arg values are handled.
The andq instruction is handled like addq and subq.
The sarq has two arguments (source and destination).
The way we use it, the source argument is always an immediate value,
so you can ignore it in liveness analysis.
The sarq instruction reads from and writes to the destination argument.
5. Build interference (build_interference.ml)
This pass requires three changes/additions:
-
The main change in this pass is that the
make_graphfunction has to deal with vector-valued variables, which are handled specially. This only affects the handling ofcallqinstructions when calling thecollectfunction ("collect"label). In addition to what was done previously, you need to add an edge between all callee-save registers and all locations in the after set that correspond to call-live vector-valued variables.Note
Edges between caller-save registers and vector-valued variables will already be set up from the code from previous assignments. Given that we are not allowing pointers to call-live vector-valued variables to be stored in any registers or in the regular stack, the only place they can be stored in is in the root stack, which we will deal with in the next section.
-
The
andqandsarqinstructions need to be processed. This can be done much like the other instructions. -
The destination of an instruction can now be a
GlobalArgvalue. These values do not affect the interference graph, so they can be ignored (just return the graph unchanged).GlobalArgand the interference graphThe reason that
GlobalArgs do nothing to the interference graph is because they are not associated with a "location" in the sense that matters to the interference graph. The only "locations" the interference graph cares about are variables and registers. GlobalArgs only have labels.
6. Allocate registers (allocate_registers.ml)
This pass has a number of changes, some of them supplied to you in the template code.
-
Spilled stack locations on the regular stack are handled differently than previously. In addition, we need to manage the root stack, which is where call-live vector-valued variables go. We have two counters for the two stacks, called
next_stack_indexandnext_root_stack_index. These are both integer references. Before allocating registers, both of these counters are initialized to 1.Note
The reason for the 1-indexing is that two registers are used for the two stacks:
%rbpfor the regular stack and%r15for the root stack.In the regular stack, the caller's
%rbpvalue is stored at0(%rbp), so the first available slot is at-8(%rbp), since the regular stack grows downwards.In the root stack, the
%r15register contains the top of the root stack, and the root stack grows upwards. Despite this, we start assigning slots from-8(%r15), since we don't keep a pointer to the base of the root stack in a register. (In a sense, we are assigning slots in the "wrong order", but it doesn't cause problems. It's also possible to assign regular stack slots this way, and some compilers do it that way.) -
We have a new hashtable called
spilled_location_dictwhich is cleared before register allocation. In it, we map integers ("colors" assigned in graph coloring) to spilled locations (regular stack and root stack). -
The
location_of_colorfunction is the main thing that needs to be changed for this pass.-
As before, if the color (represented as an integer) is in the color register map, return the register that the color maps to. Otherwise, the color represents a spilled variable.
-
This function can spill variables to either the regular stack or to the root stack. The
vector_typedargument is used to select the stack; if it'strue, spill to the root stack. -
Check to see if the color is in the
spilled_location_dict; if it is, return that location. Otherwise, continue. -
Note that the colors of variables that represent vectors can be interleaved with the colors of variables that don't. Therefore, you can't just take the difference between the color and the last register color to determine where on the stack to spill the variable to. Instead, use the
next_stack_indexvariable (for non-vector-typed variables) andnext_root_stack_indexvariable (for vector-typed variables) to determine the stack slot for the respective slots. (Increment the index you pick after you use it, so the next variable will get a different slot.) The offset is (-8) times the index, and the base register isRbpfor the regular stack orR15for the root stack. This gives you the location you need. -
Once you've done this, add the color/location pair to the
spilled_location_dictin case the location of the same color is requested later.
-
-
The
varmap_of_colormapfunction has to be modified to calllocation_of_colorwith the newvector_typedargument. It does this by seeing if a variable is in thevvs(vector-valued variables) set. -
The
get_variable_location_mapfunction (supplied to you) now re-initializes the stack index variables and the spilled location dictionary. -
The
convert_instrfunction needs to be extended to handle theandqandsarqinstructions. -
The
get_num_spilledfunction is now supplied to you. This function is trivial because we can read off the number of spilled variables on both the regular and root stacks from thenext_stack_indexandnext_root_stack_indexvariables. -
The new
vector_varsfunction (supplied to you) returns a set of all the vector-valued variables. -
The
allocate_registersfunction now computes both the number of variables spilled to the regular stack and the number of variables spilled to the root stack. This information is stored in theinfo3datatype.
7. Patch instructions (patch_instructions.ml)
Only a couple of modifications need to be made for this pass:
-
When patching instructions, many instructions can't have both arguments as stack locations. This also applies to global variable locations (
GlobalArgarguments), as well as any combination of those and stack locations. -
The new
andqandsarqinstructions also need to be patched.
8. Prelude and conclusion (prelude_conclusion.ml)
There are a few additions needed in this pass.
-
When converting global labels (
GlobalArgarguments), use instruction-pointer-relative addressing (discussed above), soGlobalArg lblbecomesGlobalArg (Rip, lbl). -
Handle the new
andqandsarqinstructions. -
In the prelude, do the following extra tasks:
-
The garbage collector needs to be initialized by a call to the
initializefunction (don't forget to usefix_labelon the function label!). This function takes two arguments: the root stack size and the heap size. Set them both to be the value of theinit_heap_sizevariable. Recall that the first two arguments to a function are passed in registers%rdiand%rsi, after which acallqto the function label should be added. -
In the prelude, the address of the
rootstack_beginglobal variable (again, usefix_labelfor global variable labels) should be copied to register%r15. Use instruction-pointer-relative addressing. -
In the prelude, all the root stack slots should be initialized to 0. This should use the
%r15register, which at this time points to the beginning of the root stack. -
The address of the end of the root stack should now be copied into the
%r15register.
-
-
In the conclusion, reset the
%r15register to the beginning of the root stack space. Do this by subtracting the root stack space used from%r15.
Figure 6.16 in the textbook gives you an idea of how this should look for a program which needs a single root stack slot and no regular stack slots.