Assignment 5: Compiler passes
The compiler passes are described in chapter 6 of the textbook
and in the lectures,
but here they are again for completeness.
(You should still read the book and lectures for a more in-depth explanation!)
We will only include passes that you have to implement.
As before, the only files you should modify are the .ml
files
corresponding to these compiler passes.
A. Passes that are unchanged from the previous assignment
These passes are identical except possibly for the names of imported
modules (e.g.
Cloop
changing to Ctup
).
- remove unused blocks
- graph coloring
- remove jumps
B. Passes that have minimal modifications from the previous assignment
These passes have to be extended to handle the new forms, but the extensions should be straightforward.
- shrink
- uniquify
- uncover get!
C. Passes that are new
Expose allocation (expose_allocation.ml
)
This pass is discussed in lecture 16 and in the textbook.
The input language is lfun_shrink
and the output language is lalloc
.
The goal of this pass is to replace the vector
form
(the vector constructor) with three lower-level forms:
collect
which invokes the garbage collector,allocate
which allocates memory on the heap,global_value
which accesses global variables which are used by the garbage collector.
The transformation is done in the function expose_allocation_exp
.
The function recurses through all subexpressions of the program expression,
converting vector
forms as they are encountered.
That's all it does.
The transformation
Here is the transformation, in pseudocode:
(vector (e_0 ... e_{n-1}) (type <type>))
-->
(let (x_0 e_0')
...
(let (x_{n–1} e_{n–1}')
(let (_
(if (< (+ (global-value free_ptr) <nbytes>)
(global-value fromspace_end))
(void)
(collect <nbytes>)))
(let (v (allocate <len> <type>))
(let (_ (vector-set! v 0 x_0))
;; or (vector-set! v 0 e_0) if e_0 is atomic
...
(let (_ (vector-set! v (- n 1) x_{n–1}))
v) )))))
where:
<type>: the type of the vector
<len>: the length of the vector
<nbytes>: the number of bytes needed for a vector of length `len`
x_0 ... x_{n-1}: fresh names
e_0' ... e_{n-1}': (recursively) transformed versions
of non-atomic e_0 ... e{n-1}
Note
In the OCaml code, global_value
is GlobalVal
,
collect
is Collect
, and allocate
is Allocate
.
This transformation is quite complicated, so we're going to discuss it in some detail.
The names x_0
to x_{n-1}
in the initial let
expression are fresh names.
The expressions e_0'
to e_{n-1}
are the arguments of the vector
form,
recursively transformed by the expose_allocation_exp
function.
You only need to let
-bind a new fresh name
for non-atomic subexpressions of the vector
form,
so for a vector of length n
you generally will not have n
nested let
expressions,
since some of the subexpressions will usually be atomic
(variables or literals).
We've supplied a function in the template code called new_var
to generate fresh names for use here.
In contrast,
in the nested let
expressions at the end with the vector-set!
forms,
there do need to be n
such expressions,
since these initialize the vector with its initial values
for each of the n
slots.
If the original expression for a slot was complex,
use the variable bound to (the transformed version of) that expression
(one of the x_i
s);
if it was atomic, just use that expression directly.
For these let
expressions,
bind a dummy name to the vector-set!
expressions;
these names will never be used.
(We call them _
here, but you should generate an actual fresh name.
We used a function called new_void_var
to do this,
which we've supplied in the template code.)
Note
You might wonder why we are using nested let
expressions
at the end instead of a begin
expression.
Both can work, but the nested let
s, somewhat paradoxically,
generate simpler code.
The number of bytes needed for a vector of length n
is just
8 * (n + 1)
i.e. 8 bytes per slot plus 8 bytes for the vector tag.
(Note that we aren't including the size of the contents of the vector
in this calculation, just the size of the vector itself,
since that is what we will need to allocate heap memory for.)
Coding notes
In our code,
we defined a simple helper function called is_atom
which returns true
if an expression is atomic.
We used this to separate out atomic expressions
(which don't need to be let
-bound to fresh names)
from non-atomic expressions (which do).
One useful trick is to convert (map) the expressions
into polymorphic variants which are either
`Atom exp
for atomic expressions`VarExp (fresh_name, exp)
for non-atomic expressions
and then use the resulting list of expressions to generate
the nested let
expressions at the beginning of the form.
Note
This kind of "temporary label" is a very common use case for polymorphic variants in real-world OCaml code.
Sequence of the transformation
We want your code to match the reference code exactly.
The challenge is that this is an imperative pass, like "uniquify",
because you are generating new names.
It's not hard to match the reference code
for a simple pass like "uniquify",
but it's considerably harder here.
To help you with this,
here is a description of the order
in which we implemented the transformation
of expressions of the form Vec (es, t)
:
- Recurse on all the subexpressions
es
. - Generate all the fresh variable names
for non-atomic expressions in
es
. - Generate a fresh name for the result vector.
- Generate the
if
expression. - Generate all the
vector-set!
expressions. - Generate all the nested
(let (_ ...) (vector-set! ...))
expressions which wrap thevector-set!
expressions. - Generate the outermost
(let (_ ...) ...)
expression. - Put everything together.
D. Passes with significant modifications from the previous assignment
1. Remove complex operands (remove_complex.ml
)
In this pass, the input language is lalloc_get
and the output language is lalloc_mon
.
The Collect
, Allocate
, and GlobalVal
forms
are all considered to be complex operands.
Other than that, the code is similar to
the code from the previous assignment.
In rco_atom
, bind Collect
to the dummy name $_
,
since it returns Void
.
That's all there is for this pass!
2. Explicate control (explicate_control.ml
)
In this pass, the input language is lalloc_mon
and the output language is ctup
.
Note that Collect
forms were bound to dummy variables
of the form $_
in the previous pass;
therefore, Collect
forms can be handled in
explicate_assign
as well as in explicate_effect
.
It should never be in tail position, for obvious reasons.
The Allocate
and GlobalVal
forms
are only handled in the convert_exp
function.
Neither form is ever done for effects alone.
The VecRef
case in explicate_pred
is handled
by implementing this conversion:
(if (vector-ref v i) ...)
; -->
(let (vecref_N (vector-ref v i)) ; `vecref_N` is a new name
(if vecref_N ...))
3. Select instructions (select_instructions.ml
)
In this pass, the input language is ctup
and the output language is x86_var_global
.
Compared to the previous assignment's x86_var_loop
language,
the x86_var_global
language has these additions:
-
arg
type: a newGlobalArg
constructor (called simplyGlobal
in the textbook) -
instr
type: new instructions forAndq
(bitwise AND) andSarq
(arithmetic shift right) -
info3
type: a new fieldnum_spilled_root
which contains the number of root stack slots needed to store vector references
A number of new translations of expressions and statements need to be implemented, which we describe next. Please also refer to the textbook (section 6.6).
Collect
Collect
is a statement in ctup
, not an expression.
It has one argument, which is the number of bytes needed.
It compiles to code to call the collect
function of the garbage collector.
This function (written in C in runtime.c
)
takes two arguments: a pointer to the end of the "root stack"
(the extra stack for storing vector pointers)
and the number of bytes needed.
The first (the "root stack pointer") is always stored in register %r15
;
the second is the contents of the Collect
constructor.
Recall that function arguments are placed in the registers
%rdi
, %rsi
, %rdx
, %rcx
, %r8
, %r9
in order;
any additional arguments are passed on the stack.
For the collect
function, we only need %rdi
and %rsi
.
Use movq
instructions to move the arguments to the required registers.
We compile to a callq
instruction (called Callq
in x86_var_global
),
which takes a label and an arity (argument count; here, 2).
The label is just collect
(the name of the function),
but you must call fix_label
on it,
since it's an externally visible label.
Allocate
Allocate
is where vectors actually get allocated.
In ctup
, it's always an expression, not a statement.
There are two forms of this instruction.
One form allocates a vector and assigns the vector to a variable.
The other returns an allocated vector.
The second form is equivalent to assigning the vector to the %rax
register,
and is used when allocating a vector in tail position.
We'll only describe the first form,
since the second form is identical other than assigning to %rax
instead of assigning to a variable.
Note
The translations of GlobalVal
, VecLen
, VecSet
, and VecRef
also have two different forms that are handled in a similar way.
One form assigns the expression to a variable,
while the other is an expression in tail position, where the result
is assigned to the %rax
register.
The translation is (in effect):
<lhs> = (allocate <len> (Vector <type> ...));
-->
movq free_ptr(%rip), %r11
addq 8(<len> + 1), free_ptr(%rip)
movq $<tag>, 0(%r11)
movq %r11, <lhs'>
where:
<lhs>: the `ctup` variable being assigned to
<lhs'>: the `x86_var_global` variable being assigned to
(the `arg` equivalent of the `ctup` `atm` form)
<len>: the length of the vector being allocated
(Vector <type> ...): the type of the vector
<tag>: the vector tag (AKA tuple tag)
To do this translation, we have to compute the vector tag
corresponding to the vector type
(given as (Vector <type> ...)
).
We recommend that you define a helper function called make_tuple_tag
with this signature:
This function should:
- check that the length
len
is between 0 and 50, and raise an exception otherwise - create a string representation of the binary number
corresponding to the tag
(OCaml literal binary numbers start with
0b
) - convert the string to an integer using the function
int_of_string
; this is the tuple tag
The binary number itself contains (in order from left to right):
- some unused bits, which should all be 0s
- 0/1 values corresponding to (up to) 50 slots in the vector; if the type of that slot is itself a vector, use a 1, else use a 0
- 6 bits corresponding to the length of the vector
- 1 bit which is always 1 (the forwarding bit)
Note that the rightmost bits of the pointer bits correspond to the lowest indexed slots of the vector, so the absolute rightmost pointer bit corresponds to slot 0. Unused slots are 0s; you don't have to have them in the bit string since trailing 0s don't change an integer's value. See figure 6.8 in the textbook for a pictorial representation of tuple tags.
One odd thing about the translation given above are the
free_ptr(%rip)
parts.
What we really want is the address of free_ptr
so we can dereference it, store into it, etc.
But in x86-64 assembly language,
we can't directly refer to the address of a global variable.
Instead, we have to refer to it as an offset
from the contents of the instruction pointer register %rip
.
As the textbook explains,
when the assembler sees free_ptr(%rip)
,
it computes the difference d
between the instruction pointer location
and the address of the global variable free_ptr
,
and converts the instruction to d(%rip)
,
which accesses the memory d
bytes away from the current location
of the instruction pointer.
This gives you the address of the free_ptr
global variable,
albeit in a somewhat indirect way.
This addressing mode is called instruction-pointer-relative addressing.
Don't forget to use fix_label
on the free_ptr
label,
since it's the label of a globally visible value
(in this case, a global variable).
GlobalVal
GlobalVal
(in ctup
) is converted to GlobalArg
of a label.
Again, don't forget to use fix_label
on the label
since the label is of a global variable.
When used in an assignment, use movq
to move the global variable contents
to the destination variable;
when used as an expression in a tail context, move it to the %rax
register.
VecRef
, VecSet
and VecLen
These are described well in the textbook (section 6.6).
Note that VecLen
needs to extract the length field from the vector tag;
the andq
and sarq
instructions are used here to accomplish that.
For all three forms, put the vector pointer into the reserved register %r11
.
The %r11
register thus serves as temporary scratch space
for vector pointers.
Note
If your compare tests are failing on the place where you use
andq
and sarq
instructions,
you may need to reverse-engineer our algorithm for computing
vector lengths from the tuple tag.
Hint: we used one andq
instruction
followed by one sarq
instruction.
Note that VecSet
(the ctup
expression)
also exists as VecSetS
(the ctup
statement).
VecSet
always returns Void
,
which is just the immediate integer value 0 in assembly language.
VecSetS
doesn't return anything, but the translation is otherwise the same.
4. Uncover live (uncover_live.ml
)
There are only two modifications you need to make to the "uncover live" pass.
- The
GlobalArg
constructor has to be handled. - The new instructions
andq
andsarq
have to be handled.
The GlobalArg
constructor doesn't actually do anything in this pass
since its argument is a label.
It should be handled where the other arg
values are handled.
The andq
instruction is handled like addq
and subq
.
The sarq
has two arguments (source and destination).
The way we use it, the source argument is always an immediate value,
so you can ignore it in liveness analysis.
The sarq
instruction reads from and writes to the destination argument.
5. Build interference (build_interference.ml
)
This pass requires three changes/additions:
-
The main change in this pass is that the
make_graph
function has to deal with vector-valued variables, which are handled specially. This only affects the handling ofcallq
instructions when calling thecollect
function ("collect"
label). In addition to what was done previously, you need to add an edge between all callee-save registers and all locations in the after set that correspond to call-live vector-valued variables.Note
Edges between caller-save registers and vector-valued variables will already be set up from the code from previous assignments. Given that we are not allowing pointers to call-live vector-valued variables to be stored in any registers or in the regular stack, the only place they can be stored in is in the root stack, which we will deal with in the next section.
-
The
andq
andsarq
instructions need to be processed. This can be done much like the other instructions. -
The destination of an instruction can now be a
GlobalArg
value. These values do not affect the interference graph, so they can be ignored (just return the graph unchanged).
6. Allocate registers (allocate_registers.ml
)
This pass has a number of changes, some of them supplied to you in the template code.
-
Spilled stack locations on the regular stack are handled differently than previously. In addition, we need to manage the root stack, which is where vector-valued variables go. We have two counters for the two stacks, called
next_stack_index
andnext_root_stack_index
. These are both integer references. Before allocating registers, both of these counters are initialized to 1.Note
The reason for the 1-indexing is that two registers are used for the two stacks:
%rbp
for the regular stack and%r15
for the root stack.In the regular stack, the caller's
%rbp
value is stored at0(%rbp)
, so the first available slot is at-8(%rbp)
(the regular stack grows downwards).In the root stack, the %r15 register contains the top of the root stack, and the root stack grows upwards. Despite this, we start assigning slots from
-8(%r15)
, since we don't keep a pointer to the base of the root stack in a register. (In a sense, we are assigning slots in the "wrong order", but it doesn't cause problems.) -
We have a new hashtable called
spilled_location_dict
which is cleared before register allocation. In it, we map integers ("colors" assigned in graph coloring) to spilled locations (regular stack and root stack). -
The
location_of_color
function is the main thing that needs to be changed for this pass.-
As before, if the color (represented as an integer) is in the color register map, return the register that the color maps to. Otherwise, the color represents a spilled variable.
-
This function can spill variables to either the regular stack or to the root stack. The
vector_typed
argument is used to select the stack; if it'strue
, spill to the root stack. -
Check to see if the color is in the
spilled_location_dict
; if it is, return that location. Otherwise, continue. -
Note that the colors of variables that represent vectors can be interleaved with the colors of variables that don't. Therefore, you can't just take the difference between the color and the last register color to determine where on the stack to spill the variable to. Instead, use the
next_stack_index
variable (for non-vector-typed variables) andnext_root_stack_index
variable (for vector-typed variables) to determine the stack slot for the respective slots. (Increment the index you pick after you use it, so the next variable will get a different slot.) The offset is (-8) times the index, and the base register isRbp
for the regular stack orR15
for the root stack. This gives you the location you need. -
Once you've done this, add the color/location pair to the
spilled_location_dict
in case the location of the same color is requested later.
-
-
The
varmap_of_colormap
function has to be modified to calllocation_of_color
with the newvector_typed
argument. It does this by seeing if a variable is in thevvs
(vector-valued variables) set. -
The
get_variable_location_map
function (supplied to you) now re-initializes the stack index variables and the spilled location dictionary. -
The
convert_instr
function needs to be extended to handle theandq
andsarq
instructions. -
The
get_num_spilled
function is now supplied to you. This function is trivial because we can read off the number of spilled variables on both the regular and root stacks from thenext_stack_index
andnext_root_stack_index
variables. -
The new
vector_vars
function (supplied to you) returns a set of all the vector-valued variables. -
The
allocate_registers
function now computes both the number of variables spilled to the regular stack and the number of variables spilled to the root stack. This information is stored in theinfo3
datatype.
7. Patch instructions (patch_instructions.ml
)
Only a couple of modifications need to be made for this pass:
-
When patching instructions, many instructions can't have both arguments as stack locations. This also applies to global variable locations (
GlobalArg
arguments), as well as any combination of those and stack locations. -
The new
andq
andsarq
instructions also need to be patched.
8. Prelude and conclusion (prelude_conclusion.ml
)
There are a few additions needed in this pass.
-
When converting global labels (
GlobalArg
arguments), use instruction-pointer-relative addressing (discussed above), soGlobalArg lbl
becomesGlobalArg (Rip, lbl)
. -
Handle the new
andq
andsarq
instructions. -
In the prelude, do the following extra tasks:
-
The garbage collector needs to be initialized by a call to the
initialize
function (don't forget to usefix_label
on the function labels!). This function takes two arguments: the root stack size and the heap size. Set them both to be the value of theinit_heap_size
variable. Recall that the first two arguments to a function are passed in registers%rdi
and%rsi
, after which acallq
to the function label should be added. -
In the prelude, the address of the
rootstack_begin
global variable (again, usefix_label
for global variable labels) should be copied to register%r15
. Use instruction-pointer-relative addressing. -
In the prelude, all the root stack slots should be initialized to 0. This should use the
%r15
register, which currently points to the beginning of the root stack. -
The address of the end of the root stack should now be copied into the
%r15
register.
-
-
In the conclusion, reset the
%r15
register to the beginning of the root stack space. Do this by subtracting the root stack space used from%r15
.
Figure 6.16 in the textbook gives you an idea of how this should look for a program which needs a single root stack slot and no regular stack slots.