Assignment 2: Compiler passes
The compiler passes are described in chapter 3 of the textbook,
but here they are again for completeness. We will only include
passes that you have to implement.
As before, the only files you should modify are the .ml
files
corresponding to these compiler passes.
In addition,
when we provide a function stub in a .ml
file that you need to complete,
that means (unless otherwise specified)
that we expect that you will use that function
(with those arguments and types (if supplied)) as written
(filling in the TODO
parts, of course).
In particular, you're not allowed to change
the number of arguments to the function, or their types (if supplied).
If a function is completely implemented (no TODO
s),
you should leave it as-is.
On the other hand, you can write as many extra functions as you like.
(If we don't like your choices, we'll let you know during code reviews!)
Note
The one exception to this rule is in the graph_coloring.ml
file.
We provide function stubs with TODO
s as usual,
as well as an implementation of the color
function that uses
those functions, but if you don't like this,
you are allowed to completely rewrite the entire module,
as long as you end up with a color
function
with the correct type signature which works correctly.
Passes that are unchanged from assignment 1
Since the focus of this assignment is on register allocation,
those passes that come before register allocation
do not have to be modified from assignment 1
(in your ch2/
directory). This includes:
uniquify.ml
remove_complex.ml
explicate_control.ml
Note that you are free to edit your old code, in case you think you have a better way of doing things than what you wrote previously, or in case you found bugs in those passes that you need to fix.
1. Select instructions (select_instructions.ml
)
The "select instructions" pass has no fundamental changes, but there are some
trivial changes because the X86Program
type in x86_var.ml[i]
has
changed. The new version has different info types for the program and also has
block info types. For this pass, the block info is a placeholder type
(containing no information) called binfo1
.
(This will change in later passes.)
Some of the type annotations will need to be changed slightly,
and the blocks will all contain a Binfo1
constructor.
The type checker should point out
exactly where your old code needs to be changed.
2. Uncover live (uncover_live.ml
)
The "uncover live" pass has to implement the function
uncover_live
with this signature:
val uncover_live :
(X86_var.info1, X86_var.binfo1) X86_var.program ->
(X86_var.info1, X86_var.binfo2) X86_var.program
The X86_var.program
type is parameterized
on two kinds of "info" types:
one for the program as a whole
(with variants info1
, info2
, and info3
),
and one for blocks
(with variants binfo1
and binfo2
).
The info1
and binfo1
types are placeholder types,
containing no data.
All that this pass does is compute the binfo2
value,
which contains the liveness information described in
the textbook (section 3.2) and in the lectures.
The instructions should not be changed.
The binfo2
type has this definition:
type live =
{
initial : LocSet.t; (* initial live variables/registers *)
afters : LocSet.t list; (* live vars/regs after each instruction *)
}
type binfo2 = Binfo2 of live
The liveness data is represented as a list of sets of locations,
where a "location" is a variable, a register, or a stack location.
Sets of locations are represented by the type LocSet.t
.
Since liveness sets apply between instructions,
we use the initial
field to represent the initial set of live locations,
and the afters
field represents, for each instruction,
the set of live locations after that instruction.
Note
In principle, we could have added the liveness data directly
to the instructions, but we prefer to keep it separate,
partly because it's only used in this pass.
It's important to be aware that there is a one-to-one correspondence
between the instructions in the program
and the after
sets of the live
record:
the Nth item in the after
sets is the after set
for the Nth instruction in the program.
We provide an implementation for the uncover_live
function,
which calls the function uncover_live_in_block
to compute the liveness sets. You have to implement that function.
The inputs to that function are the list of instructions in the block,
and live_before_labels
, which is a map between block names
and the live locations before that block executes.
Here, there is only one such block: conclusion
,
with live locations (registers) %rax
and %rsp
.
Note
Recall that we compute liveness backwards,
from the last instruction back towards the first.
The live locations before the conclusion
block
constitute the live-after set of the last instruction
in the code you compiled.
You can implement uncover_live_in_block
as you like
(as long as it's functional — don't use mutation!)
but we recommend that you define a helper function
to compute the liveness set for a single instruction
given the instruction and its live-after set.
Also, be aware that callq
instructions require special handling.
First, executing a callq
instruction can result in
any of the caller-saved registers being overwritten.
There is a list of caller-saved registers in the Types
module as
caller_saved_regs
; don't hard-code the caller-saved registers!
Second, callq
instructions with arity (number of arguments)
greater than 0 have to read from registers for the first 6 arguments.
These registers are given in the Types
module as arg_passing_regs
;
these arguments are (in order)
%rdi
, %rsi
, %rdx
, %rcx
, %r8
, and %r9
(all of which are caller-saved registers, which should make sense).
We recommend that you write a helper function
to determine the change in the liveness set just for callq
instructions,
incorporating all of this information.
Note
Basically, for a callq
instruction,
all the caller-saved registers get added to the write set,
and the arg-passing registers needed
(but only those that are needed)
get added to the read set.
The algorithm for computing liveness sets is described in detail in section 3.2 of the textbook and in the lectures.
3. Build interference graph (build_interference.ml
)
The "build interference" pass has to implement the function
build_interference
with this signature:
val build_interference :
(X86_var.info1, X86_var.binfo2) X86_var.program ->
(X86_var.info2, X86_var.binfo1) X86_var.program
From this, we can see that the block info binfo2
value
(containing the liveness data)
goes back to the placeholder binfo1
type,
but the program info info1
value
(containing nothing but the types of variables)
gets extended into the info2
type,
which has this definition:
What this does is add the interference graph information
(as the conflicts
field) to the program.
Note
You might wonder why the liveness data is stored
in a block info type while the interference graph
is stored in a program info type.
The liveness data is stored in a block-by-block manner,
while there is only one interference graph per function.
(Here, our programs are a single main
function.)
Note that most functions will have multiple blocks.
Ours also do (the start
block plus the prelude and conclusion)
but we don't have to compute the liveness data for the
prelude and conclusion because they are so simple.
In later compilers, our programs will have many more blocks
to handle conditionals, loops and user-defined functions.
We supplied an implementation of the build_interference
function
which calls the make_graph
function to compute the interference graph.
You need to implement make_graph
, following the algorithm given in
section 3.3 of the textbook and in the lectures.
You'll almost certainly want to split this function up into
multiple helper functions.
Pay particular attention to movq
and callq
instructions;
remember that for callq
you need to create an edge
between all the locations in the live-after set and all
caller-save registers.
Note
This is unrelated to the handling of caller-save registers in liveness analysis, which normally does nothing but is included for completeness. Here, creating the edges between call-live variables and caller-save registers is extremely important, because it prevents call-live variables from being placed into caller-save registers, where they would have to be saved and restored before and after the call. Thus, they will normally be put into callee-save registers, unless there are none available, in which case they will be placed onto the stack. The register allocation algorithm handles all of this automatically!
You might wonder if instead of spilling a call-live variable to the stack, it might be better to just put it in a caller-save register (assuming there is one) and pay the price of saving and restoring. This is a fine question, and one that preoccupies people writing optimizing compilers, but our goals are less lofty. We just want to generate reasonably good code, not the maximally optimal code.
Please review section 3.3 of the textbook before attempting this pass, since it has detailed rules for which instructions give rise to which edges in the interference graph. Also, recall that the way we are building the interference graph is to focus on writes, because the writes performed by an instruction must not overwrite something in a live location. As the book says:
So for each instruction, we create an edge between the locations being written to and the live locations. (However, a location never interferes with itself.)
One consequence of this is that instructions that don't write to a live location don't change the interference graph.
4. Graph coloring (graph_coloring.ml
)
This module is not technically a compiler pass, but it's an integral part of
the "allocate registers" pass. However, the algorithm is more general, and the
implementation is not specialized to locations (registers, variables and stack
locations) until the GraphColor
functor (defined in this module) is
specialized to locations in allocate_registers.ml
. The algorithm is
described in some detail in section 3.4 of the textbook as well as in the
lectures.
The GraphColor
functor is parameterized on several modules:
-
a module defining the graph element type (which will be locations in
allocate_registers.ml
) -
a module defining the element map type (made from the
MapS
functor specialized on element keys);MapS
is just an extension of the normal OCamlMap
functor and is defined insupport/functors.ml[i]
-
a priority queue module (defined in
support/priority_queue.ml[i]
) -
an undirected graph module (the interference graph; defined in
support/ugraph.ml[i]
)
The functor is instantiated in allocate_registers.ml
, so you don't have to
worry about how to do that. Inside the functor, a type node
is defined
that contains information about the elements (locations) which isn't stored
in the interference graph, and that changes as the algorithm progresses.
The two components of a node
are the color of the element
(an int option
, since the element is initially uncolored),
and the saturation set
(a set of integers, representing the colors that the element is not
allowed to have). As the algorithm progresses, the colors (which mostly
start as None
) get filled in, and the saturation sets increase in size,
which constrains the color choices.
We define a nodemap
type which is a map between elements and node
s.
We also provide a debugging function called _print_nodemap
which can print out the nodemap at any point in the algorithm
to help debugging.
Note
The name is _print_nodemap
instead of print_nodemap
because it isn't exported from the functor,
and OCaml issues a warning about unused functions
if their names don't start with an underscore.
The only function that needs to be implemented is the color
function,
which takes an interference graph and a map of precolored elements
(an element-to-int
map) as its arguments.
Recall that the only precolored elements we are using
are the reserved registers,
which for this compiler are %rax
(color -1
), and %rsp
(color -2
).
The return value of the function is the full element-to-int
map,
containing color assignments for all variables.
Because this is a somewhat tricky algorithm,
we've provided you with a starting point.
We've given our version of the color
function,
which calls a number of helper functions.
You may choose to simply implement all the helper functions in
graph_coloring.ml
.
Or, if you would prefer to implement the color
function from scratch,
you are free to delete any or all of our helper functions and
rewrite the color
function accordingly.
(Don't change its type, though!)
Note
The only restriction we'll insist on is that your code for this module
should be purely functional.
Don't use ref
cells or any other imperative features.
In a real-world setting, you might choose to use imperative code
for efficiency or convenience, but it's definitely not necessary.
Also, functional code is easier to get right.
The total amount of code you need to write is less than 100 extra lines.
Testing the graph coloring algorithm
There is a very trivial test script for the graph coloring algorithm only
in the file test_graph_coloring.ml
.
You can run it in the interactive interpreter as follows:
This will run the test and print TEST PASSED!
if it passes
or TEST FAILED!
if not.
More importantly, since this is done interactively,
you can inspect the results of graph coloring to see how they
differed from the expected results.
The example given is the same one given in the book and in the lectures.
Note
In future, we plan to have much more comprehensive tests of this algorithm. If you are looking for a topic for a CS 81 project, this course is a gold mine! In addition to more tests, there are lots of features we want to add.
5. Allocate registers (allocate_registers.ml
)
Once you've implemented the graph coloring algorithm, the "allocate registers" pass is simple. You need to do these things:
-
Run the graph coloring algorithm and assign registers (or, if there are no available registers, stack locations) to all variables. This is done in the functions
get_variable_location_map
, which is supplied to you, andvarmap_of_colormap
, which you need to fill in. Your implementation ofvarmap_of_colormap
should use the helper functionlocation_of_color
, which you also need to fill in.Note
The
location_of_color
function should make use of thelast_register_color
variable. This gives the index of the last register that can be used for register allocation. It's set in theset_register_color_list
function, and will correctly track the-regs
argument of the compiler. Don't use the size ofregister_color_list
or of the maps computed by it, since they also contain the reserved registers (with negative color numbers).Also note that
last_register_color
is anint ref
, not anint
, so you will need to dereference it to get its value. Please don't set it, though! -
Use the mapping between variable bindings and locations (registers or stack locations) to determine the number of stack slots needed, if any. This is done in the function
get_num_spilled
, which you need to fill in. Be careful that you don't count a stack location more than once! Hint: sets, specificallyLocSet
s, are your friend! -
Use the same mapping between variable bindings and locations to collect a set of all the callee-save registers used in the program. This is done in the function
get_used_callee
, which you need to fill in. -
Convert all the instructions that use variables to use the corresponding registers or stack locations. This is done in the function
convert_block
, which you need to fill in. You'll probably want to define some helper functions in your implementation of this function.
All the other code for this pass has been supplied.
In particular, code for selecting which registers are available
(based ultimately on the command-line arguments
to the compile
program) is supplied and should not be altered.
6. Patch instructions (patch_instructions.ml
)
Aside from trivial type changes, the only significant change in this pass
is that instructions of the form Movq (x, x)
(moving a value from a location to the same location) are removed.
7. Prelude and conclusion (prelude_conclusion.ml
)
This pass has a few changes from the corresponding pass in the last assignment.
The info
record type attached to the program
type in x86_int.mli
has these fields:
We won't be concerned with locals_types
, but the other two are relevant
to generating a correct prelude and conclusion.
num_spilled
is the number of stack slots needed for variables
that were not placed in registers.
used_callee
are the names of the callee-save registers that are used
in the program.
Recall that the program is one big function (the main
function),
and so the values of any callee-saved variables that are used inside
this function must be saved in the prelude and restored in the conclusion.
They will be saved to the stack and restored from the stack.
Therefore, the total number of stack slots needed can be more than
num_spilled
; it's actually num_spilled
plus the number of
callee-save registers used.
And to make it even more confusing, the size of the stack frame
(the total amount of stack space reserved for the function)
must be a multiple of 16 bytes, so it can be even bigger!
Fortunately, we have defined a function called align_16
in the
Utils
module of the support
directory. This will take a
non-negative integer and round it up to the nearest multiple of 16.
So the total stack space required is actually (in pseudocode):
However, it's still a bit more complicated than this.
We will be saving the callee-save registers by pushing them to the stack
with the pushq
instruction,
and restoring them with the popq
instruction.
The pushq
instruction subtracts 8 bytes from the stack pointer,
and the popq
adds it back.
The effect of this is that, after all the callee-save registers
have been pushed to the stack using pushq
instructions,
we've already allocated a big chunk of the total stack space required.
The additional stack space we need to allocate can be computed from these
equations (in pseudocode):
n_callees = size(used_callees)
extra_stack_space = align_16 (8 * (num_spilled + n_callees)) - 8 * n_callees
This is already computed for you in the prelude_conclusion
function.
From here, you have to do two things:
-
Compute the prelude and conclusion (collectively called the "epilog" in the code because we place it after the code in the
start
block). This is done in theepilog
function which takes the set of callee-save registers and the amount of extra stack space needed as arguments. -
Adjust all instructions in the
start
block that access stack locations, because the callee-save registers on the stack use up space at the beginning of the stack frame, which means that any stack location access which is relative to the base pointer in%rbp
has to be changed to account for this extra space. This is done in theasm_of_lb
function, which takes a labelled block (thestart
block, in this case) and the amount of space that stack locations need to be shifted by (calledderef_adjust
) as its arguments. You'll probably want to define helper functions to write this function, as (we hope) you did in the last assignment.
The prelude_conclusion
function appends the epilog to the start
block
code to generate the final program.
Here are a couple of notes about the functions you need to write.
-
For
asm_of_lb
, all you need to do is go through all instructions, changing any stack location accesses by subtracting thederef_adjust
value. -
For
epilog
, after you've established the base pointer in the prelude, you need to push the callee-save register values onto the stack, and then adjust the stack pointer by subtracting the extra space needed for the stack-resident variables before jumping to thestart
block. (Recall that the stack grows downwards, so subtracting from the stack pointer reserves stack space.) In the conclusion, you have to add the extra space for the stack-resident variables back to the stack (reclaiming the stack space), and then pop the callee-save register values back to their registers before resetting the base pointer and returning. Make sure that you pop the callee-save registers in reverse order to the order in which you pushed them! (Remember, the stack is a last-in, first out (LIFO) data structure.) Also, only push and pop the callee-save registers you actually used; if your program didn't use any, you don't have to do any pushing and popping.
This sounds much more complicated than it is! This pass is really just a little bit of (somewhat annoying) bookkeeping.