Assignment 6: Compiler passes
The compiler passes are described in chapter 7 of the textbook
and in the lectures,
but here they are again for completeness.
(You should still read the book and lectures for a more in-depth explanation!)
We will only include passes that you have to implement.
As before, the only files you should modify are the .ml
files
corresponding to these compiler passes.
Overview
The fundamental difference between this compiler
and the previous compilers is that
this compiler has to compile multiple top-level functions,
whereas all the previous compilers
only had to compile a single top-level function (main
).
Therefore, even compiler passes with no significant changes
have to be applied over each top-level function.
The code to do this has been supplied in each pass,
so you can concentrate on the substantive differences
between this compiler and the previous one.
In the descriptions below,
when we say that pass X
is unchanged
from the pass in the previous compiler, we mean
"except for having to compile multiple top-level functions
instead of just one".
A. Passes that are unchanged from the previous assignment
These passes are identical other than
the names of imported modules (e.g.
Ctup
changing to Cfun
),
and having to apply the conversion over multiple top-level functions.
- remove unused blocks
- graph coloring
B. Passes that have minimal modifications from the previous assignment
These passes have to be extended to handle the new forms, but the extensions should be straightforward.
- uncover get!
- allocate registers
C. Passes that are new
Reveal functions (reveal_functions.ml
)
The point of this pass is to replace any variable
which is a function name with a FunRef
form
containing the name and the function arity.
The supplied code makes an int VarMap.t
map called fmap
which maps all the top-level function names to their arities.
Your job is to fill in the reveal_functions_exp
function,
which uses this map to identify
variables which correspond to function names,
and then replace those Var
forms with FunRef
forms.
Var
forms that don't involve function names are left alone.
This pass is straightforward.
Note that we place this pass after uniquify
so that there can't any local variable names or function argument names
which have the same name as any of the top-level function names,
since all the local variable names
and argument names have been "uniquified",
but function names are left alone.
Limit functions ("limit_functions.ml")
We don't want to use the stack for argument passing in the event that our functions have more than six arguments, because this makes tail calls much harder to implement. What we do instead is transform the code so that any function that has more than six arguments will have exactly six arguments, and the last argument will be a tuple (vector) of all but the first 5 original arguments. This affects both function definitions and function calls.
A function definition is transformed like this:
(define f ([x_1 : ty_1] ... [x_n : ty_n]) ty_ret body)
; -->
(define f ([x_1 : ty_1] ... [x_5 : ty_5]
[rest : (vector ty_6 ... ty_n)]) ty_ret body')
where body'
is like body
, but any x_i
after x_5
becomes a tuple access to tup
.
So x_6
becomes (vector-ref rest 0)
,
x_7
becomes (vector-ref rest 1)
, etc.
When calling a function with more than 6 arguments, pack the extra arguments into a vector:
We do this transformation only if there are more than six arguments. If there are exactly six arguments, there is no need to do the transformation.
We also have to change the arity of all FunRef
s
to limit the arity to at most 6.
Utility function: convert_exp
The "limit functions" pass is a bit unusual
in that it outputs a program
with the same datatype as its input.
("Uniquify" is another pass like this.)
We are providing a utility function
called convert_exp
in lfun_refs.ml
which will apply an exp -> exp
transformation
over each subexpression in an expression
as well as the expression itself.
We encourage you to use this function
(though it isn't required)
because it can shorten the pass quite a bit.
We use it in limit_functions_exp
and fix_fun_refs
.
There are OCaml extensions that allow you
to automatically generate functions like convert_exp
,
but for simplicity, we aren't using them in this course.
Also, don't confuse this with the convert_exp
function
in explicate_control
!
That is a different function.
Coding notes
We've broken down this pass into a number of functions that you should fill in:
limit_type
limit_args
limit_functions_exp
get_extra_args
fix_fun_refs
The comments before each function explain what the function does.
For each user function,
we compute a record of the extra argument information.
This record has the type extra_args
,
which is defined in the file.
It contains the name of the new tuple
that will hold the extra arguments,
the number of extra arguments,
and a VarMap
which maps the original argument name
to the index in the new tuple.
To generate the names for the new tuple,
use the supplied new_tup_var
function.
If there are no extra arguments,
we return the no_extra_args
record.
We store all the names of "limited" functions
(those that have had their argument counts changed)
in a variable called limited_names
.
This has the type VarSet.t ref
,
which means that it is a reference to a set of variables.
In other words, it's a mutable set.
We use this later in the fix_fun_refs
function
to change the arities of FunRef
s for limited functions.
The library functions take
and drop
(from Support.Utils
)
will be useful in this pass.
take n lst
returns a list
which is a copy of the first n
elements of the list lst
,
while drop n lst
returns a list
which contains all the elements of lst
except for the first n
elements.
Optimize (optimize.ml
)
The "optimize" pass is a new pass which will be described in detail below.
D. Passes with significant modifications from the previous assignment
Here we describe significant changes in some passes. All passes have to be extended to handle new constructors (if any) and multiple top-level functions; what we are describing here is everything else that also needs to change.
1. Shrink (shrink.ml
)
There are two significant changes in the shrink
pass.
The first is that the final expression in the program
is converted into the body of the main
function.
This function takes no arguments and return an integer.
This conversion is provided in the code supplied to you.
The second is that we do not convert the type argument of vectors
from a type option
to a type
,
as we did in the previous compiler.
Even though vectors should be typed at this point
(and you should signal an error if they are not),
the type field still has the type type option
.
This is necessary because we create new vectors
in the "limit functions" pass
which are initially untyped.
(They get typed in a subsequent type checking pass.)
2. Uniquify (uniquify.ml
)
The only significant change here is that the uniquify_exp
function
now takes a name-to-name mapping (a var VarMap.t
) as an extra argument.
This mapping is created in the (new) uniquify_def
function,
and includes the names of the function's arguments,
since they will usually show up in the body of the function.
3. Expose allocation (expose_allocation.ml
)
Most of the "expose allocation" pass is identical to the previous compiler.
The one significant difference is that in the input language (lfun_ref
),
Vec
forms contain a ty option
field instead of a ty
field,
even though they should be typed at this point.
In the previous compiler,
Vec
forms contained a ty
field, not a ty option
field.
Therefore, when processing a Vec
form,
signal an error if the ty option
field is None
.
4. Remove complex operands (remove_complex.ml
)
There are two new forms that need to be handled here:
FunRef
and Apply
.
Both are classified as complex expressions
(as explained in section 7.6 of the textbook).
On the other hand,
all the arguments of Apply
must be atomic expressions.
5. Explicate control (explicate_control.ml
)
In this pass, the input language is lfun_ref_mon
and the output language is cfun
.
The cfun
language extends the ctup
language of assignment 5
by adding several new forms:
- expressions:
FunRef
andCall
- statements:
CallS
- tails:
TailCall
In addition, there is the fun_contents
record type
which contains all information for a function
(all types, argument names, local variable names, and the function body).
The changes that need to be made in this pass are mostly trivial. Non-trivial or non-obvious changes include:
-
convert_exp
:Apply
becomes aCall
. -
explicate_pred
:FunRef
are not handled since they can't be booleans. TheApply
case is handled like aVecRef
; implement this conversion: -
explicate_effect
:Apply
becomes aCallS
; otherwise, this case is handled like other cases. -
explicate_tail
:Apply
becomes aTailCall
. -
create_block
: add a case forTailCall
.
6. Select instructions (select_instructions.ml
)
In this pass, the input language is cfun
and the output language is x86_var_def
.
There are a number of significant changes and additions in this pass.
(See section 7.8 in the textbook for a detailed walkthrough.)
For a function named FUNC
, start
blocks become FUNC_start
,
and conclusion
blocks become FUNC_conclusion
.
This is needed because there are multiple functions,
each with its own start
and conclusion
block.
Also, before the first instruction in the FUNC_start
block,
the function arguments must be moved from their designated registers
(the argument passing registers:
%rdi
for argument 1, %rsi
for argument 2, etc.)
into the function argument names.
We do this in the add_arg_instrs
function.
A number of new forms need to be handled, which we will illustrate schematically.
-
Assigning a
FunRef
to a variable: -
Assigning a function call to a variable:
-
A non-returning function call (
CallS
): same as the previous case without the finalmovq
instruction. -
Returning a
FunRef
: -
Returning a function call should be signalled as an error, since this should have been converted to a
TailCall
previously. -
A
TailCall
:
Don't forget to use fix_label
on function names
when outputting Callq
or GlobalArg
instructions.
7. Uncover live (uncover_live.ml
)
This pass is the same as the previous compiler, except for the new instructions:
-
Leaq
writes to the destination register. The label is irrelevant to liveness analysis. -
IndirectCallq
andTailJmp
are likeCallq
except that they also read from the argument (register) that stores the function address.
8. Build interference (build_interference.ml
)
In this compiler, we build an interference graph for each function.
Other than that,
the only significant change to this pass
is in the handling of Callq
and IndirectCallq
instructions.
In the previous compiler,
we ensured that call-live tuple-typed variables
were not stored in registers during a garbage collection
(so they can be stored in the root stack).
We did this by adding interference edges
between such variables and all registers (caller-saved and callee-saved).
Note that calls to built-in functions other than collect
do not have this requirement.
This is still true, but in addition,
we need to add these interference edges
for tuple-typed variables that are call-live
during calls to user-defined functions, which use IndirectCallq
.
(We never use Callq
for user-defined functions,
only for built-in functions.)
The reason we do this is that we want to ensure that
tuple-typed variables are on the root stack
during every garbage collection.
A call to collect
triggers a garbage collection,
and a call to a user-defined function
may trigger a garbage collection.
To be conservative, we assume that it will.
Note
A more sophisticated compiler could analyze functions to see if it could prove that no garbage collection was possible when calling that function; if so, it would not have to create interference edges between call-live tuple-typed variables and all registers when calling that function.
9. Remove jumps (remove_jumps.ml
)
Since there are multiple "conclusion" blocks now
(one for each function),
you can't just remove the block labelled conclusion
when considering which blocks to merge;
you have to remove the block labelled NAME_conclusion
where NAME
is the name of the function.
Other than this, the pass is basically unchanged.
10. Patch instructions (patch_instructions.ml
)
Aside from obvious changes, the only things you need to make sure of in this pass are:
-
Leaq
instruction: the second (destination) argument must be a register. -
TailJmp
instruction: the source register should be%rax
. The reason for this is discussed in the textbook (section 7.2.2).
11. Prelude and conclusion (prelude_conclusion.ml
)
This pass is very well described in the textbook (section 7.11),
so we refer you to that.
Be sure not to forget to add the conclusion code before each tail call.
Recall that tail calls are indirect jump (TailJmp
) instructions.
Also, unlike in the textbook,
but like our previous compilers,
this pass doesn't generate raw assembly code
but abstract assembly code (see x86_asm.mli
).
This will turn out to be very handy in the final optimization pass,
which we turn to now.
E. Peephole optimizations: the "optimize" pass (optimize.ml
)
This pass isn't in the textbook. It implements various peephole optimizations.
A "peephole optimization" is an optimization that can be done on a small number of nearby (usually adjacent) instructions. The name "peephole" suggests that you are looking at the list of instructions through a narrow peephole so you can only see a few at one time. Despite this, there are many cases in which you can replace or remove some of these instructions to get equivalent ones that do the same thing. So the idea is to
- remove instructions that aren't needed (reducing code size)
- replace instructions with other instructions that are more efficient
In this pass, you will be working with
the raw assembly language instructions
generated by the "prelude and conclusion" pass.
These instructions are represented in the x86_asm
language,
which is just raw x86-64 instructions
represented as an OCaml datatype.
Although there are many possible peephole optimizations, we will only be doing two kinds.
Removing instructions with no effect
Some instructions can't possibly have any effect, and can therefore be removed. We remove the following instructions:
-
Adding
0
to an integer-valued register/stack location. -
Subtracting
0
from an integer-valued register/stack location.Note that
subq %rcx, $0
can't be removed, since it negates the contents of the%rcx
register. -
Self-moves: moving a register/stack location to itself.
This last case has been handled earlier, so it will probably not be necessary, but it's trivial to implement, so we do it.
Trimming reciprocal moves and identical moves
In addition to self-moves, sometimes you have a pair of instructions of the form
It's clear that the second instruction can't have any effect, since after the first instruction, the two locations have the same value. So we can remove the second instruction. (We still have to keep the first instruction.)
We also remove consecutive identical moves:
This should work no matter how many consecutive identical moves there are.
Additional optimizations
If you think of extra peephole optimizations that could be implemented, please let us know! But please don't implement them in your compiler, or your output will differ from the course compiler's output.