Assignment 6: Compiler passes

The compiler passes are described in chapter 7 of the textbook and in the lectures, but here they are again for completeness. (You should still read the book and lectures for a more in-depth explanation!) We will only include passes that you have to implement. As before, the only files you should modify are the .ml files corresponding to these compiler passes.

Overview

The fundamental difference between this compiler and the previous compilers is that this compiler has to compile multiple top-level functions, whereas all the previous compilers only had to compile a single top-level function (main). Therefore, even compiler passes with no significant changes have to be applied over each top-level function. The code to do this has been supplied in each pass, so you can concentrate on the substantive differences between this compiler and the previous one.

In the descriptions below, when we say that pass X is unchanged from the pass in the previous compiler, we mean "except for having to compile multiple top-level functions instead of just one".

A. Passes that are unchanged from the previous assignment

These passes are identical other than the names of imported modules (e.g. Ctup changing to Cfun), and having to apply the conversion over multiple top-level functions.

  • remove unused blocks
  • graph coloring

B. Passes that have minimal modifications from the previous assignment

These passes have to be extended to handle the new forms, but the extensions should be straightforward.

  • uncover get!
  • allocate registers

C. Passes that are new

Reveal functions (reveal_functions.ml)

The point of this pass is to replace any variable which is a function name with a FunRef form containing the name and the function arity. The supplied code makes an int VarMap.t map called fmap which maps all the top-level function names to their arities. Your job is to fill in the reveal_functions_exp function, which uses this map to identify variables which correspond to function names, and then replace those Var forms with FunRef forms. Var forms that don't involve function names are left alone. This pass is straightforward.
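
For concreteness, here is a minimal sketch of the idea over a cut-down AST. The constructor names below are assumptions; the real lfun_ref AST has many more constructors, and the supplied code already provides the recursion scaffolding.

    (* Sketch only: cut-down AST; check the real lfun_ref definitions. *)
    module VarMap = Map.Make (String)

    type exp =
      | Int of int
      | Var of string
      | FunRef of string * int        (* function name and arity *)
      | Prim of string * exp list     (* stand-in for every other compound form *)

    let rec reveal_functions_exp (fmap : int VarMap.t) (e : exp) : exp =
      match e with
      | Var v ->
          (match VarMap.find_opt v fmap with
           | Some arity -> FunRef (v, arity)   (* variable names a top-level function *)
           | None -> Var v)                    (* ordinary variable: left alone *)
      | Int _ | FunRef _ -> e
      | Prim (op, args) ->
          Prim (op, List.map (reveal_functions_exp fmap) args)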

Note that we place this pass after uniquify so that there can't be any local variable names or function argument names which coincide with any of the top-level function names: all the local variable names and argument names have been "uniquified", but function names are left alone.

Limit functions ("limit_functions.ml")

We don't want to use the stack for argument passing when a function has more than six arguments, because that makes tail calls much harder to implement. Instead, we transform the code so that any function with more than six arguments ends up with exactly six arguments, where the last argument is a tuple (vector) holding all but the first five original arguments. This affects both function definitions and function calls.

A function definition is transformed like this:

(define f ([x_1 : ty_1] ... [x_n : ty_n]) ty_ret body)
; -->
(define f ([x_1 : ty_1] ... [x_5 : ty_5]
           [rest : (vector ty_6 ... ty_n)]) ty_ret body')

where body' is like body, but any x_i after x_5 becomes an access into the rest tuple. So x_6 becomes (vector-ref rest 0), x_7 becomes (vector-ref rest 1), etc.

When calling a function with more than 6 arguments, pack the extra arguments into a vector:

(f x_1 ... x_n) 
; -->
(f x_1 ... x_5 (vector x_6 ... x_n))

We do this transformation only if there are more than six arguments. If there are six or fewer arguments (including exactly six), there is no need to do the transformation.

We also have to change the arity of all FunRefs to limit the arity to at most 6.

Utility function: convert_exp

The "limit functions" pass is a bit unusual in that it outputs a program with the same datatype as its input. ("Uniquify" is another pass like this.) We are providing a utility function called convert_exp in lfun_refs.ml which will apply an exp -> exp transformation over each subexpression in an expression as well as the expression itself. We encourage you to use this function (though it isn't required) because it can shorten the pass quite a bit. We use it in limit_functions_exp and fix_fun_refs.

There are OCaml extensions that allow you to automatically generate functions like convert_exp, but for simplicity, we aren't using them in this course.

Also, don't confuse this with the convert_exp function in explicate_control! That is a different function.

Coding notes

We've broken down this pass into a number of functions that you should fill in:

  • limit_type
  • limit_args
  • limit_functions_exp
  • get_extra_args
  • fix_fun_refs

The comments before each function explain what the function does.

For each user function, we compute a record of the extra argument information. This record has the type extra_args, which is defined in the file. It contains the name of the new tuple that will hold the extra arguments, the number of extra arguments, and a VarMap which maps the original argument name to the index in the new tuple. To generate the names for the new tuple, use the supplied new_tup_var function. If there are no extra arguments, we return the no_extra_args record.
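
Here is a sketch of how get_extra_args might assemble that record. The record's field names, and the inline stand-ins for new_tup_var and drop, are assumptions; check the actual extra_args definition and helpers in limit_functions.ml (the real function may also take the whole definition rather than just the argument names).

    module VarMap = Map.Make (String)

    (* Assumed field names. *)
    type extra_args = {
      tup_name  : string;         (* name of the new tuple argument *)
      num_extra : int;            (* how many arguments were packed into it *)
      indices   : int VarMap.t;   (* original argument name -> index in the tuple *)
    }

    let no_extra_args =
      { tup_name = ""; num_extra = 0; indices = VarMap.empty }

    (* Stand-ins for the supplied helpers. *)
    let new_tup_var () = "$tup.1"
    let rec drop n lst =
      if n <= 0 then lst
      else match lst with [] -> [] | _ :: t -> drop (n - 1) t

    let get_extra_args (arg_names : string list) : extra_args =
      if List.length arg_names <= 6 then no_extra_args
      else
        let extras = drop 5 arg_names in
        let indices =
          List.mapi (fun i name -> (name, i)) extras
          |> List.to_seq
          |> VarMap.of_seq
        in
        { tup_name = new_tup_var ();
          num_extra = List.length extras;
          indices }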

We store all the names of "limited" functions (those that have had their argument counts changed) in a variable called limited_names. This has the type VarSet.t ref, which means that it is a reference to a set of variables. In other words, it's a mutable set. We use this later in the fix_fun_refs function to change the arities of FunRefs for limited functions.

The library functions take and drop (from Support.Utils) will be useful in this pass. take n lst returns a list which is a copy of the first n elements of the list lst, while drop n lst returns a list which contains all the elements of lst except for the first n elements.
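
For example (assuming Support.Utils is open):

    take 2 [10; 20; 30; 40];;   (* = [10; 20] *)
    drop 2 [10; 20; 30; 40];;   (* = [30; 40] *)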

Optimize (optimize.ml)

The "optimize" pass is a new pass which will be described in detail below.

D. Passes with significant modifications from the previous assignment

Here we describe significant changes in some passes. All passes have to be extended to handle new constructors (if any) and multiple top-level functions; what we are describing here is everything else that also needs to change.

1. Shrink (shrink.ml)

There are two significant changes in the shrink pass.

The first is that the final expression in the program is converted into the body of the main function. This function takes no arguments and returns an integer. This conversion is provided in the code supplied to you.

The second is that we do not convert the type argument of vectors from a ty option to a ty, as we did in the previous compiler. Even though vectors should be typed at this point (and you should signal an error if they are not), the type field still has type ty option. This is necessary because we create new vectors in the "limit functions" pass which are initially untyped. (They get typed in a subsequent type-checking pass.)

2. Uniquify (uniquify.ml)

The only significant change here is that the uniquify_exp function now takes a name-to-name mapping (a var VarMap.t) as an extra argument. This mapping is created in the (new) uniquify_def function, and includes the names of the function's arguments, since they will usually show up in the body of the function.
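
As a sketch, building the initial map might look like this; fresh_name is a hypothetical stand-in for whatever fresh-name generator uniquify.ml already uses.

    module VarMap = Map.Make (String)

    let fresh_name (v : string) : string = v ^ ".1"   (* placeholder *)

    (* Map each argument name to its uniquified name; uniquify_exp is then
       called on the function body with this map. *)
    let initial_arg_map (arg_names : string list) : string VarMap.t =
      List.fold_left
        (fun m a -> VarMap.add a (fresh_name a) m)
        VarMap.empty arg_names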

3. Expose allocation (expose_allocation.ml)

Most of the "expose allocation" pass is identical to the previous compiler. The one significant difference is that in the input language (lfun_ref), Vec forms contain a ty option field instead of a ty field, even though they should be typed at this point. In the previous compiler, Vec forms contained a ty field, not a ty option field. Therefore, when processing a Vec form, signal an error if the ty option field is None.

4. Remove complex operands (remove_complex.ml)

There are two new forms that need to be handled here: FunRef and Apply. Both are classified as complex expressions (as explained in section 7.6 of the textbook). On the other hand, all the arguments of Apply must be atomic expressions.

5. Explicate control (explicate_control.ml)

In this pass, the input language is lfun_ref_mon and the output language is cfun. The cfun language extends the ctup language of assignment 5 by adding several new forms:

  • expressions: FunRef and Call
  • statements: CallS
  • tails: TailCall

In addition, there is the fun_contents record type which contains all information for a function (all types, argument names, local variable names, and the function body).

The changes that need to be made in this pass are mostly trivial. Non-trivial or non-obvious changes include:

  • convert_exp: Apply becomes a Call.

  • explicate_pred: FunRef forms are not handled, since they can't be booleans. The Apply case is handled like a VecRef; implement this conversion:

    (if (apply f x1 x2 ...) ...)
    
    ; -->
    
    (let (apply_N (apply f x1 x2 ...)) ;  `apply_N` is a new name
      (if apply_N ...))
    
  • explicate_effect: Apply becomes a CallS; otherwise, this case is handled like other cases.

  • explicate_tail: Apply becomes a TailCall.

  • create_block: add a case for TailCall.

6. Select instructions (select_instructions.ml)

In this pass, the input language is cfun and the output language is x86_var_def. There are a number of significant changes and additions in this pass. (See section 7.8 in the textbook for a detailed walkthrough.)

For a function named FUNC, start blocks become FUNC_start, and conclusion blocks become FUNC_conclusion. This is needed because there are multiple functions, each with its own start and conclusion block.

Also, before the first instruction in the FUNC_start block, the function arguments must be moved from their designated registers (the argument passing registers: %rdi for argument 1, %rsi for argument 2, etc.) into the function argument names. We do this in the add_arg_instrs function.
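
A sketch of the idea follows; the constructor names (Movq, Reg, Var) are assumptions about the x86_var_def AST, and the real add_arg_instrs may be structured differently.

    type arg = Reg of string | Var of string
    type instr = Movq of arg * arg

    let arg_passing_regs = ["rdi"; "rsi"; "rdx"; "rcx"; "r8"; "r9"]

    (* Move each argument from its designated register into the variable
       named after the argument. *)
    let add_arg_instrs (arg_names : string list) : instr list =
      let rec go regs names =
        match (regs, names) with
        | (_, []) -> []
        | (r :: rs, n :: ns) -> Movq (Reg r, Var n) :: go rs ns
        | ([], _ :: _) ->
            (* can't happen: limit_functions caps functions at six arguments *)
            failwith "add_arg_instrs: more than six arguments"
      in
      go arg_passing_regs arg_names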

A number of new forms need to be handled, which we will illustrate schematically.

  1. Assigning a FunRef to a variable:

    var = (FunRef f n)
    -->
    leaq f(%rip), var  ; use GlobalArg for f(%rip)
    
  2. Assigning a function call to a variable:

    var = (Call atm atms)
    -->
    <movq instructions from atms to registers>
    callq *atm   ; IndirectCallq
    movq %rax, var
    
  3. A function call whose result is discarded (CallS): same as the previous case, but without the final movq instruction.

  4. Returning a FunRef:

    return (FunRef f n)
    -->
    leaq f(%rip), %rax  ; use GlobalArg for f(%rip)
    
  5. Returning a function call should be signalled as an error, since this should have been converted to a TailCall previously.

  6. A TailCall:

    (TailCall atm atms)
    -->
    <movq instructions from atms to registers>
    jmp *atm   ; TailJmp
    

Don't forget to use fix_label on function names when outputting Callq or GlobalArg instructions.
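
As one example, case 1 above with fix_label applied might look roughly like this; the constructor names are assumptions about the x86_var_def AST, and fix_label here is just a placeholder for the supplied helper.

    type arg = Var of string | Reg of string | GlobalArg of string
    type instr = Leaq of arg * arg

    let fix_label (s : string) : string = s   (* platform-dependent in the real code *)

    (* var = (FunRef f n)   ==>   leaq f(%rip), var *)
    let select_funref_assign (v : string) (f : string) : instr list =
      [ Leaq (GlobalArg (fix_label f), Var v) ]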

7. Uncover live (uncover_live.ml)

This pass is the same as the previous compiler, except for the new instructions:

  • Leaq writes to the destination register. The label is irrelevant to liveness analysis.

  • IndirectCallq and TailJmp are like Callq except that they also read from the argument (register) that stores the function address.

8. Build interference (build_interference.ml)

In this compiler, we build an interference graph for each function.

Other than that, the only significant change to this pass is in the handling of Callq and IndirectCallq instructions. In the previous compiler, we ensured that call-live tuple-typed variables were not stored in registers during a garbage collection (so they can be stored in the root stack). We did this by adding interference edges between such variables and all registers (caller-saved and callee-saved). Note that calls to built-in functions other than collect do not have this requirement.

This is still true, but in addition, we need to add these interference edges for tuple-typed variables that are call-live during calls to user-defined functions, which use IndirectCallq. (We never use Callq for user-defined functions, only for built-in functions.)

The reason we do this is that we want to ensure that tuple-typed variables are on the root stack during every garbage collection. A call to collect triggers a garbage collection, and a call to a user-defined function may trigger a garbage collection. To be conservative, we assume that it will.
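
Schematically, the extra work at each IndirectCallq (and at each Callq to collect) looks like this; every name below is a stand-in for whatever your starter code actually provides.

    (* For each tuple-typed variable live after the call, add an
       interference edge to every register, forcing it onto the root stack
       across the call. *)
    let add_call_edges
          (add_edge : string -> string -> unit)
          (all_registers : string list)
          (is_tuple_typed : string -> bool)
          (live_after : string list) : unit =
      List.iter
        (fun v ->
           if is_tuple_typed v then
             List.iter (fun r -> add_edge v r) all_registers)
        live_after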

Note

A more sophisticated compiler could analyze functions to see if it could prove that no garbage collection was possible when calling that function; if so, it would not have to create interference edges between call-live tuple-typed variables and all registers when calling that function.

9. Remove jumps (remove_jumps.ml)

Since there are multiple "conclusion" blocks now (one for each function), you can't just remove the block labelled conclusion when considering which blocks to merge; you have to remove the block labelled NAME_conclusion where NAME is the name of the function. Other than this, the pass is basically unchanged.

10. Patch instructions (patch_instructions.ml)

Aside from obvious changes, the only things you need to make sure of in this pass are:

  • Leaq instruction: the second (destination) argument must be a register.

  • TailJmp instruction: the source register should be %rax. The reason for this is discussed in the textbook (section 7.2.2).
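
A sketch of the two new patching rules, over an assumed cut-down instruction datatype (the real AST, register representation, and the shape of TailJmp may differ):

    type arg = Imm of int | Reg of string | Deref of string * int | GlobalArg of string
    type instr =
      | Leaq of arg * arg
      | Movq of arg * arg
      | TailJmp of arg

    let patch_instr (i : instr) : instr list =
      match i with
      | Leaq (_, Reg _) -> [ i ]                      (* already fine *)
      | Leaq (src, dst) ->
          (* destination must be a register: go through %rax *)
          [ Leaq (src, Reg "rax"); Movq (Reg "rax", dst) ]
      | TailJmp (Reg "rax") -> [ i ]
      | TailJmp arg ->
          (* the jump target must be in %rax before the indirect jump *)
          [ Movq (arg, Reg "rax"); TailJmp (Reg "rax") ]
      | _ -> [ i ]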

11. Prelude and conclusion (prelude_conclusion.ml)

This pass is very well described in the textbook (section 7.11), so we refer you to that. Don't forget to add the conclusion code before each tail call. Recall that tail calls are indirect jump (TailJmp) instructions.

Also, unlike in the textbook, but like our previous compilers, this pass doesn't generate raw assembly code but abstract assembly code (see x86_asm.mli). This will turn out to be very handy in the final optimization pass, which we turn to now.

E. Peephole optimizations: the "optimize" pass (optimize.ml)

This pass isn't in the textbook. It implements various peephole optimizations.

A "peephole optimization" is an optimization that can be done on a small number of nearby (usually adjacent) instructions. The name "peephole" suggests that you are looking at the list of instructions through a narrow peephole so you can only see a few at one time. Despite this, there are many cases in which you can replace or remove some of these instructions to get equivalent ones that do the same thing. So the idea is to

  • remove instructions that aren't needed (reducing code size)
  • replace instructions with other instructions that are more efficient

In this pass, you will be working with the raw assembly language instructions generated by the "prelude and conclusion" pass. These instructions are represented in the x86_asm language, which is just raw x86-64 instructions represented as an OCaml datatype.

Although there are many possible peephole optimizations, we will only be doing two kinds.

Removing instructions with no effect

Some instructions can't possibly have any effect, and can therefore be removed. We remove the following instructions:

  1. Adding 0 to an integer-valued register/stack location.

    addq $0, %rcx   ; removed!
    addq %rcx, $0   ; removed!
    
  2. Subtracting 0 from an integer-valued register/stack location.

    subq $0, %rcx    ; removed!
    

    Note that subq %rcx, $0 can't be removed, since it negates the contents of the %rcx register.

  3. Self-moves: moving a register/stack location to itself.

    movq %rcx, %rcx  ; removed!
    

This last case has been handled earlier, so it will probably not be necessary, but it's trivial to implement, so we do it.

Trimming reciprocal moves and identical moves

In addition to self-moves, sometimes you have a pair of instructions of the form

movq X, Y
movq Y, X   ; removed!

It's clear that the second instruction can't have any effect, since after the first instruction, the two locations have the same value. So we can remove the second instruction. (We still have to keep the first instruction.)

We also remove consecutive identical moves:

movq X, Y
movq X, Y   ; removed!

This should work no matter how many consecutive identical moves there are.
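
Putting the above together, a sketch of the core rewriting over one instruction list might look like this, over an assumed cut-down version of the x86_asm datatype (the real constructors in x86_asm.mli will differ):

    type arg = Imm of int | Reg of string | Deref of string * int
    type instr =
      | Addq of arg * arg
      | Subq of arg * arg
      | Movq of arg * arg
      | Other of string            (* stand-in for every other instruction *)

    let rec peephole (instrs : instr list) : instr list =
      match instrs with
      (* adding 0, or subtracting 0 from a location, has no effect *)
      | Addq (Imm 0, _) :: rest
      | Addq (_, Imm 0) :: rest
      | Subq (Imm 0, _) :: rest -> peephole rest
      (* self-moves have no effect *)
      | Movq (a, b) :: rest when a = b -> peephole rest
      (* movq X, Y followed by movq Y, X or by an identical movq X, Y:
         keep the first move, drop the second, and reconsider the first
         against whatever follows (this handles arbitrarily long runs) *)
      | Movq (a, b) :: Movq (c, d) :: rest
        when (c = b && d = a) || (c = a && d = b) ->
          peephole (Movq (a, b) :: rest)
      | i :: rest -> i :: peephole rest
      | [] -> []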

Additional optimizations

If you think of extra peephole optimizations that could be implemented, please let us know! But please don't implement them in your compiler, or your output will differ from the course compiler's output.