Functional programming tips

Use functional style by default

Code should be written in a functional style (no refs, arrays, or records with mutable fields) except where indicated. Marks will be taken off if this rule is violated. We will let you know when it's OK to use imperative idioms. Certain imperative idioms (like while and for loops) are almost never used, but we do use ref cells on occasion to make code simpler.

Many of the tips below describe ways to effectively use functional style. They are not in any particular order, so feel free to browse until you find something interesting.

Use tail recursion by default

As you recall, a lot of CS 4 was spent teaching you to write functions using recursion instead of loops, and specifically using tail recursion. We use tail recursion because it has desirable space properties, so in general, when you write a recursive function, we generally try to make it tail recursive. In particular, any function that can operate on arbitrarily-sized data structures should really be tail-recursive so as not to use up excessive amounts of stack space.

However, there is no need to be obsessive about this. Most of the code compiled by a compiler will be comparatively short (e.g. the size of a typical function), so if the input to a function is that code, optimizing for space efficiency may not buy you very much. So if it's significantly easier to write a particular recursive function in a non-tail-recursive style, it's OK to do that. But be prepared to defend this choice in code reviews.

Also note that it's often desirable to use a fold (see below) instead of using recursion at all!

Persistent data structures

Probably the hardest part of using functional programming effectively is getting used to persistent data structures (like lists) instead of mutable data structures (like arrays).

In this course, almost all data structures are persistent, which means that when you change something in the data structure, it returns the updated version without altering the input version. This is very different to how imperative languages work.

Tip

If you think you need to use List.iter (which would be appropriate for a mutable data structure) you almost always want to actually use List.fold_left or an iterative helper function (which is appropriate for a persistent data structure).

Even though you may find functional programming awkward at first, it is vastly easier to write correct code in a functional style than in an imperative style. There is simply less that can go wrong.

Maps are your friend!

Consider this tail-recursive function:

let rec convert_nums (scale : int) (ns : int list) (acc : int list) : int list =
  match ns with
    | [] -> acc
    | h :: t ->
        convert_nums scale t (acc @ [scale * h])

This function would be called like this:

# convert_nums 10 [1;2;3;4;5] []
- : int list = [10; 20; 30; 40; 50]

When you look at what the function is actually doing, it becomes clear that it is simply multiplying each element of the list by 10. A much simpler way to achieve this would be to use a map:

# List.map (fun n -> 10 * n) [1;2;3;4;5]
- : int list = [10; 20; 30; 40; 50]

You should develop your intuition for where a recursive function can be replaced by a higher-order function, and List.map is a simple example of this.

Note on efficiency

The other thing wrong about this function is that it's gratuitously inefficient, because of the line

    convert_nums scale t (acc @ [scale * h])

The acc @ [scale * h] expression adds a new value scale * h to the end of a list, which is an \(O(n)\) operation, giving the entire function an \(O(n^2)\) time complexity. A better approach would be to add to the front of the list and reverse it at the end.

Now consider this (non tail-recursive) function:

let rec spread_nums (ns : int list) : int list =
  match ns with
    | [] -> []
    | h :: t -> [h - 1; h; h + 1] @ spread_nums t

It could be used like this:

# spread_nums [1;2;3];;
- : int list = [0; 1; 2; 1; 2; 3; 2; 3; 4]

You might think you can't use a map here:

# List.map (fun n -> [n - 1; n; n + 1]) [1;2;3];;
- : int list list = [[0; 1; 2]; [1; 2; 3]; [2; 3; 4]]

However, you can combine the list of lists into a list using the List.concat function:

# List.concat (List.map (fun n -> [n - 1; n; n + 1]) [1;2;3]);;
- : int list list = [0; 1; 2; 1; 2; 3; 2; 3; 4]

This is so common that there is a function called List.concat_map which does both the mapping and the combining:

# List.concat_map (fun n -> [n - 1; n; n + 1]) [1;2;3];;
- : int list list = [0; 1; 2; 1; 2; 3; 2; 3; 4]

So again there is no need for a recursive function when all the function is doing is mapping a function over a list of values and flattening the resulting list of lists; this is just the same as using List.concat_map.

Folds are your friend!

Programmers new to functional programming are often frustrated by what they perceive as the difficulty of doing simple things. A good example of this is accumulation. You have a list and want to compute some value from the list elements. For instance, say you want to get the maximum value of the list (and assume that you only have a two-element max function to compute maximums). A Python programmer could immediately write this code:

def max_list(lst):
    """Compute the maximum of a list of positive integers."""
    if len(lst) == 0:
        return 0
    mx = lst[0]
    for item in lst:
        mx = max(mx, item)
    return mx

In OCaml, we do this sort of thing using folds, specifically left folds.

let max_list lst =
  List.fold_left
    (fun mx item -> max mx item)
    0
    lst

This can be simplified further:

let max_list lst = List.fold_left max 0 lst

Whenever you need to write code which accumulates something over a list, you generally want to use a fold.

A common beginner's style error is to do this with an iterative helper function:

let max_list lst =
  let rec iter rest mx =
    match rest with
      | [] -> mx
      | h :: t -> iter t (max h mx)
  in
    iter lst 0

This is not so bad, but the iterative helper function can be replaced by List.fold_left and the entire function becomes a one-liner! Be alert for this kind of situation -- there's no benefit to using a functional language if you don't take advantage of what it offers.

On the other hand, sometimes you are accumulating more than one thing in an iteration, or you may be passing more than just the thing you are accumulating through each iteration of the computation. In these cases, you can still use List.fold_left by creating a tuple of everything that needs to be passed from one iteration to another and using that as the accumulation value, but in my experience, it's often easier to just write an iterative helper function.

Left vs. right folds

95% of the time or more, if you want to use a fold in OCaml, you want to use a left fold (List.fold_left) instead of a right fold (List.fold_right). Left folds are tail recursive (space efficient) and are more natural than right folds, which are not tail recursive and thus not space efficient.

On the other hand, there are cases where right folds are the right thing to use. When you want to accumulate things in a list from the right going back to the left, a right fold is usually going to be simpler than a left fold, and you shouldn't feel bad about using one in that case (unless you expect that the list could be very long, in which case you don't want to use a right fold because it would use up too much stack space).

Nested pattern matching

It's very common to want to write a pattern match inside another pattern match. There are two cases to consider.

Where the two pattern matches can be combined. Consider this code:
```
let f (maybe_lst : int list option) : int option =
  match maybe_lst with
    | None -> None
    | Some lst ->
      begin
        match lst with
          | h :: _ -> Some h
          | [] -> None
      end
```
You can merge both pattern matches as follows:
```
let f (maybe_lst : int list option) : int option =
  match maybe_lst with
    | Some (h :: _) -> Some h
    | _ -> None
```
Note how much shorter and clearer the code is. Also note the use of the wildcard (_) pattern for "don't care" situations.

Merging pattern matches is almost always possible if the inner pattern match doesn't depend on a value computed using information from the outer pattern match. Unnecessary pattern matches are bad style and will be marked down (but we'll try to alert you of this issue during code reviews).
Where the two pattern matches can't be combined. Consider this code:
```
let f (lst : int list) : (int * int) option =
  match lst with
    | h :: t ->
      begin
        match List.sort compare t with
          | h' :: _ -> Some (h, h')
          | _ -> None
      end
    | _ -> None
```
The inner match is matching on the result of the List.sort function applied to the tail of the list. You can't combine these pattern matches because the inner one depends on a value which must be computed (it's not structural). For very simple cases, you can use a when clause instead of the inner match, but often this isn't feasible. So you have nested match expressions.

When you have nested match expressions, you must surround them by either begin/end delimiters (as we did above) or parentheses, which looks like this:
```
let f (lst : int list) : (int * int) option =
  match lst with
    | h :: t ->
      (match List.sort compare t with
         | h' :: _ -> Some (h, h')
         | _ -> None)
    | _ -> None
```
Using parentheses looks more concise, but it's easy to forget to close the open parenthesis, so we prefer to use begin/end. (It reads better, too.) What you definitely do not want to do is this:
```
let f (lst : int list) : (int * int) option =
  match lst with
    | h :: t ->
      match List.sort compare t with
        | h' :: _ -> Some (h, h')
        | _ -> None
    | _ -> None
```
If you do this, you will get strange error messages because OCaml will consider the last line to be part of the inner pattern match. Remember, OCaml is not whitespace sensitive! So don't forget the begin/end.