Skip to content

On S-expressions

What S-expressions are

S-expressions are described in this Wikipedia article. Essentially, they are an extremely simple way (perhaps the most simple way) to represent structured data. The "S" in "S-expression" originally meant "symbolic"; these were originally the way to represent arbitrary data in the Lisp programming language. (They were also used to represent Lisp code as well, but that's another story.) Even now, S-expressions are often associated with Lisp-like programming languages such as Scheme, Clojure, and Racket, but they are more generally useful, as we'll see.

An S-expression is essentially just a nested list of arbitrary symbols. S-expressions can be described recursively as either

  • a symbol (a string without quotes which doesn't have internal spaces, quotes, or parentheses)
  • a list of S-expressions

Usually, we use parentheses to delimit a list. Here are some sample S-expressions:

  • foo
  • (a b c)
  • (this is (a (nested list) of) symbols)
  • ((foo 1) (bar 2) (baz 3))

Any kind of structured data can be represented as an S-expression.

How we use them

We use S-expressions as both a serialization format and a visualization tool. "Serialization" means that we can take arbitrary OCaml datatypes and convert them to and from S-expressions without losing any information. For visualization, we take some OCaml datatype we want to look at, convert it to an S-expression, and then pretty-print the S-expression in a readable format.

Visualization is incredibly useful for debugging. We will be using a lot of fairly complex OCaml datatypes, and our compiler passes will have the job of converting one datatype into another. We would like to be able to inspect these datatypes to make sure that we converted them correctly. If we had to write special string conversion functions for all our datatypes, that would be an enormous amount of boring work. Instead, there are OCaml libraries that will allow us to automatically convert any datatype to an S-expression if we add small annotations to our code. (These libraries are the sexplib and ppx_sexp_conv libraries.)

In order to get OCaml to generate the S-expression conversion code, we have to add an annotation after type declarations:

type value =
  | Bool of bool
  | Int of int
  | Function of (value -> value)
[@@deriving sexp]   (* this is the annotation *)

The [@@deriving sexp] line is what is called a PPX extension; it's a kind of code-generating macro which, when interpreted by OCaml, will generate two functions: sexp_of_value (convert a value to an S-expression) and value_of_sexp (convert an S-expression to a value). We tend to use the first function(s) much more than the second, because we usually want to convert our types to S-expressions.

We will use this facility for nearly all our data structures. Fortunately, you don't have to worry about it! Just don't remove those annotations; they are doing real work.

All you really need to know is that for any datatype foo which has the [@@deriving sexp] annotation, (which is almost all of them) there will be functions called sexp_of_foo and foo_of_sexp generated. There is also a function in the Support.Utils module called print_sexp which will print out an S-expression in a readable format. So if you need to print out the data structure, just convert it to an S-expression with (say) sexp_of_foo and then print it using print_sexp.

Note

There is much, much more that could be said about OCaml's PPX extension system, which is a relatively new feature of the language (and which wasn't discussed at all in CS 4 or CS 131). It's a very powerful code-generation system, but it also has a steep learning curve.