Books I like: SICP

July 03, 2023

One of my favorite books on programming is Structure and Interpretation of Computer Programs by Hal Abelson, Gerald Jay Sussman and Julie Sussman. This book is usually referred to by its initials (SICP, pronounced “sick pea”). It’s old; the first edition came out in 1984 and the second in 1996. It’s also one of the great classics of computer science. (You can read it online in its entirety here.)

It’s hard to summarize the book, because it covers so much ground, but here goes.

Background

SICP was originally intended as part of the process of revamping the introductory computer programming courses at MIT in the 1980s. As I understand it, prior to this the introductory courses used Pascal, which is a simple language but one that is quite limited in its abstractive power.

MIT was then a hotbed of artificial intelligence research, most of which involved symbolic programming using the Lisp programming language. Gerry Sussman had been heavily involved in this, and had (with Guy Steele), designed a streamlined dialect of Lisp called Scheme. Together, they wrote a series of very influential papers on Scheme called the Lambda The Ultimate papers, because most of the titles had the form of “Lambda, the Ultimate X” (for some X). For instance, there was “Lambda, the Ultimate Imperative”, “Lambda, the Ultimate Declarative”, and so on. The gist of these papers was to show that a programming language that supported first-class functions (known as “lambdas”, from Lambda Calculus) could be used to implement a number of seemingly-unrelated programming constructs such as unconditional jumps (“GOTO” statements)¹. The intent of Scheme was thus reductive: the language consisted of a small number of very powerful features which could be used to implement many other features, so that those other features didn’t have to be provided by the language itself. This is illustrated by the opening sentence of the Scheme standard report:

Programming languages should be designed not by piling feature on top of feature, but by removing the weaknesses and restrictions that make additional features appear necessary. Scheme demonstrates that a very small number of rules for forming expressions, with no restrictions on how they are composed, suffice to form a practical and efficient programming language that is flexible enough to support most of the major programming paradigms in use today.

Scheme continued to evolve, and adopted more powerful features, such as hygienic macros and continuations. With first-class functions (lambdas), macros and continuations, you could implement features like object-oriented programming, exception handling, coroutines, etc. right in the language without having to have them “baked in”.²

Using Scheme

With this background, and tasked with the project of reforming computer science education at MIT, it’s not surprising that Abelson and Sussman decided to use Scheme as the basis for their introductory textbook. To say that this was an ambitious approach is putting it mildly. The entire book seems to be dedicated to throwing programmers as far out of their comfort zone as possible.

SICP uses Scheme for all the programming examples (though they refer to it as “Lisp”). They don’t provide a tutorial on the language; they just use it and expect that you can pick it up as you go along. (Sometimes they will briefly describe a new language feature, emphasis on “briefly”.) This works because Scheme has an incredibly minimal syntax (basically just identifiers, literals and parentheses), but it’s also very disorienting for programmers used to more conventional languages.

Structure

They use the first two chapters to introduce functional programming, without referring to it as such (at least, not until later in the book). This means that a programmer used to conventional languages will have to relearn programming from scratch. Instead of loops, you have tail-recursive helper functions. Instead of variables, you have extra arguments to tail-recursive helper functions. Once you wrap your head around this, it’s not hard (like riding a bike), but it’s very different.

Soon they’ve moved on to higher-order functions (functions that take functions as arguments, or return functions). This is where the functional approach starts to really pay off, because you can often reduce very complex computations into a very small amount of code. They also introduce generic functions, which is an extremely powerful technique rarely seen in modern programming languages.³ They also introduce object-oriented programming (which they call “data-directed programming”) in chapters 2 and 3. The dialect of Scheme they use has no syntactic support for OOP; you just use regular Scheme functions (as closures, which are functions that contain private state) to represent objects.

In chapter 3 they (finally!) cover imperative programming (you know, what you would do in day 1 of a conventional programming course). They call imperative update of variables “mutation”, I guess to emphasize the dangerous aspects of it. By this point, you’ve already been programming without mutation for so long that introducing it back doesn’t seem like such a big deal, but they give examples of how mutation can achieve effects difficult to accomplish without it. They also (briefly) cover concurrency and streams (lazy lists). (The stream section (section 3.5) blew my mind when I first read it; maybe you’ll have the same reaction.)

Oh, and by the way, by this point in the book they’ve introduced two (2) different abstract models to explain the process of evaluating expressions in Scheme: the substitution model (chapter 1), which is basically like high school algebra, and the environment model (chapter 3), which is how languages actually work.

Interpretation

By this point, you might think that the book was done; an awful lot of deep and probably unfamiliar ideas have already been introduced. But no! This only ends the “structure” part of the book. The “interpretation” part takes up the last two chapters.

In chapter 4 you write a Scheme interpreter inside of Scheme. This is called a “metacircular” interpreter – a language “implemented in itself”. Writing a programming language interpreter is already a huge step (I don’t get to this until my CS 131 course, which is an advanced course). And writing it inside the same language that you are interpreting is a whole other level of mind-bending. However, it’s also very convenient, since you can borrow whatever features you need from the implementing language into the implemented language. This keeps the implementation simple, which in turn allows you to implement features usually thought of as complex (like first-class functions AKA closures) without too much trouble.

After that, they show you how you can tweak the interpreter by making comparatively small changes that convert it into an interpreter for a completely different language. They do this three times: once for a lazy dialect of Scheme, once for a nondeterministic dialect, and once for a logic programming language. So by this time you’ve implemented four (4) different programming languages in Scheme.

Since an interpreter written in itself is going to be pretty slow, in chapter 5 they describe how to write a simple compiler for Scheme. (Where else has an introductory programming book ever morphed into a compiler textbook? Nowhere, that’s where.) They cover register machines, stacks, garbage collection, and so on. They don’t just describe these things; you implement all of them. You keep going, making the compiler work at progressively lower levels of abstraction, until you get to the very last problem in the book:

Exercise 5.52. As a counterpoint to exercise 5.51, modify the compiler so that it compiles Scheme procedures into sequences of C instructions. Compile the metacircular evaluator of section 4.1 to produce a Scheme interpreter written in C.

So the final “exercise” is to generate a Scheme interpreter in C. Not write a Scheme interpreter in C, generate it, as in “write a program which generates a program”. Think about that for a little while.

I’ve never gotten that far. I suspect that only a handful of people who have read SICP have ever done that exercise. (One day, I will.) But you have to admire the absolute chutzpah of anyone who would write an ostensibly “introductory” book on programming and end it with an exercise (like any other exercise) telling you to write a program to generate a Scheme interpreter in C.

And this, to me, is the glory of SICP. This is a book on programming, written by expert programmers, which absolutely, categorically, uncompromisingly refuses to dumb itself down in any way, shape or form. The authors don’t pander to you. They assume you’re as intelligent as they are, and can follow any valid logical argument they make. Then they show you all the cool stuff they know.

When I first read SICP, I remember that I would go through about 20 pages of very densely-written material and then have to take a break. My brain needed time to cool down and absorb what it had just seen. When I had done so, I often got the “a-ha” sense of enlightenment that many people have described as a consequence of learning a Lisp dialect. And the book is loaded with little profound asides, often in comments. One of my favorite such comments comes at the end of chapter 3:

The object model approximates the world by dividing it into separate pieces. The functional model does not modularize along object boundaries. The object model is useful when the unshared state of the “objects” is much larger than the state that they share. An example of a place where the object viewpoint fails is quantum mechanics, where thinking of things as individual particles leads to paradoxes and confusions. Unifying the object view with the functional view may have little to do with programming, but rather with fundamental epistemological issues.

The first three sentences describe what is usually referred to as the expression problem, which is a deep enough concept by itself. Then they connect this to quantum mechanics. (Not a leap I could have made, but fascinating nonetheless.)

I’m still trying to fully grasp the last sentence.

So yes, SICP makes you think. Hard. If you like thinking hard, it’s your book.

SICP and me

I could end here, but I can’t resist sharing a couple of stories.

I’ve heard a number of computer scientists say something to the effect that “SICP is the reason I’m a computer scientist”. Well, for me the connection is probably more direct than most.

After I got my Ph.D. (not in computer science, as it happens, but in Computation and Neural Systems), I interviewed for my current job. This was a pretty intense process. I had long one-on-one conversations with several Caltech CS faculty members. Of all these meetings, one stood out. I met with Andre Dehon, who was a new faculty member charged with (wait for it) revamping the CS 1 course at Caltech. Andre had been an undergraduate at MIT, and so he had taken the course based on SICP (the now-legendary “6.001” course). He wanted to use SICP for Caltech’s CS 1 as well.

When I got to Andre’s office, he initially didn’t pay much attention to me. He had an expression on his face that I would loosely translate as “oh no, I have to talk to yet another one of these idiots about a job they can’t possibly do”. Glancing at his desk, I saw a copy of SICP, which I’d already read by then. So I casually said “Are you going to be using SICP for your introductory class?”

That woke him up.

He gave me a penetrating look and said “You KNOW this book?”

I said “Sure, I’ve read it. It’s a great book! How far do you think you’ll be able to get in an intro class?”

Then his face broke out into a big smile, and he said that he’d love to be able to get to the metacircular evaluator, but he doubted he could get that far.

I’m pretty sure I knew I had the job then.

And I ended up working with Andre on the first version of that course, which I eventually ended up teaching by myself. Then some years later a seismic shift happened in the CS department, and we switched from Scheme to Python in CS 1. (MIT went through the same transition independently.) But since I loved the SICP material so much, I created CS 4, which still uses SICP (though I use OCaml as the language instead of Scheme, for reasons).

The last story was when I met Gerry Sussman (and his wife Julie).

I’d been taking piano lessons with Jim Boyk at Caltech. In addition to being an incredible pianist, Jim is an extremely intelligent person who thinks deeply about everything. And it turns out that Jim knew Gerry, and they had collaborated on a project some years back. So Gerry came to Caltech after I’d been teaching for a couple of years, I think to visit Jim, though he probably had other things going on as well. Julie was with him too. And so I got to meet them, and (of course) I got them to sign my copy of SICP. They were just the nicest and most interesting people you could ever hope to meet. I remember Gerry had a pin that said “Nerd Pride”. We all went out to dinner at the Athenaeum, and I tried to make a dumb programming joke about the “prix-fixe” menu (since Scheme uses a “prefix” syntax). It went up like a lead balloon, but I thought it was pretty clever. Good times.

This also requires tail-call optimization (TCO), which is dealt with at length in the book. ↩︎
It’s not quite as simple as this makes it seem. By implementing complex features this way, you can lose some benefits of having them hard-wired into the language. Notably, error reporting will usually be much better in a hard-wired system. ↩︎
The language Julia, which is heavily influenced by Lisp and Scheme, is based around generic functions. ↩︎