Today is the 70th birthday of Donald Knuth.

If you don’t know who Knuth is, then you’re not a programmer. If you’re a programmer and you don’t know who Knuth is, well… I have no idea what rock you’ve been hiding under, but you should probably be fired.

Knuth is one of the most famous and accomplished people in the computer science community. He’s done all sorts of great research, published a set of definitive textbooks on computer algorithms, and, of particular interest to me, implemented a brilliant, hideous, beautiful, godawful piece of software called TeX.

When I went to grad school, instead of being a teaching assistant, I worked for the department as a sysadmin. I ended up spending close to six years doing almost nothing but technical support for TeX, which explains my mixed emotions about it in the description above.

So what is TeX?

When Knuth was writing his papers and books, one of the problems that he constantly encountered was having the typesetters screw up the math. Mathematicians use a lot of weird symbols, and the people who typeset the books mostly didn’t know math. So they’d mess up all sorts of things – sometimes in very serious ways. After he’d gotten sufficiently sick of dealing with that, he sat down and wrote a typesetting program – which became TeX.

TeX was one of the first major markup languages. The idea was that you’d write the text of your document, and mixed into the text, you’d include commands that described how you’d like things to be typeset. Then you’d run the whole shebang through a TeX processor, and it would generate the perfectly typeset results. TeX rapidly became *the* way that technical papers were written, and it remains the dominant typesetting system for technical work all the way to the present day. TeX is a brilliant system. There’s a darned good reason why it remains so dominant 30 years after Knuth wrote the original version.

TeX is more than just a typesetting system. It’s a full-fledged programming language. It is absolutely Turing complete – as proof of that, a lunatic by the name of Andrew Greene wrote a complete, usable BASIC interpreter in TeX! It’s arguably insane to have a Turing complete programming language for a task like typesetting. But it actually makes sense. As I’ve pointed out before, it’s actually very easy for a programming system to be Turing complete. Basically, if you can do iteration and arithmetic, and you’ve got no absolute limit on storage, you’re pretty much guaranteed to be Turing complete. And attractive typesetting needs to do things like iteration, to lay out the pieces of stuff, and it needs to be able to do arithmetic, because it’s got to be able to compute the positions on the page of the different typesetting elements. Since Knuth gave it string-valued variables, without placing a limit on how much stuff you could put into a string, it was pretty much inevitable that it would be Turing complete.

There are two fundamental ideas behind TeX. As a language, it’s based on macro expansion. You can define symbols, and describe replacements for those symbols. When a symbol is encountered, it’s replaced by its value. (usually…)

For example, you could write a TeX macro that replaced a piece of text with two copies, and then use it like the following:

\def\double#1{#1 #1} \double{Foo}

That would create, as its result, the text “Foo Foo” typeset on a page.
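
To make the replacement idea concrete, here is a toy sketch in Python (not TeX) of a macro expander. It is only an illustration under loud assumptions: the `MACROS` table, the `expand` function, and the regular expression are all made up for this example, it handles only one-parameter macros called as `\name{arg}`, and it does not tokenize or order expansions the way real TeX does.

```python
import re

# Hypothetical macro table: name -> body, where "#1" marks the single parameter.
# This mirrors \def\double#1{#1 #1} from the example above.
MACROS = {"double": "#1 #1"}

# Matches \name{arg} where arg contains no braces, so innermost calls match first.
CALL = re.compile(r"\\([A-Za-z]+)\{([^{}]*)\}")


def expand(text: str) -> str:
    """Keep replacing \\name{arg} with the macro body until nothing changes."""

    def substitute(match: re.Match) -> str:
        name, arg = match.group(1), match.group(2)
        if name not in MACROS:
            return match.group(0)  # unknown command: leave it untouched
        return MACROS[name].replace("#1", arg)

    while True:
        expanded = CALL.sub(substitute, text)
        if expanded == text:
            return text
        text = expanded


print(expand(r"\double{Foo}"))  # Foo Foo
```

Nesting works the same way in this toy: `expand(r"\double{\double{a}}")` gives `a a a a`, because each replacement can expose more things to replace.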

There’s a ton of control structure oriented towards managing just when things do their macro expansion. So, for example, you can alter the order in which things expand using something called “expandafter”. Expandafter takes the next two syntactic elements, pushes the first onto a stack, expands the second and replaces it with its expansion, and then expands the first, allowing it to use part of the expansion of the second as its parameter. Here’s an example from the BASIC interpreter:

\def\strlen#1{\strtmp-2% don't count " " \iw tokens
\expandafter\if\stringP #1\let\next\strIter\strIter #1\iw\fi}

This basically says “Expand \stringP, and then do the if”. Since \stringP takes a parameter, that means, roughly, compute “\stringP #1” (where #1 is a parameter to “strlen”), and then use the result of that as the first parameter to the “if”. So this says “If #1 is a string, then compute its length using strIter”.
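
If the reordering is hard to picture, here is a rough Python model of it. Everything in it is an assumption made for illustration: tokens are plain list items, and the macros `\wrap` (one parameter) and `\pair` (no parameters) are invented for this sketch and have nothing to do with the BASIC interpreter. It shows only the "set the first token aside, expand the second one step, put the first back" behavior.

```python
# Hypothetical macro table: name -> (number of parameters, body tokens).
# A body token like "#1" stands for the first argument.
MACROS = {
    r"\wrap": (1, ["[", "#1", "]"]),  # like \def\wrap#1{[#1]}
    r"\pair": (0, ["x", "y"]),        # like \def\pair{xy}
}


def expand_first(tokens):
    """Expand the first token one step if it is a macro; otherwise copy tokens."""
    if not tokens or tokens[0] not in MACROS:
        return list(tokens)
    nparams, body = MACROS[tokens[0]]
    args, rest = tokens[1:1 + nparams], tokens[1 + nparams:]
    result = []
    for tok in body:
        if tok.startswith("#"):
            result.append(args[int(tok[1:]) - 1])  # substitute the argument
        else:
            result.append(tok)
    return result + rest


def expand_all(tokens):
    """Expand left to right until only ordinary tokens remain."""
    out = []
    tokens = list(tokens)
    while tokens:
        head = tokens[0]
        if head == r"\expandafter":
            # Set the next token aside, expand the one after it a single step,
            # then put the saved token back in front of the result.
            saved = tokens[1]
            tokens = [saved] + expand_first(tokens[2:])
        elif head in MACROS:
            tokens = expand_first(tokens)
        else:
            out.append(tokens.pop(0))
    return "".join(out)


print(expand_all([r"\wrap", r"\pair"]))                   # [xy]
print(expand_all([r"\expandafter", r"\wrap", r"\pair"]))  # [x]y
```

Without expandafter, `\wrap` grabs the unexpanded `\pair` as its parameter. With it, `\pair` has already turned into `x y` by the time `\wrap` runs, so `\wrap` only sees the `x`.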

Just looking at this tiny fragment, you should be able to see that TeX could make for a prize-winning entry as a pathological language.

The other main idea of TeX is the boxes-and-glue model of typesetting. All of that crazy macro stuff generally ends up producing two kinds of things: *boxes*, which are things that can be drawn on a page, and *glue*, which is invisible stretchy stuff that sticks boxes together. This is the part of TeX that is amazingly, gloriously, magnificently brilliant. It’s an extremely simple model which is capable of doing extremely complex things.

The idea of typesetting in TeX is that you go through the document, expanding the macros, which results in a ton of boxes and glue. The boxes have all different sizes, and you want to put them together to produce something attractive. Glue makes that work. Glue attaches boxes together: it defines how the boxes will be joined (should they line up their centers? Should they be aligned by some guideline?), and it defines how big the space between them should be, and how much it can be stretched or compressed. Page layout is really just tension relaxation: find the arrangement of boxes which produces the smallest overall tension, within the constraints imposed by the glue.
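
Here’s a minimal sketch, in Python rather than TeX, of how stretchy glue can fit one line of boxes to a target width. The `Box` and `Glue` classes, the `set_line` function, and all of the numbers are assumptions invented for illustration. It models only horizontal spacing on a single line, with every piece of glue stretched or shrunk by the same ratio, and that ratio stands in for the “tension”. Real TeX does a lot more, including choosing where lines break.

```python
from dataclasses import dataclass


@dataclass
class Box:
    width: float  # rigid: a box never changes size


@dataclass
class Glue:
    natural: float  # preferred width of the space
    stretch: float  # how much it is willing to grow
    shrink: float   # how much it is willing to shrink


def set_line(items, target):
    """Fit items to `target` width; return (ratio, final width of each item)."""
    natural = sum(i.width if isinstance(i, Box) else i.natural for i in items)
    glue = [i for i in items if isinstance(i, Glue)]
    excess = target - natural
    if excess >= 0:
        capacity = sum(g.stretch for g in glue)
    else:
        capacity = sum(g.shrink for g in glue)
    # ratio 0 means no tension; +1 or -1 means the glue is at its stated limit.
    ratio = excess / capacity if capacity else 0.0
    widths = []
    for i in items:
        if isinstance(i, Box):
            widths.append(i.width)
        elif ratio >= 0:
            widths.append(i.natural + ratio * i.stretch)
        else:
            widths.append(i.natural + ratio * i.shrink)
    return ratio, widths


line = [Box(30), Glue(5, 4, 2), Box(20), Glue(5, 12, 2), Box(40)]
print(set_line(line, 108))  # (0.5, [30, 7.0, 20, 11.0, 40])
print(set_line(line, 98))   # (-0.5, [30, 4.0, 20, 4.0, 40])
```

The natural width of the line is 100. To reach 108, the stretchier glue absorbs more of the extra space (6 versus 2). To reach 98, both pieces shrink equally because they have the same shrinkability. A layout with a ratio near zero is a relaxed one, and a large ratio means the glue is being pulled or squeezed hard.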

The result of that varies, depending on the skill of the person who set the basic constraints used to determine how the basic glue tensions worked. You can create astonishingly beautiful typeset text – a skilled TeX user can work out the constraints to produce a result that’s the aesthetic equal of the work of the very best human typesetter. You can also create truly astonishingly godawful stuff, on a par with some of what we’ve seen on the web.

But on the whole, it’s been a great thing. Pick up *any* conference proceedings from the last 20 years, in the fields of math, computer science, physics, or chemistry (among numerous others), and you’ll see the results of TeX layout. Pick up a book published by Springer-Verlag, and it’s almost certainly typeset by TeX. Look at Greg Chaitin’s books – every one was written using TeX. Look at any typeset equation in pretty much any published source, from websites to conference proceedings, to journals, to textbooks. If the equation looks really good, if everything is in exactly the right place, and every symbol is correctly drawn in relation to everything else – odds are, it was generated by TeX. Even hardcore Microsoft Word users generally use something TeX-based for doing equations.

I’ve got a love-hate relationship with TeX. It’s a tough system to master, and it’s amazing how badly many people misuse it. So as a guy who had to do technical support for a bunch of people who didn’t understand it, but were using it to write their papers and dissertations, I dealt with more than my fair share of frustration caused by some of the weird things that TeX can do. But looking at it as an engineer, and looking at it as a user myself, and remembering when it was written, I have to conclude that it’s one of the best pieces of software ever written. I don’t know of *any* software other than TeX implemented in the 1970s that remains absolutely and unquestionably dominant in its domain. And the glue-and-boxes model of text layout was a piece of absolute genius – one of the most masterful examples of capturing an extremely complex problem using an extremely simple model. It’s beautiful. And it’s typical of the kind of thing that Knuth does.

Happy Birthday, Dr. Knuth, and many happy returns!