I’ve got a real treat for you pathological programming fans!
Today, we’re going to take a quick look at the world’s most *useful* pathological programming language: TECO.
TECO is one of the most influential pieces of software ever written. If, by chance, you’ve ever heard of a little editor called “emacs”, well, that was originally a set of editor macros for TECO (EMACS = Editor MACroS).
As a language, it’s both wonderful and awful. On the good side, the central concept of the language is wonderful: it’s a powerful language for processing text. It works by repeatedly finding text that matches some kind of pattern, taking some action when it finds it, and then selecting the next pattern to look for. That’s a very natural, easy-to-understand way of writing text-processing programs. On the bad side, it’s got the most god-awful, hideous syntax ever imagined.
History
-------
TECO deserves a discussion of its history, because its history is basically the history of how programmers’ editors developed. This is a *very* short version of it, but it’s good enough for this post.
In the early days, PDP computers used paper tape for entering programs. (Mainframes mostly used punched cards; minis like the PDPs used paper tape.) The big problem with paper tape is what happens when there’s an error. You need to either create a *whole new tape* containing the correction, or carefully cut and splice new segments into the old tape (and splicing was *very* error-prone).
This was bad. And so, TECO was born. TECO was the “Tape Editor and COrrector”. It was a Turing-complete programming language in which you could write programs to make your corrections. You’d feed the TECO program into the computer first, and then feed in the original tape (with errors). The TECO program would make the edits you specified, and then you’d feed the corrected program to the compiler. It needed to be Turing-complete because you were *writing a program* to find the stuff that needed to be changed.
A language designed to live in the paper-tape world had to live with some major constraints. First, paper tape is *slow*. *Really* slow. And punching tape is a miserable process. So you *really* wanted to keep programs as short as possible, which is why the syntax of TECO is, to put it mildly, absolutely mind-boggling. *Every* character is a command. And I don’t mean “every punctuation character”, or “every letter”. *Every* character is a command: letters, numbers, punctuation, line feeds, control characters… Everything.
But despite the utterly cryptic nature of it, it was good. It was *very* good. So when people started to use interactive teletypes (at 110 baud), they *still* wanted to use TECO. And so it evolved. But that basic tape-based syntax remained.
When screen-addressable terminals like the VT52 came along, suddenly you could write programs that used cursor control, and the idea of a full-screen editor was born. Of course, TECO lovers wanted their full-screen editor to be TECO. For VAXes, one of the very first full-screen editors was a version of TECO that displayed a screen full of text and executed commands as you typed them. For commands that needed extra input (like search), it used a mode line at the bottom of the screen (exactly the way that emacs does now).
Not too long after that, Richard Stallman and Guy Steele wrote EMACS: the editor macros for TECO. Originally, it was nothing but a set of TECO macros to make the full-screen editor easier to use. Eventually, though, Emacs was rewritten from scratch (James Gosling wrote one of the early rewrites). The version that survives, Stallman’s GNU Emacs, is built around Lisp. And not long after that, TECO faded away, remembered only by a bunch of aging geeks. The syntax of TECO killed it. The simple fact is, if you have an alternative to the mind-boggling hideousness that is TECO syntax, you’re willing to put up with a less powerful language that you can actually *read*. So almost everyone would rather write their programs in Emacs Lisp than in TECO, even if TECO *was* the better language.
The shame of TECO’s death is that it was actually a really nice programming language. To this day, I still come across tasks that are better suited to TECO than to any modern programming language that I know. The problem, though, and the reason it has disappeared so thoroughly, is that TECO’s *syntax* is so mind-bogglingly awful that no one would write code in it when there are other options available. Not even someone as insane as I am.
A Taste of TECO Programming
---------------------------
Before jumping in and explaining the basics of TECO in some detail, let’s take a quick look at a really simple TECO program. This program is absolutely *remarkably* clear and readable for TECO source. It even uses a trick to get comments: the things that look like comments (the text between pairs of “!” characters) are actually “goto” targets.
```
0uz ! clear repeat flag !
<j 0aua l ! load 1st char into register A !
<0aub ! load 1st char of next line into B !
qa-qb"g xa k -l ga -1uz ' ! if A>B, switch lines and set flag !
qbua ! load B into A !
l .-z;> ! loop back if another line in buffer !
qz;> ! repeat if a switch was made last pass !
```
The basic idea of TECO programming is pretty simple:

* search for something that matches some kind of pattern;
* perform some kind of edit operation at the location you found;
* then choose a new search to find the next thing to do.

The example program above finds the beginnings of lines and does a swap sort (a bubble sort) on their first characters. It walks through each sequential pair of lines. If they’re not in the right order, it swaps them and sets a flag indicating that another pass is needed.
TECO programs work by editing text in a buffer. Every buffer has a *pointer*, which represents the location where edit operations will be performed. The pointer always sits *between* two characters (or at the very beginning or end of the buffer).
The first thing that most TECO programs do is specify what they want to edit, that is, what they want to read into the buffer. The command to do that is “ER”. So to edit a file foo.txt, you’d type “ERfoo.txt”, and then hit the Escape key twice. The first Escape ends the file name, and the second tells TECO to execute the command. Then the file is loaded into the buffer.
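To make the algorithm concrete, here’s a rough Python rendering of what that program computes. It’s a sketch, not a line-by-line translation. The function name “teco_style_sort” is made up, and the buffer is modelled as a list of lines. Like the TECO code, it compares only the first character of each line. An empty line is treated as sorting first, which may not match TECO exactly.

```python
def teco_style_sort(lines):
    """Bubble-sort lines by their first character only.

    Mirrors the structure of the TECO example: keep making passes over
    the buffer until a pass finishes without swapping anything.
    """
    lines = list(lines)          # work on a copy, like editing a buffer
    swapped = True
    while swapped:               # outer loop: repeat while the flag is set
        swapped = False          # clear the repeat flag (0uz)
        for i in range(len(lines) - 1):
            a = lines[i][:1]     # first char of this line ("" if empty)
            b = lines[i + 1][:1] # first char of the next line
            if a > b:            # out of order: swap and set the flag
                lines[i], lines[i + 1] = lines[i + 1], lines[i]
                swapped = True
    return lines


print(teco_style_sort(["pear", "apple", "fig", "banana"]))
# ['apple', 'banana', 'fig', 'pear']
```

Only first characters are compared, so lines that start with the same character keep their original relative order.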
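One way to picture a pointer that sits *between* characters is as an integer from 0 up to the length of the text. Here’s a toy Python model under that assumption. The class and method names are hypothetical, and “dot” mirrors TECO’s “.”.

```python
class Buffer:
    """Toy model of a TECO buffer: some text plus a pointer.

    The pointer ("dot") is an integer from 0 to len(text). Position p sits
    between text[p - 1] and text[p], so it never points *at* a character.
    """

    def __init__(self, text=""):
        self.text = text
        self.dot = 0  # start of the buffer

    def char_before(self):
        """Character just left of the pointer, or None at the start."""
        return self.text[self.dot - 1] if self.dot > 0 else None

    def char_after(self):
        """Character just right of the pointer, or None at the end."""
        return self.text[self.dot] if self.dot < len(self.text) else None


buf = Buffer("abc")
buf.dot = 1
print(buf.char_before(), buf.char_after())  # a b
```

A three-character buffer has four pointer positions (0 through 3). That’s why “jump to the end” is a position *after* the last character rather than *on* it.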
### TECO Commands
TECO commands are generally single characters. But there is some additional structure to allow arguments. There are two types of arguments: numeric arguments and text arguments. Numeric arguments come *before* the command; text arguments come *after* the command. Numeric values used as arguments can be literal numbers, commands that return numeric values, “.” (the position of the buffer pointer), or numeric values joined by arithmetic operators like “+”, “-”, etc.
So, for example, the C command moves the pointer forward one character. If it’s preceded by a numeric argument *N*, it will move forward *N* characters. The J command jumps the pointer to a specific location in the buffer: the numeric argument is the offset from the beginning of the buffer to the location where the pointer should be placed.
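Here’s a small Python sketch of “numeric argument first, then the command” for C and J. It makes some simplifying assumptions. Arguments may only contain decimal integers, “.”, “+” and “-”. A bare “-” stands for -1 (as in “-C”). Moving outside the buffer raises an exception. The helper names are hypothetical.

```python
import re


def eval_arg(expr, dot):
    """Evaluate a (very restricted) numeric argument.

    Supports decimal integers, "." (the pointer position, passed in as
    dot), "+" and "-". Returns None when there is no argument at all.
    """
    if expr == "":
        return None
    if expr == "-":
        return -1                          # a bare "-" stands for -1
    total, sign, need_value = 0, 1, True
    for tok in re.findall(r"\d+|\.|[+\-]", expr):
        if tok in ("+", "-"):
            sign = 1 if tok == "+" else -1
            need_value = True
        else:
            if not need_value:
                raise ValueError(f"missing operator in {expr!r}")
            total += sign * (dot if tok == "." else int(tok))
            sign, need_value = 1, False
    if need_value:
        raise ValueError(f"dangling operator in {expr!r}")
    return total


def run_motion(cmd, text, dot):
    """Run a single C or J command and return the new pointer position."""
    m = re.fullmatch(r"([0-9.+\-]*)([CcJj])", cmd)
    if m is None:
        raise ValueError(f"unsupported command: {cmd!r}")
    n = eval_arg(m.group(1), dot)
    if m.group(2).upper() == "C":
        new = dot + (1 if n is None else n)  # C: move relative, default 1
    else:
        new = 0 if n is None else n          # J: jump absolute, default 0
    if not 0 <= new <= len(text):
        raise IndexError("pointer would move outside the buffer")
    return new


print(run_motion("3C", "hello world", 0))    # 3
print(run_motion(".+2J", "hello world", 3))  # 5
```

With only “+” and “-” in play, evaluating left to right gives the same answer as ordinary arithmetic. A real TECO accepts much more than this.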
String arguments come *after* the command. Each string argument can be delimited in one of two ways. By default, a string argument continues until it sees an Escape character, which marks the end of the string. Alternatively (and easier to read), if the command is prefixed by an “@” character, then the *first character* after the command is the delimiter, and the string will continue until the next instance of that character.
So, for example, we said “ER” reads a file into the buffer. Normally, you’d write “ERfoo.txt<ESC>”. Alternatively, you could use “@ER'foo.txt'”. Or “@ER$foo.txt$”. Or “@ERqfoo.txtq”. Or even “@ER foo.txt ”.
Commands can also be modified by placing a “:” in front of them. For most commands, “:” makes them return either 0 (to indicate that the command failed) or -1 (to indicate that the command succeeded). For others, the colon does *something else*. The only way to know is to know the command.
TECO has variables. In its own inimitable fashion, it doesn’t call them variables; they’re called Q-registers. There are 36 global Q-registers, named “A” through “Z” and “0” through “9”. There are also 36 *local* Q-registers (local to a particular *macro*, aka subroutine), which have a “.” character in front of their name.
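The two delimiting rules are easy to model. This Python sketch reads one string argument starting at a given index, either up to the next Escape or, in the “@” form, up to the next copy of whatever character comes first. The function name and return convention are assumptions made for illustration.

```python
ESC = "\x1b"  # the Escape character


def read_string_arg(src, start, at_form=False):
    """Read a TECO-style string argument from src, beginning at index start.

    Plain form: the string runs up to the next Escape.
    "@" form: src[start] is the delimiter, and the string runs up to the
    next occurrence of that same character.
    Returns (string, index just past the terminator).
    """
    if at_form:
        delim = src[start]
        start += 1
    else:
        delim = ESC
    end = src.find(delim, start)
    if end == -1:
        raise ValueError("unterminated string argument")
    return src[start:end], end + 1


# Text after "ER" starts at index 2; after "@ER" it starts at index 3.
print(read_string_arg("ERfoo.txt" + ESC, 2))            # ('foo.txt', 10)
print(read_string_arg("@ERqfoo.txtq", 3, at_form=True))  # ('foo.txt', 12)
```

The “@” form only works if the delimiter doesn’t appear inside the string. “@ERqquux.txtq” would stop at the second “q” and read an empty file name.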
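Here’s a Python sketch of the “most commands” case. It assumes a plain command reports failure by raising an exception. The colon form turns that into a value: -1 for success and 0 for failure. Both “colon” and the stand-in “jump” command are hypothetical.

```python
def colon(command, *args):
    """Report a command's outcome the way a ":"-modified TECO command does.

    Assumption for this sketch: the plain command signals failure by
    raising LookupError or ValueError. The colon form turns that into a
    value instead: -1 for success, 0 for failure.
    """
    try:
        command(*args)
    except (LookupError, ValueError):
        return 0
    return -1


def jump(buffer_length, n):
    """Hypothetical plain J: fail if n is outside the buffer."""
    if not 0 <= n <= buffer_length:
        raise IndexError("position outside the buffer")
    return n


print(colon(jump, 10, 5))   # -1  (succeeded)
print(colon(jump, 10, 50))  # 0   (failed)
```

The -1/0 convention fits TECO’s loop exit “n;”, which leaves a loop when n is greater than or equal to zero. Feed it a colon-modified result, and the loop bails out exactly when the command failed.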
Q-registers are used for two things. First, you can use them as variables: each Q-register stores a string *and* an integer. Second, any string stored in a Q-register can be used as a subroutine; in fact, that’s the *only* way to create a subroutine. The commands to work with Q-registers include:
* “nUq”: “n” is a numeric argument; “q” is a register name. This stores the value “n” as the numeric value of the register “q”.
* “m,nUq”: both “m” and “n” are numeric arguments, and “q” is a register name. This stores “n” as the numeric value of register “q”, and then returns “m” as a parameter for the next command.
* “n%q”: add the number “n” to the numeric value stored in register “q”.
* “^Uqstring”: Store the string as the string value of register “q”.
* “:^Uqstring”: Append the string parameter to the string value of register “q”.
* “nXq”: clear the text value of register “q”, and copy the next “n” lines into its string value.
* “m,nXq”: copy the character range from position “m” to position “n” into register “q”.
* “.,.+nXq”: copy “n” characters following the current buffer pointer into register “q”.
* “Qq”: use the integer value of register “q” as the parameter to the next command.
* “nQq”: use the ASCII value of the nth character of register “q” as the parameter to the next command.
* “:Qq”: use the length of the text stored in register “q” as the parameter to the next command.
* “Gq”: copy the text contents of register “q” to the current location of the buffer pointer.
* “Mq”: invoke the contents of register “q” as a subroutine.
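The key idea is that every Q-register holds a number *and* a string at the same time. This Python sketch models a subset of the commands above. It covers the global registers only, treats register names as case-insensitive, and counts characters for “nQq” from 0. Those last two choices are assumptions of the sketch. The method names are made up; G and M are left out.

```python
from dataclasses import dataclass


@dataclass
class QRegister:
    number: int = 0   # the integer half
    text: str = ""    # the string half


class QRegisters:
    """Toy model of TECO's 36 global Q-registers (A-Z and 0-9)."""

    NAMES = "ABCDEFGHIJKLMNOPQRSTUVWXYZ0123456789"

    def __init__(self):
        self.regs = {name: QRegister() for name in self.NAMES}

    def _get(self, q):
        return self.regs[q.upper()]    # names are case-insensitive here

    def u(self, q, n):                 # nUq: set the number
        self._get(q).number = n

    def add(self, q, n):               # n%q: add n to the number
        reg = self._get(q)
        reg.number += n
        return reg.number

    def set_text(self, q, s, append=False):  # ^Uq (or :^Uq with append)
        reg = self._get(q)
        reg.text = reg.text + s if append else s

    def x_range(self, q, buffer_text, m, n):  # m,nXq: copy chars m..n
        self._get(q).text = buffer_text[m:n]

    def value(self, q):                # Qq: the number
        return self._get(q).number

    def char_code(self, q, n):         # nQq: code of character n (0-based)
        return ord(self._get(q).text[n])

    def length(self, q):               # :Qq: length of the text
        return len(self._get(q).text)


regs = QRegisters()
regs.u("a", 5)
regs.set_text("a", "hello")
print(regs.value("a"), regs.length("a"))  # 5 5
```

Setting the string half of a register leaves its number alone, and vice versa. That’s how a single register can serve as both a counter and a subroutine body.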
There are also a bunch of commands for printing out some part of the buffer. For example, “T” prints the text from the pointer to the end of the current line. The command to print a literal string is control-A, so the TECO hello world program is “^AHello world^A<ESC><ESC>”. Is that pathological enough?
Commands to remove text include things like “D” to delete the character *after* the pointer; “FD”, which takes a string argument, finds the next instance of that argument, and deletes it; “K” to delete the rest of the *line* after the pointer, and “HK” to delete the entire buffer.
To insert text, you can either use “I” with a string argument, or <TAB> with a string argument. If you use the tab version, then the tab character is part of the text to insert.
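Here’s a Python sketch of the basic delete and insert commands. Each one works on a (text, pointer) pair and returns the new pair. The function names are hypothetical. Two behaviors are choices of this sketch: K also removes the newline that ends the line, and “nD” only handles non-negative n.

```python
def delete_chars(text, dot, n=1):
    """nD: delete n characters after the pointer (n >= 0 in this sketch)."""
    if n < 0 or dot + n > len(text):
        raise IndexError("not enough characters after the pointer")
    return text[:dot] + text[dot + n:], dot


def kill_line(text, dot):
    """K: delete from the pointer through the end of the current line.

    Assumes newline-terminated lines; the newline itself is deleted too.
    """
    end = text.find("\n", dot)
    end = len(text) if end == -1 else end + 1
    return text[:dot] + text[end:], dot


def kill_all(text, dot):
    """HK: delete the entire buffer; the pointer ends up at 0."""
    return "", 0


def insert(text, dot, s):
    """Itext: insert s at the pointer, leaving the pointer after it."""
    return text[:dot] + s + text[dot:], dot + len(s)


text, dot = insert("world", 0, "hello ")
print(repr(text), dot)  # 'hello world' 6
```

Leaving the pointer after the inserted text means a series of inserts appends text in order. That’s usually what you want when building up output.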
There are, of course, a ton of commands for moving the pointer around the buffer. The basic ones are:
* “C” moves the pointer forward one character if no argument is supplied; if it gets a numeric argument *N*, it moves forward *N* characters. C can be preceded by a “:” to return a success value.
* “J” jumps the pointer to a location specified by its numeric argument. If there is no location specified, it jumps to location 0. J can be preceded by a “:” to see if it succeeded.
* “ZJ” jumps to the position *after* the last character in the buffer (“Z” is the number of characters in the buffer).
* “L” is pretty much like “C”, except that it moves by lines instead of characters.
* “R” moves backwards one character; it’s basically the same as “C” with a negative argument.
* “S” searches for its argument string and positions the pointer *after* the last character of the match it found, or at position 0 if the string isn’t found.
* “m,nFB” searches for its argument string between the buffer positions specified by the numeric arguments *m* and *n*.
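Here’s a Python sketch of the two movements that aren’t just pointer arithmetic: moving by lines and searching. Both work on plain text plus a pointer position. The function names are hypothetical, and the search is a literal substring match with none of TECO’s pattern syntax. For “L”, this sketch clamps at the ends of the buffer rather than reporting an error.

```python
def move_lines(text, dot, n=1):
    """nL: move to the start of a line relative to the current one.

    n > 0 moves forward n lines, 0 goes to the start of the current line,
    and n < 0 moves back. This sketch clamps at the ends of the buffer.
    """
    if n > 0:
        pos = dot
        for _ in range(n):
            nl = text.find("\n", pos)
            if nl == -1:
                return len(text)
            pos = nl + 1
        return pos
    start = text.rfind("\n", 0, dot) + 1     # start of the current line
    for _ in range(-n):
        if start == 0:
            break
        start = text.rfind("\n", 0, start - 1) + 1
    return start


def search(text, dot, target):
    """S: look for target after the pointer.

    Returns (found, new_pointer): just past the match on success,
    or position 0 on failure.
    """
    hit = text.find(target, dot)
    if hit == -1:
        return False, 0
    return True, hit + len(target)


print(search("hello world", 0, "o"))  # (True, 5)
```

Because a successful search leaves the pointer just past the match, repeating the same search finds the *next* occurrence. That’s what makes the “search, edit, search again” loop style work.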
Search strings can include something almost like regular expressions, but with a much worse syntax. I don’t want to hurt your brain *too* much, so I won’t go into detail.
And last, but definitely not least, there’s control flow.
First, there are loops. A loop is written “n<commands>”, which executes the text between the left angle bracket and the right angle bracket *n* times. Within the loop:

* “;” branches out of the loop if the last search command failed;
* “n;” exits the loop if the value of *n* is greater than or equal to zero;
* “:;” exits the loop if the last search succeeded;
* “F>” jumps to the loop’s closing bracket (think of “continue” in C);
* “F<” jumps back to the beginning of the loop.

Conditionals are generally written “n"Xthen-command-string|else-command-string'”. (Watch out for the quotes in there; there’s no particularly good way to quote it, since it uses both of the normal quote characters. The double-quote character introduces the conditional, and the single-quote marks the end.) In this command, the “X” is one of a list of conditional tests, which define how the numeric argument *n* is to be tested. Some possible values of “X” include:

* “A” means “if n is the character code for an alphabetic character”.
* “D” means “if n is the character code of a digit”.
* “E” means “if n is equal to zero (that is, false)”.
* “G” means “if n is greater than zero”.
* “N” means “if n is not equal to zero”.
* “L” means “if n is less than zero”. Since a successful command returns -1, this also works as a test that the last command succeeded.

Example TECO Code
--------------------

This little ditty reads a file and converts tabs to spaces, assuming that tab stops are every 8 columns:
```
FEB :XF27: F H M Y<:N ;'.U 0L.UAQB-QAUC<QC-9"L1;'-8%C>9-QCUD S DQD<I >>EX
```
That’s perfectly clear now, isn’t it?
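For contrast, here’s what that program is meant to compute, as a short Python function. This is not a translation of the TECO code. It just states the tab-stop arithmetic directly: a tab advances the column to the next multiple of 8. The name “expand_tabs” is hypothetical. Python’s built-in “str.expandtabs(8)” does the same job and serves as a cross-check.

```python
def expand_tabs(text, tab_size=8):
    """Replace each tab with spaces up to the next multiple of tab_size."""
    out_lines = []
    for line in text.split("\n"):
        col = 0
        pieces = []
        for ch in line:
            if ch == "\t":
                pad = tab_size - (col % tab_size)  # spaces to next tab stop
                pieces.append(" " * pad)
                col += pad
            else:
                pieces.append(ch)
                col += 1
        out_lines.append("".join(pieces))
    return "\n".join(out_lines)


print(repr(expand_tabs("a\tb")))  # 'a       b'
```

A tab at column 0 becomes 8 spaces, while a tab at column 7 becomes just 1. The padding depends on where the tab sits, which is why the TECO version has to track the column as it goes.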
Ok, since that was so easy, how about something *challenging*? This little baby takes a buffer, and executes its contents as a BrainFuck program. Yes, it’s a BrainFuck interpreter in TECO!
```
@^UB#@S/{^EQQ,/#@^UC#@S/,^EQQ}/@-1S/{/#@^UR#.U1ZJQZ\^SC.,.+-^SXQ-^SDQ1J#
@^U9/[]-+<>.,/<@:-FD/^N^EG9/;>J30000<0@I//>ZJZUL30000J0U10U20U30U60U7
@^U4/[]/@^U5#<@:S/^EG4/U7Q7; -AU3(Q3-91)"=%1|Q1"=.U6ZJ@i/{/Q2\@i/,/Q6\@i/}
/Q6J0;'-1%1'>#<@:S/[/UT.U210^T13^TQT;QT"NM5Q2J'>0UP30000J.US.UI
<(0A-43)"=QPJ0AUTDQT+1@I//QIJ@O/end/'(0A-45)"=QPJ0AUTDQT-1@I/
/QIJ@O/end/'(0A-60)"=QP-1UP@O/end/'(0A-62)"=QP+1UP@O/end/'(0A-46)"=-.+QPA
^T(-.+QPA-10)"=13^T'@O/end/'(0A-44)"=^TUT8^TQPJDQT@I//QIJ@O/end/'(0A-91)
"=-.+QPA"=QI+1UZQLJMRMB\ -1J.UI'@O
/end/'(0A-93)"=-.+QPA"NQI+1UZQLJMRMC\-1J.UI'@O/end/'
!end!QI+1UI(.-Z)"=.=@^a/END/^c^c'C>
```
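If you’d like something to compare that against, here’s a conventional Brainfuck interpreter in Python. It’s a sketch under common but not universal conventions. It uses a 30,000-cell tape and byte-sized cells that wrap around at 256. It stores 0 when input runs out, and it raises an error if the pointer moves left of cell 0. The name “run_bf” is made up.

```python
def run_bf(program, input_bytes=b"", tape_len=30000):
    """Interpret a Brainfuck program and return its output as bytes."""
    # Precompute matching bracket positions so loops can jump directly.
    jumps, stack = {}, []
    for i, ch in enumerate(program):
        if ch == "[":
            stack.append(i)
        elif ch == "]":
            if not stack:
                raise SyntaxError(f"unmatched ']' at {i}")
            j = stack.pop()
            jumps[i], jumps[j] = j, i
    if stack:
        raise SyntaxError(f"unmatched '[' at {stack[-1]}")

    tape = [0] * tape_len
    ptr = pc = inp = 0
    out = bytearray()
    while pc < len(program):
        ch = program[pc]
        if ch == "+":
            tape[ptr] = (tape[ptr] + 1) % 256
        elif ch == "-":
            tape[ptr] = (tape[ptr] - 1) % 256
        elif ch == ">":
            ptr += 1
        elif ch == "<":
            if ptr == 0:
                raise IndexError("moved left of cell 0")
            ptr -= 1
        elif ch == ".":
            out.append(tape[ptr])
        elif ch == ",":
            # On end of input, store 0 (one of several common conventions).
            tape[ptr] = input_bytes[inp] if inp < len(input_bytes) else 0
            inp += 1
        elif ch == "[" and tape[ptr] == 0:
            pc = jumps[pc]          # skip past the matching "]"
        elif ch == "]" and tape[ptr] != 0:
            pc = jumps[pc]          # go back to the matching "["
        pc += 1
    return bytes(out)


print(run_bf("++++++++[>++++++++<-]>+."))  # b'A'
```

Precomputing the bracket table is the main design choice here. It makes “[” and “]” constant-time jumps instead of a scan for the matching bracket on every iteration.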
If you’re actually insane enough to want to try this masochistic monstrosity, you can get a TECO interpreter, with documentation and example programs, from [here][teco-site].
[teco-site]: http://almy.us/teco.html