Programming without Control: Smith

Joy of joys, [Cat's Eye Technologies](http://catseye.mine.nu:8080/projects/), the home of Chris Pressey, one of the most prolific designers of thoroughly bizarre languages is back up! And so, today, we're going to take a look at one of his masterpieces, the language *Smith*. This is one of my personal all-time favorite pathological languages; it's positively brilliantly twisted.

Smith is, mostly, a pretty typical memory-oriented machine language. So what makes it pathological? It's got *no* jumps, *no* branches, *no* control flow whatsoever. The program counter starts at zero, and every step, it increments by one *and only one*.

So, you might quite sensibly ask, how on earth could this beast do anything useful, much less be Turing complete? The answer is: self-modifying code; in particular, the program can *copy* its own instructions.

Want to do a loop? No problem: it's just a set of instructions that repeats, right? So the last instruction *copies* the loop *including* the copy instruction. But how do you terminate the loop? There's no compare instruction!
You *compute* the number of instructions to copy. So you cleverly arrange to have a register holding the length of the loop *without* the copy instruction. And then you do a computation that winds up either adding 0 or 1 to that length.

Let's take a quick look at the instructions, so that we can show examples to make that a bit clearer. The SMITH machine is a pretty typical memory-based machine with the conditional branch and comparison instructions removed. Most operations work on memory locations, which SMITH calls registers. (They behave like memory, not like normal machine registers.) The basic values in the Smith language are literals (written as signed numbers); locations (written R*N*, where *N* is the numeric address); and the values stored in locations (written R[R*N*]);

The instructions are:

1. MOV target, value. Copy the value to the target. The target is either a register or TTY. If it is a register, it can be either a direct or indirect (RN or R[RN]); if it is TTY, then the value is copied to standard out. The value can be a literal, a register, an indirect register (N, RN, R[RN]), PC (for the value of the program counter), TTY (which reads a value from standard in), or "\*" which is the current line number in the source file.
2. MOV target, stringlit. A special form of MOV; stores the characters of stringlit in sequence at (target, target+1, target+2, etc.)
3. SUB target, value. Subtract the value from the target, storing the result in the target. Target is a register, either direct or indirect; value is either a register or a literal.
4. MUL target, value: multiply the target by the value, and store the result in the target. The parameters are the same as in "SUB".
5. NOT target. Target is a register. This is the closest thing to a conditional in SMITH. If the value of target is 0, then this makes it 1; if the target is non-zero, then this makes it 0. So this basically converts a value to a boolean and reverses it.
6. COR target, start, length. Copy a sequence of "length" instructions, starting with the instruction located "start" away from the PC to an address starting at an offset of "target" from the current PC. Start, target, and length can all be either literals or locations.
7. NOP. A no-op; does nothing. This gives us one of our two ways of doing something sort-of conditional; we can copy a NOP over another copy instruction.
9. STOP. End the execution of the program.
10. BLA target, NOP, length. Similar to the "COR" instruction, except that this just copies a sequence of "length" NOP instructions to the target offset.

There's also one pseudo-instruction, "REP". REP int, instruction inserts "int" copies of "instruction" at the current line of the source file. So "REP 3, NOP" is equivalent to three lines, each of which contains a single "NOP".

Now that we've seen the instructions, let's look at a classic "Hello world" program:

; Hello, world in SMITH - version 2 (loop)
; R0 -> index into string (starts at R10)
; R2 -> -1
MOV R0, 10
MOV R2, 0
SUB R2, 1
MOV R[R0], "Hello, world!"
MOV TTY, R[R0]
SUB R0, R2
MOV R1, R0
SUB R1, 23
NOT R1
NOT R1
MUL R1, 8
COR +1, -7, R1

* MOV,MOV,SUBL This starts by putting the value "10" into R0, and -1 into R2. 10 is the address where the "hello world" string is going to start; and the "-1" is going to be used for incrementing a pointer to the current index in the string. Since there is no "ADD" instruction, we have to stick with subtracting -1.
* MOV R[R0],string. Store "Hello world" starting at location 10.
* MOV TTY, R[R0]. Output the value stored at the address contained by location 0, which is initially 10 - that's the first character of the string.
* SUB R0, R2. Add one to R0, so that it points at the next character of the string.
* MOV R1, R0. Put the value of the pointer into the string into R1.
* SUB R1, 23. This is where it gets clever. 22 is the location of the last character of the string. So if R1 is now 23, then we've already output the last character. So by doing this subtraction, we end up with "0" in R1 if we've output the last character, or some negative value if we haven't.
* NOT R1/NOT R1. Converts whatever is in R1 into 0 (if we've output the last character already) or 1 (if we haven't).
* MUL R1, 8. There are 8 instructions to copy if we want to do another iteration of the loop. So multiple R1 by 8. If there's more iterations to be performed, R1 is now 8; if not, then R1 is now 0.
* COR +1, -7, R1. Copy R1 instructions to the *next* instruction address, starting from 7 instructions before here. So if we've already output all of the characters, this instruction copies 0 instructions, and we're at the end of the program. Otherwise, it copies 8 instructions - which is the loop.

Now, is that brilliantly evil, or what?

Here's another example - really quite a remarkably clever one. It does brace matching on an input file.

; Check input file for matching brackets in SMITH!
; Prints nothing if there was an error or 'OK' if brackets match.
; R0 -> stack pointer (starts at R32)
; R1 -> work register
; R2 -> -1
; R3 -> input char
; R4 -> scratch temp copy of PC
; R5 -> scratch temp copy of *
; R6 -> scratch for PC/* arithmetic
; R7 -> 2
; R8 -> 9
; R9 -> scratch char
; R10 -> 15
; R11 -> 1

; set up stack
MOV R0, 32
MOV R2, 0
SUB R2, 1
MOV R8, 9
MOV R7, 2
MOV R10, 15
MOV R11, 1
; push 'S'
MOV R[R0], "S"
SUB R0, R2
; LABEL MainLoop
; read char
MOV R3, TTY

; if char == '{' push
MOV R1, R3
MOV R[R8], "{"
SUB R1, R9
NOT R1
MUL R1, 2 ; LENGTH PushLeftBrace

MOV R4, PC ; R4 = PC
MOV R5, *
MOV R6, 0
SUB R6, 10048; bytes between here and PushLeftBrace
SUB R5, R6 ; R5 = PushLeftBrace
SUB R5, R4 ; R5 = PushLeftBrace - PC

BLA +2, NOP, R7

COR +1, R5, R1

NOP
NOP

; if char == '}' pop

MOV R1, R3
MOV R[R8], "}"
SUB R1, R9
NOT R1
MUL R1, 15 ; LENGTH PopLeftBrace

MOV R4, PC ; R4 = PC
MOV R5, *
MOV R6, 0
SUB R6, 10035; bytes between here and PopLeftBrace
SUB R5, R6 ; R5 = PopLeftBrace
SUB R5, R4 ; R5 = PopLeftBrace - PC

BLA +2, NOP, R10

COR +1, R5, R1

REP 15 NOP

; if char != 0 goto MainLoop

MOV R1, R3
NOT R1
NOT R1
MUL R1, 49 ; LENGTH MainLoop + 1

; LENGTH MainLoop
COR +1, -48, R1

REP 10000 NOP ; This space intentionally left blank

; x = pop

SUB R0, 1
MOV R1, R[R0]

; if x != 'S' stop

MOV R[R8], "S"
SUB R1, R9
NOT R1
NOT R1

COR +1, +6, R1

NOP

; print 'OK'

MOV R[R8], "O"
MOV TTY, R9
MOV R[R8], "K"
MOV TTY, R9

STOP

; LABEL PushLeftBrace
; push '{'
MOV R[R0], "{"
SUB R0, R2

; LENGTH PushLeftBrace == 2
; LABEL PopLeftBrace

; x = pop;

SUB R0, 1
MOV R1, R[R0]

; if x != "{" stop

MOV R[R8], "{"
SUB R1, R9
NOT R1
NOT R1

MOV R4, PC ; R4 = PC
MOV R5, *
MOV R6, 0
SUB R6, 1 ; bytes between here and Halt
SUB R5, R6 ; R5 = Halt
SUB R5, R4 ; R5 = Halt - PC
BLA +2, NOP, R11
COR +1, R5, R1
NOP
; LENGTH PopLeftBrace == 15
; LABEL Halt
STOP
; LENGTH Halt == 1

Because I'm a thoroughly insane person, I'm actually working on a brainfuck interpreter in SMITH. If I ever manage to get it working, I'll be sure to post it.

More like this

I thought that it would be fun to stick with the "stack-based" theme of last week's pathological post, but this time, to pick an utterly pointlessly twisted stack based language, but one that would be appreciated by the mascot of one of my fellow ScienceBlogs. Orac, this one's for you! :-) Our…
Todays dose of pathology is another masterpiece from the mangled mind of Chris Pressey. It's called "[Version](http://catseye.mine.nu:8080/projects/version/)", for no particularly good reason. It's a wonderfully simple language: there's only *one* actual kind of statement: the labelled assignment.…
The latest issue of Embedded Systems Design has an interesting article on combining C code with assembly code for DSP applications. In some cases, they show an ten fold speed up for an assembly plus C implementation versus straight C code. Now before anyone starts hollering, please remember that…
Today's a mighty cool example of bizzare language design, called GammaPlex In terms of language design, it's nothing particularly special: it's yet another stack language with a befunge-like graphical syntax. What's unusual about GammaPlex is that it's strongly focused on graphics. It's got built…

Pretty OT but what the heck.... 2 questions

I am a statistician; I haven't programmed hardly at all (REALLY minimal stuff). Now I am trying to learn R, which is an object oriented programming environment for statistics and graphics. But I am haing a continually hard time getting my head around it.

R is, in the words of some of its devotees, 'expert-friendly'. When I get a beginner's book, all the examples work, but when I try to improvise, things go haywire.

Any advice on how to get my brain working the right way?

Second question
I like playing around with number theory. I'd like to get some language that let's me do things in number theory. I would prefer a language that is relatively straightforward and free or low cost.

Thanks! Sorry for OTness, but this seems a place to ask

Peter:

With respect to "R" programming: I'm really clueless. Since I don't do work in statistics, I've never had any reason or opportunity to learn it. So I don't have a clue of what would help learn it. Domain specific languages like that tend to be idiosyncratic, so without knowing the language, it's hard to make good suggestions.

For languages to use for playing with number theory, I'd suggest Scheme and Haskell. Both can be had for free: for scheme, DrScheme is a fabulous implementation. For Haskell, I'd suggest either GHC or Gofer, along with Eclipse.

I almost wonder if Prolog would be good for playing with number theory, since the thing's halfway to being a theorem prover out of the box. Of course it's possibly not worth the cost of having to work in Prolog in the first place... :)

Along which lines, MarkCC, if you take requests, Haskell's "amb" operation might be a good topic to write up a blog post on if you ever feel bored enough to give it a try. I'm still trying to wrap my head around that one, and I'm not quite sure whether if it would qualify under your lambda calculus / functional programming series, or your pathological programming series...

Peter:

For number theory you should take a look at PARI/GP:

"PARI/GP is a widely used computer algebra system designed for fast computations in number theory (factorizations, algebraic number theory, elliptic curves...), but also contains a large number of other useful functions to compute with mathematical entities such as matrices, polynomials, power series, algebraic numbers etc., and a lot of transcendental functions. PARI is also available as a C library to allow for faster computations."

http://pari.math.u-bordeaux.fr/

Thanks for the posts re number theory. I will look into the programs mentioned

With regard to R, I think a lot of my trouble is due to it being an object oriented language. The tiny bit of programming I've done has been in Basic (decades ago) and SAS, and SAS is purely procedure driven, so R is a totally different way of thinking. All objects and vectors.

Also the help files in R are almost deliberately obscure. Some of the leaders of the R community seem to want to keep it as an 'experts only' type of software.

Peter, to learn some OOP you can learn some scripting language like Python, using some book that teaches it. In few months you will know enough OOP.

Peter:
For playing around with Number theory, I think Wolframs Mathematica would be ideal.
it does both symbolic computations, and handles arbitrarily large integers exactly.

though i guess it is pricey.

Brian:

I agree that Mathematica would be ideal for what Peter's talking about, but since he was looking for cheap/free, that pretty much disqualifies it. I would really love to have a copy of mathematica to play with, but there's just no way that I can justify $2000 for a piece of software.

Mark,

Since you've talked about software for number theory, how about a take on assisted proof software, like PVS and Coq. Have you ever had a look at any of them?

Although I still can't afford it, my university offers Mathematica for about $120. I'll buy it at some point. ^_^

By Xanthir, FCD (not verified) on 30 Sep 2006 #permalink

I've looked a little at Mathematica, esp. bcs. there's a book that teaches mathematical stat using Mathematica. But, as Mark said $2000 is not in my budget for something that would just be for a hobby

With suitable changes, SMITH can become a queue based programming language, where instructions are piped in thru a queue, and COR and BLA just push instruction back into the queue to be executed again.

Re: R
Look at "S" tutorials also since R is the Gnu version of S. They are a little better. R reminded me of Scheme, not the syntax at all, but the kind of mindset to use.

Thanks Markk, I did know that S was more or less the same as R (there are some differences, but the basics are the same)

I've been looking a little at PARI which seems cool, but may look at scheme as well

Peter: I've been ramping up my R usage over the past two years to the point that it's now virtually the only language I use. I've never thought of it as "object-oriented". I mean, I guess it *is*, technically, but that doesn't seem like it should be a stumbling block, since you never really have to worry about typical OO issues like inheritance, encapsulation, etc. The hard part of learning to use R is learning how to think in terms of vectors and matrices rather than for loops. (As well as the cryptic and often-unhelpful "help".) So I think you're better off not worrying about the OO aspect, which is really pretty basic.