Now on ScienceBlogs: Wuv, Twue Wuv

ScienceBlogs Book Club: Inside the Outbreaks

Good Math, Bad Math

Finding the fun in good math; Shredding bad math and squashing the crackpots who espouse it.

Search

Profile

markcc.jpg
Mark Chu-Carroll (aka MarkCC) is a PhD Computer Scientist, who works for Google as a Software Engineer. My professional interests center on programming languages and tools, and how to improve the languages and tools that are used for building complex software systems.

Donors Choose

Other Information

Add this blog to my Technorati Favorites!

Recent Posts

Recent Comments

Categories

Blogroll

Old Topic Indices

Great Online Books

« Off topic: Mental Illness | Main | More HIV/AIDs Denial: Lying with Math »

More Minimal Masochism: Pathological Programming in OISC

Category: pathological programming
Posted on: September 15, 2006 8:19 AM, by Mark C. Chu-Carroll

Expanding on last weeks theme of minimalism, this week, we've got OISC: the One Instruction Set Computer. This is a machine-code level primitive language with exactly one instruction. There are actually a few variants on this, but my favorite is the "subleq" OISC, which I'll describe beneath the fold.

OISC, aka the "One Instruction Set Computer" is based on the idea of the Minsky Machine. The Minsky machine is an extremely simply Turing complete machine proposed by Marvin Minsky. Back at the original GM/BM, I wrote an article on the Minsky machine and a simple Minsky interpreter. A Minsky machine is a register machine with two basic instructions: increment, and decrement-and-branch.

OISC does it one better. You don't need the increment instruction. Given an unbounded set of registers, you can be Turing complete with only one instruction: subtract and branch if less than or equal to zero. (aka SUBLEQ.) SUBLEQ takes 3 operands: two registers A and B, and a jump location, C. What SUBLEQ does is subtract the contents of B from the contents of A; and if the resulting value in A is less than or equal to zero, it jumps to C; otherwise, it proceeds to the next instruction. For simplicity of notation, if C is the next instruction, we'll just leave it blank. So if instruction N is "SUBLEQ 2, 4, N+1", we'll just write it as "SUBLEQ 2, 4"

To make things easy, we'll use numbers for the registers, and assume that register 0 contains the value 0 at all times. We can initialize it that way pretty easily; just start off the program with:

       SUBLEQ 0, 0

So how do we do anything real with this bugger? Let's build some instructions. We definitely want to be able to do addition, right? So let's suppose we want to add registers X and Y.

Add(X,Y) =  SUBLEQ 0, Y ;; R0 now contains (-Y)
            SUBLEQ X, 0 ;; X now contains X-(-Y) = X+Y
            SUBLEQ 0, 0 ;; Restore R0=0

Storing a specific value in a register:

Store(X,V) = SUBLEQ X,X ;; clear X so that it contains 0
             SUBLEQ 0, V ;; put -V in 0
             SUBLEQ X, 0 ;; put -(-V) in X
             SUBLEQ 0, 0 ;; restore 0

How about a non-conditional jump to C? That's an easy one.

Jump(C) = SUBLEQ 0, 0, C

Branch to T if X=Y, or F otherwise, using

 BranchIfEquals(X, Y, T, F) =
       Store(1, X) ;; Store copies of X and Y that we can alter
       Store(2, Y)
       SUBLEQ 1, Y, L1 ;; subtract Y from X. If true, then x <= y.
       Jump(F) ;; X was not <= y, so branch to false
   L1: SUBLEQ 2, X, T ;; subtract X from Y. If true, then Y <= X, and
                      ;; we already know X <= Y, so X = Y
       Jump(F) ;; Y was not <= X, so X != Y.

So now, we've got a useful conditional branch, addition, subtraction, unconditional branch, and copying values. That's pretty much the whole story - you don't need any more than that to be turing complete.

Want to see a real OISC program? There's an OISC Interpreter written in OISC, which is quite cute:

# Written by Clive Gifford, 29/30 August 2006 based on the Wikipedia
# description of a OISC using the "subleq" instruction.
# 3 Sep 2006: did some "constant folding", removing a few instructions.
# 8 Sep 2006, a few more changes resulting in smaller memory footprint.

start:
# [A1] = [PC]
       subleq A1   A1
       subleq PC   ZERO
       subleq ZERO A1
       subleq ZERO ZERO
# Code below (after modification from above) is equivalent to [A] = [[PC]]
       subleq A    A
A1:    data   0
       data   ZERO
       data   A2
A2:    subleq ZERO A
       subleq ZERO ZERO
# [A] = [A] + [LEN]
       subleq NEGL A
# [PC] = [PC] + 1
       subleq NEG1 PC
# [B1] = [PC]
       subleq B1   B1
       subleq PC   ZERO
       subleq ZERO B1
       subleq ZERO ZERO
# Code below (after modification from above) is equivalent to [B] = [[PC] + 1]
       subleq B    B
B1:    data   0
       data   ZERO
       data   B2
B2:    subleq ZERO B
       subleq ZERO ZERO
# [B] = [B] + [LEN]
       subleq NEGL B
# We have to copy [C] now, just in case the emulated subtraction modifies [[PC] + 2].
# [PC] = [PC] + 1
       subleq NEG1  PC
# [C1] = [PC]
       subleq C1    C1
       subleq PC    ZERO
       subleq ZERO  C1
       subleq ZERO  ZERO
# Code below (after modification from above) is equivalent to [C] = [[PC] + 2]
       subleq C    C
C1:    data   0
       data   ZERO
       data   C2
C2:    subleq ZERO C
       subleq ZERO ZERO
# [[B]] = [[B]] - [[A]];
# if [[B]] <= 0 goto LEQZ
# Earlier code referring to addresses A and B has modified the next
# couple of words to create the equivalent of "subleq [A] [B] LEQZ"
# This is where we're "emulating" the subtraction...
A:     data   0
B:     data   0
       data   LEQZ
# [PC] += 1
       subleq NEG1  PC
       subleq ZERO  ZERO START
# We come to address LEQZ when the emulated subleq is going to take
# the branch to the emulated address C.
LEQZ:
# First save the "raw" new PC
# [PC] = [C]
       subleq PC    PC
       subleq C     ZERO
       subleq ZERO  PC
       subleq ZERO  ZERO
# Check if [C] is less than zero. Halt if it is. 
# We don't care about changing [C] as we've already copied the value to [PC] above.
       subleq NEG1  C     -1
# [PC] = [PC] + [LEN]
       subleq NEGL  PC
# Go back to the start of the loop ready for the next instruction
       subleq ZERO  ZERO  START
NEGL:  data  -119   # make this == -PROG (my assembler is too primitive!)
NEG1:  data  -1
ZERO:  data   0
C:     data   0
PC:    data   PROG
# Code for the program to be interpreted must start at the PROG address...
PROG:
Share on Facebook
Share on StumbleUpon
Share on Facebook

Comments

1

Note that as I pointed out in a comment on the Brainfuck entry, the concept of OISC actually precedes the mid-1990s explosion in obfuscated languages that started with BF.

Somewhere, I found a reference to a 1987 article in Communications of the ACM that referenced the sublt variant of OISC as an already well-known construct. (The reference was something like: "We already know that it is sufficient to have only one instruction to be Turing complete: subtract and branch if negative...")

I'll see if I can dig that reference up again.

Posted by: Daniel Martin | September 15, 2006 8:59 AM

2

Two things:

1. This interpreter seems to use a 'data' instruction. I know it's just for convenience, but doesn't the 1-instruction Minsky machine prove that you don't need it?

2. The linked-to post about the interpreter talks about a matrix for the interpreter, its eigenvalues/eigenfunctions, and using them for computing operation counts. I've never seen any matrix description of any programming language. Do you know anything about that? A post on this topic seems like it would be pretty cool.

Posted by: billb | September 15, 2006 9:07 AM

3

Daniel:

I don't know the original reference, but I think that the fact that there is a one instruction Turing complete machine predates Minsky's paper where he defined the Minsky machine.

Posted by: Mark C. Chu-Carroll | September 15, 2006 10:22 AM

4

billb:

I'll answer the second question first; I have absolutely no idea what he's up to with the interpreter eigenvalues. I was planning on trying to read some of his stuff when I have time.

The data instruction is not necessary, but you actually *do* need *something* beyond the one instruction. The one catch to OISC is that because it always does a two-register subtract, there's no way to get a specific value into a register. The Minsky machine avoids this because it always adds or subtracts one; so you can store a specific number in a register using a sequence of increments.

On OISC, you either need some mechanism of setting the initial value of certain registers. The self-interpreter uses a data statement to let you specify the contents of certain registers before the program starts. My personal preference would be to have one fixed register, called "1", which always contains the value "1". Then you can construct any value you need, minsky style.

Posted by: Mark C. Chu-Carroll | September 15, 2006 10:35 AM

5

Wouldn't "data" in this case be an assembler convenience? It doesn't seem to need to be part of the instruction set if you drop the notational conveniences that allow reduced numbers of parameters (though I would guess those are assembler conveniences too).

That is, let's say that the SUBLEQ instruction is represented by zero hex. Let's also say that the very first word in a program is by definition the first instruction. Then we know that if the 4th word is zero then it's a SUBLEQ and otherwise it's data. Storing an actual zero (or whatever number we choose to indicate the instruction) might now require some convolutions to avoid storing it as data immediately after a SUBLEQ but that doesn't seem impossible.

Posted by: BMurray | September 15, 2006 1:33 PM

6

Thinking further, the data assembler instruction (distinct from the OISC!) would encapsulate whatever machinations you need to store zeroes in mod 4 == 0 positions.

Posted by: BMurray | September 15, 2006 1:35 PM

7

Mark, BMurray: You guys are both right, of course. I shouldn't have been so critical before the stimulants kicked in!

Also, Mark, I think that in this assembler, the register-always-holding-1 would be called "ONE" :), but I haven't done enough parsing of that interpreter yet to figure out how he's doing the text parsing yet (perhaps I'm being dense on this?). Anyay, if you ever get around to figuring out what he's going on about w.r.t. the eigenvalues/functions/ratios, I'd love to see an explanation of it.

Posted by: billb | September 15, 2006 4:37 PM

8

Nevermind. It's a subleq OISC simulator that runs on an subleq OISC, not a compiler/assembler. I get it now, and see how it works (in the abstract, at least).

Posted by: billb | September 15, 2006 4:42 PM

9

OK, I couldn't help myself. Here's the relevant part of the post that describes the matrix stuff.

Therefore for any instruction being interpreted, I could follow the execution path through the interpreter and determine exactly which instructions it executed and also how many times. From this I built a matrix M with m rows and m columns (each row and column corresponding to one of the m instructions in the instruction set). Each column of the matrix corresponds to the emulation of one instruction from the instruction set, and the values in that column (reading the rows from top to bottom) are exactly the number of times each instruction is executed in the current level of the interpreter in order to handle that task. In other words, the value in row i and column j is the number of times instruction i has to be executed during the main loop in order to handle one instance of instruction j in the code being interpreted.

Then, the maximum eigenvalue of this matrix is the asymptotic factor by which the run time grows as you stack self-interpreters. The smaller this number is, the more efficient the language/instruction set is at running itself. The subleq intepreter is in the high 30's.

Posted by: billb | September 15, 2006 4:54 PM

10

One thing about OISC that really interests me is that, unlike most pathological programming paradigms, you can actually implement it in hardware not only realistically, but really pretty easily.

Dave Lowry has built 'a 16 bit OISC in a FPGA. Connected are ROM, SRAM, UART and a pseudo-ISA slot, which currently holds a floppy controller card. It's running Forth. Speed is 500,000 moves/second. Haven't done any benchmarking, but it "feels" as fast as a 68HC11 running a similar Forth model (eForth).'

Posted by: Andrew McClure | September 19, 2006 12:16 AM

11

I'm the author of the OISC self-interpreter listed in the post and mentioned in the comments above. Nice to see that someone has read my ramblings!

Just to confirm, the "data" instruction is only a way to load specific values into memory (in my home grown assembler). It is not a different exectuable instruction. Note also that my variant of subleq subracts the contents of A from B and stores in B rather than the other way around. (I based my "subleq" on the wikipedia description although I'm aware that at least some other descriptions use the A = A - B variant.)

The nature of an "eigenratio" has been nicely summarised by billb in his last comment above. I dreamt up the "eignenratio" terminology after finding that (for some machine language style self-interpreters at least) a matrix could be used to describe/calculate the number of instructions executed by ever taller towers of that interpreter running itself, and that furthermore the dominant eigenvalue of that matrix is also the "asymptotic factor by which the run time grows as you stack self-interpreters" - thanks again for that concise description!

My diversion into this area started from playing with the self-interpreter released by the organisers of the 2006 ICFP Programming contest. (See http://icfpcontest.org for more.)

Posted by: Clive Gifford | September 21, 2006 7:53 PM

12

Mark C. Chu-Carroll wrote:
"I don't know the original reference, but I think that the fact that there is a one instruction Turing complete machine predates Minsky's paper where he defined the Minsky machine."

It seems to me that Turing machines themselves are OISCs. Each "state" of an m-symbol TM is an instruction with 3m components (just as each instruction of a subleq OISC has 3 components). For example, a 2-symbol TM is an OISC whose instructions in "OISC form" are 6-tuples:

write-shift-goto  w0 s0 g0 w1 s1 g1
where the two write-shift-goto triplets specify the symbol to write, the shift direction, and the next instruction to goto in the two possible cases for the currently-scanned symbol.

Posted by: Anonymous | December 18, 2007 12:28 AM

13

The preceding "Anonymous" post (#12 re TMs as OISCs) is mine. (Sorry, I hit the "post" button sooner than intended.)

Posted by: r.e.s. | December 18, 2007 12:35 AM

14

Recently I designed a simpler OISC language BitBitJump. It is similar to Subleq, but instead of arithmetic subtraction the instruction copies one bit and jumps unconditionally.

Posted by: Oleg | August 26, 2009 8:07 PM

ScienceBlogs

Search ScienceBlogs:

Go to:

Advertisement
Follow ScienceBlogs on Twitter

© 2006-2011 ScienceBlogs LLC. ScienceBlogs is a registered trademark of ScienceBlogs LLC. All rights reserved.