Now on ScienceBlogs: Father and Mother and Uncle John...: Tribalism and a Place at the Table

Enter to Win

Good Math, Bad Math

Finding the fun in good math; Shredding bad math and squashing the crackpots who espouse it.

Search

Profile

markcc.jpg
Mark Chu-Carroll (aka MarkCC) is a PhD Computer Scientist, who works for Google as a Software Engineer. My professional interests center on programming languages and tools, and how to improve the languages and tools that are used for building complex software systems.

Donors Choose

Other Information

Add this blog to my Technorati Favorites!

Recent Posts

Recent Comments

Categories

Blogroll

Old Topic Indices

Great Online Books

« Causeless Math from Dembski and Friend | Main | Irrational and Transcendental Numbers »

Friday Pathological Programming: Programming Through Grammars in Thue

Category: pathological programming
Posted on: August 4, 2006 8:30 AM, by Mark C. Chu-Carroll

Today's treat: Thue, pronounced two-ay, after a mathematician named Axel Thue.

Thue is a language based on semi-Thue grammars. semi-Thue grammars are equivalent to Chosmky-0 grammars, except that they're even more confusing: a semi-Thue grammar doesn't distinguish between terminal and non-terminal symbols. (Since Chomsky-0 is equivalent to a turing machine, we know that Thue is turing complete.) The basic idea of it is that a program is a set of rewrite rules that tell you how to convert one string into another. The language itself is incredibly simple; programs written in it are positively mind-warping.

There are two parts to a Thue program. First is a set of grammar rules, which are written in what looks like pretty standard BNF: "<lhs> ::= <rhs>". After that, there is a section which is the initial input string for the program. The two parts are separated by the "::=" symbol by itself on a line.

As I said, the rules are pretty much basic BNF: whatever is on the left-hand side can be replaced with whatever is on the right-hand side of the rule, whenever a matching substring is found. There are two exceptions, for input and output. Any rule whose right hand side is ":::" reads a line of input from the user, and replaces its left-hand-side with whatever it read from the input. And any rule that has "~" after the "::=" prints whatever follows the "~" to standard out.

So, a hello world program:

x ::=~ Hello world
::=
x

Pretty simple: the input string is "x"; there's a replacement for "x" which replaces "x" with nothing, and prints out "Hello world".

How about something much more interesting. Adding one in binary. This assumes that the binary number starts off surrounded by underscore characters to mark the beginning and end of a line.

1_::=1++
0_::=1

01++::=10
11++::=1++0

_0::=_
_1++::=10
//::=:::
::=
_//_

Let's tease that apart a bit.

  1. Start at the right hand side:
    1. "1_::=1++" - If you see a "1" on the right-hand end of the number, replace it with "1++", removing the "_" on the end of the line. The rest of the computation is going to use the "++" to mark the point in string where the increment is going to alter the string next.
    2. "0_::=1" - if you see a "0" on the right-hand-end, then replace it with 1. This doesn't insert a "++", so none of the other rules are going to do anything: it's going to stop. That's what you want: if the string ended in a zero, then adding one to it just changes that 0 to a 1.
  2. Now, we get to something interesting: doing the addition:
    1. If the string by the "++" is "01", then adding one will change it to "10", and that's the end of the addition. So we just change it to 10, and get rid of the "++. "
    2. If the string by the "++" is "11", then we change the last digit to "0", and move the "++" over to the left - that's basically like adding 1 to 9 in addition: write down a zero, carry the one to the left.
    3. Finally, some termination cleanup. If we get to the left edge, and there's a zero, after the "_", we can get rid of it. If we get to the left edge, and there's a "1++", then we replace it with 10 - basically adding a new digit to the far left; and get rid of the "++", because we're done.
  3. Finally, inputting the number "//::=:::" says "If you see "//", then read some input from the user, and replace the "//" with whatever you read.
  4. Now we have the marker for the end of the rules section, and the string to start the program is "_//_". So that triggers the input rule, which inputs a binary number, and puts it between the "_"s.

Let's trace this on a couple of smallish numbers.

  • Input: "111"
    1. "_111_" → "_111++"
    2. "_111++_" → "_11++0"
    3. "_11++0" → "_1++00"
    4. "_1++00" → "1000"
  • Input: "1011"
    1. "_1011_" → "_1011++"
    2. "_1011++" → "_101++0"
    3. "_101++0" → "_1100".

Not so hard, eh?

Decrement is the same sort of thing; here's decrement with an argument hardcoded instead of reading from input:

0_::=0--
1_::=0
10--::=01
00--::=0--1
_1--::=@
_0--::=1
_0::=
::=
_1000000000000_

Now, something more complicated. From here, a very nicely documented program that computes fibonacci numbers. It's an interesting mess - it converts things into a unary form, does the computation in unary, then converts back. This program uses a fairly neat trick to get comments. Thue doesn't have any comment syntax. But they use "#" on the left hand side as a comment marker, and then make sure that it never appears anywhere else - so none of the comment rules can ever be invoked.


#::= fibnth.t - Print the nth fibonacci number, n input in decimal
#::= (C) 2003 Laurent Vogel
#::= GPL version 2 or later (http://www.gnu.org/copyleft/gpl.html)
#::= Thue info at: http://www.catseye.mb.ca/esoteric/thue/
#::= This is modified thue where only foo::=~ outputs a newline.
#::= 'n' converts from decimal to unary-coded decimal (UCD).
n9::=*********,n
n8::=********,n
n7::=*******,n
n6::=******,n
n5::=*****,n
n4::=****,n
n3::=***,n
n2::=**,n
n1::=*,n
n0::=,n
n-::=-

#::= '.' prints an UCD number and eats it.
.*********,::=.~9
.********,::=.~8
.*******,::=.~7
.******,::=.~6
.*****,::=.~5
.****,::=.~4
.***,::=.~3
.**,::=.~2
.*,::=.~1
.,::=.~0

~9::=~9
~8::=~8
~7::=~7
~6::=~6
~5::=~5
~4::=~4
~3::=~3
~2::=~2
~1::=~1
~0::=~0

#::= messages moving over ',', '*', '|'
*<::=<*
,<::=<,
|<::=<|

0>*::=*0>
0>,::=,0>
0>|::=|0>
1>*::=*1>
1>,::=,1>
1>|::=|1>
2>*::=*2>
2>,::=,2>
2>|::=|2>

#::= Decrement n. if n is >= 0, send message '2>';
#::= when n becomes negative, we delete the garbage using 'z' and
#::= print the left number using '.'.
*,-::=,2>
,,-::=,-*********,
|,-::=z
z*::=z
z,::=z
z|::=.

#::= move the left number to the right, reversing it (0 becomes 9, ...)
#::= reversal is performed to help detect the need for carry. As the 
#::= order of rule triggering is undetermined in Thue, a rule matching 
#::= nine consecutive * would not work.
*c<::=c<0>
,c<::=c<1>
|c<::=2>
0>d*::=d
1>d::=d*********,

#::= when the copy is done, 'd' is at the right place to begin the sum.
2>d::=f<|

#::= add left to right. 'f' moves left char by char when prompted by a '<'.
#::= it tells 'g' to its right what the next char is.
*f<::=f*0>
,f<::=f,1>

#::= when done for this sum, decrement nth.
|f<::=-|

#::= upon receiving '0>' 'g' drops an 'i' that increments the current digit.
0>g::=ig
*i::=<
,i::=i,*********
|i::=<|********,*********

#::= '1>' tells 'g' to move left to the next decimal position, 
#::= adding digit 0 if needed (i.e. nine '*' in reversed UCD)
*1>g::=1>g*
,1>g::=<g,
|1>g::=|<*********g,

#::= '2>' tells 'g' that the sum is done. We then prepare for copy (moving 
#::= a copy cursor 'c' to the correct place at the beginning of the left 
#::= number) and reverse (using 'j') the right number back.
*2>g::=2>g*
,2>g::=2>g,
|2>g::=c|*********j

#::= 'j' reverses the right number back, then leaves a copy receiver 'd'
#::= and behind it a sum receiver 'g'.
*j*::=j
j,*::=,********j
j,,::=,*********j,
j,|::=<,dg|

#::= '?' used to input n
?::=:::

#::= the initial data is set up, so that after reading n ('?'), converting 
#::= it to UCD ('n') and decrementing it (-), message '2>' sent by '-' 
#::= will start when hitting 'g' the first swap + sum cycle
#::= for numbers 0 (left) and 1 (reversed, right).
::=

|n?-|,|********g,|

Ok. Now, here's where we go really pathological. Here, in all it's wonderful glory, is a Brainfuck interpreter written in Thue, written by Mr. Frederic van der Plancke. (You can get the original version, with documentation and example programs, along with a Thue interpreter written in Python from here)


#::=# BRAINFUNCT interpreter in THUE
#::=# by Frederic van der Plancke; released to the Public Domain.

o0::=0o
i0::=0i
o1::=1o
i1::=1i
o]::=]o
i]::=]i
o[::=[o
i[::=[i
oz::=zo
oZ::=Zo
iz::=zi
iZ::=Zi
0z::=z0
1z::=z1
]z::=z]
[z::=z[
_z::=z_
0Z::=Z0
1Z::=Z1
]Z::=Z]
[Z::=Z[
_Z::=Z_
}}]::=}]}
&}]::=]&}
&]::=]^
}[::=[{
&[::=[&{
&0::=0&
&1::=1&
}0::=0}
{1::=1{
?Z[::=[&
?z[::=[^
?z]::=\]
?Z]::=]^
[{{::={[{
[{\::=\[
[\::=[^
]{::={]
]\::={\]
0\::=\0
1\::=\1
0{::={0
1{::={1
^[::=?[iii
^]::=?]iii
}|::=}]|WARN]WARN
&|::=&]|WARN]WARN
WARN]WARN::=~ERROR: missing ']' in brainfunct program; inserted at end
^000::=000^ooo
^001::=001^ooi
^010::=010^oio
^011::=011^oii
^100::=100^ioo
^101::=101^ioi
ooo|::=|ooo
ooi|::=|ooi
oio|::=iio|oio
oii|::=|oii
ioo|::=|ioo
ioi|::=|ioi
iii|::=|iii
iio|!::=|
|z::=z|
|Z::=Z|
o_::=_o
i_::=_i
0!::=!0
1!::=!1
_!::=!_
/!::=!/
oooX!::=Xooo
oooY::=$Y
0$::=!1
1$::=$0
X$::=X!1
ooiX!::=Xooi
ooiY::=@1Y
1@1::=@00
0@1::=@11
1@0::=@01
0@0::=@00
X@1::=X!WARNDEC0WARN
WARNDEC0WARN::=~WARNING: attempt at decrementing zero (result is still zero)
X@00::=X@0
X@01::=X!1
X@0Y::=X!0Y
oioX!::=LLYLR
0LL::=LL0
1LL::=LL1
_LL::=!X!
|LL::=|!0X!
LR0::=0LR
LR1::=1LR
LRY::=_
oiiX!::=_oii
oiiY::=X!%
%0::=0%
%1::=1%
%_0::=Y0
%_1::=Y1
%_/::=Y0_/
iooX!::=X(in)
(in)0::=(in)
(in)1::=(in)
(in)Y::=Yi
i/0::=Zi/
i/1::=zi/
i/_::=!/
XY!::=X!0Y
0Y!::=0!Y
1Y!::=1!Y
YZ::=0Y
Yz::=1Y
ioiX!::=X(out)
(out)0::=0(out)O0
(out)1::=1(out)O1
(out)Y::=OO!Y
O0::=~0
O1::=~1
OO::=~_
iiiX!0::=ZX!0
iiiX!1::=zX!1
+::=000
-::=001
<::=010
>::=011
,::=100
.::=101
:::=|X!0Y0_/
::=
^*

Share this: Stumbleupon Reddit Email + More

Comments

1

BNF...

WTF?

Ah, Algol! ( http://cui.unige.ch/db-research/Enseignement/analyseinfo/AboutBNF.html ).

Posted by: Torbjörn Larsson | August 4, 2006 11:45 AM

2

I just ran across an old article describing a joke pathological language called Babbage, still in the design phase.

A sample:

The Babbage Language Design Group is continuously evaluating new features that will keep its users from reaching any level of effectiveness. For instance, Babbage's designers are now considering the ALMOST EQUALS sign, used for comparing two floating-point numbers. This new feature "takes the worry out of being close."

Posted by: Canuckistani | August 4, 2006 4:08 PM

3

1 -> "_111_" → "_111++"
2 -> "_111++_" → "_11++0"

Typo: lose trailing '_' in lhs of 2

Posted by: truth machine | August 4, 2006 6:11 PM

4

Typo: Chosmky

Posted by: Bruno | August 29, 2006 9:44 PM

5

The binary incrementing script occasionally leaves '_' at the very beginning of a result. I think, the following would be a more adequate version:

1b::=1++
0b::=1

01++::=10
11++::=1++0

a0::=a
a1++::=a10

0_::=_0
1_::=_1
a_0::=a_
a_1::=1

//::=:::
::=
a//b_

Posted by: Den | December 8, 2006 8:23 AM

6

Once again, I'm feeling stupid. Can someone say how you write out the result from for example the increment example?

Posted by: hans | June 27, 2009 9:15 AM

Post a Comment

(Email is required for authentication purposes only. On some blogs, comments are moderated for spam, so your comment may not appear immediately.)






Stats

ScienceBlogs

Search ScienceBlogs:

Go to:

Advertisement
Collective Imagination
Enter to win the daily giveaway
Advertisement
Collective Imagination

© 2006-2009 ScienceBlogs LLC. ScienceBlogs is a registered trademark of ScienceBlogs LLC. All rights reserved.