Good Math, Bad Math

I feel like a bit of a change of pace, and trying a bit of an experiment.

Re-reading Backus’s old FP work reminds me of what I was doing the last time I read it, which
was back in grad school. At the time, I was working on some stuff involving parallel computation,
and I discovered Robin Milner’s π-calculus as a tool for describing parallel computation. You
can think of π-calculus as being a sort of parallel (pun intended) for the λ-calculus: in
sequential, single-threaded computation, λ-calculus can be a great tool for describing what
things mean. But λ-calculus has absolutely no way of describing the concept of multiple
things happening at once. π-calculus is fundamentally built on the concept of multiple threads
which can only share information by passing messages.

There’s a reason that reading Backus made me think of this, beyond just the temporal coincendence of the fact that I was working with π-calculus the last time I read Backus’s
FP paper. One of the major points that Backus made was how poorly the vonNeumann model was
at describing many computations; that has become far more true in recent years. Even my laptop now has multiple processors; computation just isn’t single-threaded anymore. But most programming languages are basically deeply single-threaded. Even Haskell, for all of its functional
purity, isn’t particularly good at multi-threaded execution. But I’ve always thought it would be
a great idea to build a language around π-calculus the way that ML is built around λ-calculus.

So my experiment, such as it is, is to see if I can work through how to create an actual, useful, non-pathological programming language based on the π-calculus; and not just do that,
but do it publicly, here on the blog.

The obvious starting place is with a quick overview of what π-calculus is. Today, I’llgive you a quick overview of it; and then in another post, I’ll dive into a bit of semantics – but only as much semantics as is really relevant to building a language out of it.

π-calculus is really remarkably simple. The idea is that you describe a set of
processes using the syntax of the calculus, and then there are a set of reduction rules
that describe how the computation actually executes. Syntactically, there are really only six
constructs in λ-calculus; and some of those can be viewed as “compounds”. The basic
constructs are:

  1. Parallel composition: if P and Q are both processes then “P|Q” is a process which runs “P” and “Q” in parallel.
  2. Non-deterministic choice: if P and Q are both processes, then “P+Q” is a process
    which will be non-deterministically transformed into either “P” or “Q”, but not both.
  3. Receive a message. If “P” is a process, and “c” is the name of a communication channel, and
    “x” is an identifier, then then “c(x).P” waits for a message to be received
    on channel “c”, binds the contents of that message to the identifier “x”,
    and then becomes process P. This is very much like the π-calculus version of
    a λabstraction.
  4. Send a message: If “P” is a process, “x” is a value, and “c” is the name of a communication channel, then “*c(x).P” sends the value “x” as a message on channel “c” and then proceeds with process “P”. (In standard π-calculus syntax, this would be written a c with an overbar, but that’s very awkward in HTML, so I’m using “*” as an alternative.)
  5. Replication: If “P” is a process, then “!P” is a process consisting of an arbitrary
    number of copies of P. “!P” is syntactically equivalent to “P|!P“. You can think of “!P” as being something like an HTTP service: no matter
    how many times you send an HTTP request to a server, there’s always an HTTP listener waiting
    to receive your request.
  6. Channel creation: If “P” is the name of a process, and “x” is an unbound identifier,
    then “(νx)P” creates a new communication channel bound to the identifier “x”, and
    then proceeds with process P (where the new identifier is in scope.)
  7. Process termination/null process. “0” is an identifier for a null process. It’s generally
    used to describe process termination: when a process P become 0, it halts.

Computations in π-calculus is described in terms of a fundamental reduction rule, →, which is read “reduces in one step” (think of → as the π-calculus version of β-reduction): “*x(z).P | x(y).Q→P|(Q[z/y])“, where Q[z/y] means “Q with z replacing y”.

To clarify the what this means in the context of the other constructs, we also usually say the following, although strictly speaking, they should be derivable from the syntax and the basic reduction rule:

  • If “P→Q” then “P|R→Q|R”
  • If “P→Q” then “(νx)P → (νx)Q”.
  • If “(P→Q)” then “(P+R)→Q”.

Suppose that “out” is a communication channel that prints to a standard output. Then
the following is a little “program” in π-calculus which prints hello world.

	
    (νin)( (!(in(x).*out(x).0))
            | (*in(hello).*in(world).0) )

We can walk through a quick reduction of that:

	(νin)( (!(in(x).*out(x).0))
           | (*in(hello).*in(world).0) ) 
   = (νin)( (in(x).*out(x).0|!(in(x).*out(x).0))
           | (*in(hello).*in(world).0) )    [! expansion]
   → (νin)( *out(hello).0|!(in(x).*out(x).0))
           | (*in(world).0) )    [one → step on channel "in"]
   → (νin)( !(in(x).*out(x).0))
           | (*in(world).0) )    [one → step with the "invisible" standard out process step]
	= (νin)( (in(x).*out(x).0|!(in(x).*out(x).0))
           | (*in(world).0) )    [Another "!" expansion]
   → (νin)( *out(world).0|!(in(x).*out(x).0))
           | (0) )    [→ on "in"]
   → (νin)(!(in(x).*out(x).0)) [→ with standard out process on "out".]

Comments

  1. #1 Magus
    March 21, 2007

    Typo: “Syntactically, there are really only six constructs in λ-calculus;” I’m guessing you mean π-calculus.

  2. #2 Xanthir, FCD
    March 21, 2007

    Mark, this is completely unrelated, but is the email address on your homepage (found through google) still current? If not, where might I find your email address?

  3. #3 falcon
    March 22, 2007

    Syntactically, there are really only six constructs in λ-calculus; and some of those can be viewed as “compounds” . Did you mean pi calculus there?

    So what is the difference between pi calculus and the actors model? Erlang is supposed to be based on the actor model, how would it be different if it was based on pi calculus?

    Does pi calculus also require something like monads to do IO, State, etc.?

    Does pi calculus also have its counters in some logic (curry/howard isomorphism?)

    Alice ML people have done a paper on implementing futures in lambda calculus…so perhaps that shows one kind of concurrency, although, I guess, the idea there is that the concurrency is determenistic and can be considered ‘hidden’ from a user’s perspective.

    Pierce did some work on a pi-calculus language a while ago: http://citeseer.ist.psu.edu/pierce97pict.html

    I have this blog on my rss feed but the project you are discussing now make me even more interested :)

  4. #4 Jonathan Vos Post
    March 22, 2007

    This is very cool.

    Do you see any connection with:

    Samson Abramsky’s “What are the fundamental structures of concurrency? We still don’t know!”

    as discussed on March 5, 2007
    at n-Category Cafe,
    Computer Science and Physics
    Posted by David Corfield

  5. #5 Koray
    March 22, 2007

    What do you mean Haskell isn’t particularly good at parallelism? Despite Control.Concurrent.Chan and Control.Concurrent.STM?

  6. #6 ObsessiveMathsFreak
    March 22, 2007

    The key to creating a good programming language (that people will actually use) will be to making the syntax i)intuative ii)readable iii)scalable.

    Could you show how parallelised matrix multiplication can be represented by pi-calculus?

    My mind is envisioning a language to ease parallelisation for physical simulations, say the n-body gravitational experiment. I have been corrupted by c++, so you’ll have to excuse the syntax here.

    body bodyarray[n];
    //set initial positions
    while(time < tmax){
    //stuff
    in_parallel(moveprocess(bodyarray))
    //stuff
    }

    Maybe:

    (νbody)( (!(body(x).move(x).0))
    | (*body(delimit(bodyarray)).0) )

    Or something?

  7. #7 Mark C. Chu-Carroll
    March 22, 2007

    Xanthir:

    My email address is markcc@gmail.com (I get so much spam that just putting it in plaintext like this isn’t going to matter.)

    It’s also up on the “contact” tab at the top of the web page.

  8. #8 Don
    March 22, 2007

    What’s the difference between concurrent, distributed and parallel programming? Seems like these are all synonyms, or, at least, if a language supported one, it could support them all.

  9. #9 Mark C. Chu-Carroll
    March 22, 2007

    Falcon:

    There are two main differences between the Actor model and the π-calculus.

    (1) The π-calculus is based on synchronous rendezvous communication: communication occurs when two processes execute matching send/receive pairs. Actors is totally asynchronous: a sender sends a message, and immediately goes on to its next step, without waiting for the receiver to “accept” it. The message just gets dropped into a queue (the actor’s mailbox), and the receiver will grab the message from its mailbox at some point.

    (2) In π-calculus, communication occurs on named channels. The channel name is not a process name. You can have multiple processes listening on a single channel. When you send a message, you can specify what channel you want the message sent to, but you can’t specify which process will receive it. In actors, communication occurs between *specific* actors – you always say exactly who you want to send the message to.

  10. #10 Mark C. Chu-Carroll
    March 22, 2007

    Koray:

    When I said Haskell isn’t good at parallelism, I was being a bit imprecise.

    When Haskell first came out, there was a lot of hope for it being better for automatic parallelization than imperative languages. Since you didn’t have any non-explicit data dependencies – or any of the things that have traditionally messed up automatic parallelization by compilers, lots of us had high hopes that once the compilation strategies got worked out, Haskell would be *the* killer language for parallel mathematical code.

    What we ended up discovering over time was that laziness is actually not particularly friendly towards parallelization. Many of the things that end up blocking automatic parallelization still come up in Haskell, just in a different form. But in the end, Haskell is a very hard language for a compiler to parallelize.

    There has been work in adding constructs to Haskell to support writing explicitly parallel code. I haven’t looked at them, so I don’t have an opinion either way – but I’ve heard some very positive comments about STM from people who’s opinions I respect greatly.

  11. #11 Mark C. Chu-Carroll
    March 22, 2007

    omf:

    I love your comment about simulation :-). The work that I was doing with π-calculus in grad school was part of a project to do exactly what you’re talking about!

    I was looking at an approach developed by a guy named John Case to representing space as a tetrahedral lattice, and having objects represented by processes that moved around the grid. When an object would move in space, in the simulation, the process would move around the lattice.

    I still think it’s a cool idea.

  12. #12 Mark C. Chu-Carroll
    March 22, 2007

    Don:

    Great question. I should probably write a top-level post about that!

    The short version:

    – Parallelism means taking a complex computation, and finding ways to run different parts of it simultaneously on different processors/machines in order to make it run faster.

    – Concurrency is close to parallelism, except that it generally means having two different things running at the same time – rather than having on program broken into parts that run at the same time, concurrency usually means having two different things running at the same time. It can be a subtle difference. I’ll elaborate more when I expand this into a top-level post.

    – Distribution is having a program/system that spans multiple physical machines. It doesn’t have to be parallel – but ususally a distributed system does have things happening in parallel. But the key distinguishing factor is that you have multiple machines scattered around a network, and via some kind of communication mechanism, they’re working together.

    For example, I read an article a while back in a cheesy Java developers magazine about one of the electronic trading forms and how they worked. They had a java program that ran in the browser on the clients computer. When the client placed an order, that sent a message to the web server at the brokerage. The brokerage web-server would take that message, and figure out which of a collection of back-end database servers should handle the transaction, and then send a message to the appropriate back-end to process it; then it would send a message back to the client telling it that that transaction was in process. When the server completed the transaction, it would send a message back to the web-server, with detailed information about the transaction results, which would then translate that into HTML, and send it back to the client. So this system had three groups of machines – the client desktops, the web-servers, and the back-end transaction processors. Getting anything done involves sending messages back and forth between the different machines, which are each running different software. That’s a classic example of a distributed system.

  13. #13 Anonymous
    March 22, 2007

    Hi Mark,

    Great idea for an article, I’ll look forward to the rest of the series. Regarding this quote about Haskell:

    What we ended up discovering over time was that laziness is actually not particularly friendly towards parallelization. Many of the things that end up blocking automatic parallelization still come up in Haskell, just in a different form. But in the end, Haskell is a very hard language for a compiler to parallelize.

    I’d like to learn more about these problems. Can you point me at some papers I can read?

    Thanks
    Neil

  14. #14 Neil
    March 22, 2007

    I like Peyton Jones’ definition of parallelism vs concurrency in his paper Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell:

    A parallel functional program uses multiple processors to gain performance. For example, it may be faster to evaluate e1 + e2 by evaluating e1 and e2 in parallel, and then add the results. Parallelism has no semantic impact at all: the meaning of a program is unchanged whether it is executed sequentially or in parallel. Furthermore, the results are deterministic; there is no possibility that a parallel program will give one result in one run and a different result in a different run.

    In contrast, a concurrent program has concurrency as part of its specification. The program must run concurrent threads, each of which can independently perform input/output. The program may be run on many processors, or on one — that is an implementation choice. The behaviour of the program is, necessarily and by design, non-deterministic. Hence, unlike parallelism, concurrency has a substantial semantic impact.

  15. #16 Mark C. Chu-Carroll
    March 22, 2007

    Neil:

    One of the things about Haskell that isn’t *really* about Haskell that I love is the writing skill of some of the leading figures of the community.

    Peyton-Jones is an extremely good writer; Wadler is even better. Wadler is responsible for some of my favorite pieces of technical writing. The guy is amazing! Just for a simple example that’s easy to remember: the first paper on linear logic that I read was by Phil Wadler. The first sentence was “This paper has two purposes: to show that you can’t possible create a replacement for linear logic, and then to create one.”

  16. #17 Mark C. Chu-Carroll
    March 22, 2007

    Michaelw:

    Yes, I know about PICT. I didn’t mean to suggest that I’m the first person to ever try to build a language around π-calculus – there have been a bunch. But I’ve wanted an excuse to spend some time implementing my own language for a while. And the work that I’ve done over the last couple of years before leaving my old job just reinforced my belief that the languages we’re using today really suck when you need to implement a distributed system, so I’d like to see what I can do in that area.

    I fully expect it to be a total flop :-) But it will hopefully be a fun, education flop.

  17. #18 p
    March 22, 2007

    Peyton-Jones is an extremely good writer; Wadler is even better. Wadler is responsible for some of my favorite pieces of technical writing. The guy is amazing!

    This is totally true! I also love Jeremy Gibbon’s and Richard Bird’s writing.

  18. #19 Craig
    March 22, 2007

    I’m having trouble understanding the last reduction rule. I look forward to the rest of this series.

  19. #20 Brent Yorgey
    March 22, 2007

    Hi Mark,

    This is cool — I look forward to reading more about your language as it develops. I’ve encountered pi-calculus a couple times before but I think this is the first time I feel like it makes sense to me on an intuitive level. One question, though: I don’t understand ‘If “(P→Q)” then “(P+R)→Q”.’ I don’t see why this should be true. If all we know is that P→Q, then wouldn’t (P+R)→Q only if (P+R) transforms to (chooses) P? If P+R chooses R then it wouldn’t necessarily reduce to Q. Am I missing something?

  20. #21 Mark C. Chu-Carroll
    March 22, 2007

    Brent, Craig:

    Try reading “→” as “can reduce to in one step”, and things should become clearer. Each step of execution in a π-calculus computation consists of selecting one *possible* reduction and performing it.

    Read that way, it should be clearer what the second rule means. If you’ve got a choice (P+Q), and there’s a possible reduction P→R, then (P+Q) can be reduced to R by selecting choice P.

  21. #22 ObsessiveMathsFreak
    March 22, 2007

    I think I grok this a little more now.
    For the above example, with the n-body problem, you could do the following (I think)

    (νbody)( (!(body(x).move(x).0))
    | (*bodyarray(bodies).0) )

    (νbodyarray)(
    (!(bodyarray(x).split(x,first,rest).(body(first)|bodyarray(rest)).0)

    I think it would be best to have a distinction between messages and functions in your language. Say maybe having > and < denote messages. like body for an incoming message. So you’d write.

    (νbody)( (!(>body(x).move(x).0))
    | (

    And it would be understood that body and bodyarray are “i/o”, and move is simply a procedural function, rather than a process.

    What to you intend to do to finally implement your “base cases”? For the language to be useful, I think it would be a good idea to allow the programmer to inject procedural processes at a higher level, rather than force everything to be reduced via recursion, a la prolog.

  22. #23 Mark C. Chu-Carroll
    March 22, 2007

    OMF:

    Who said anything about differentiating messaging and functions? :-)

    Serious – I haven’t really decided about that yet. I’ve seen one language (Rob Strom’s Hermes) that totally eliminated the line between a procedure call and a process invocation in the language semantics. Transforming a process invocation into a local procedure call was just a matter of optimization.

    In some ways, that sounds silly until you actually try using it. Having used Hermes, I fell in love with the idea. Because what it meant in terms of programming was that the distinction between what was local and what was remote wasn’t in the language – it was in the execution configuration. So you could decide to shift the boundary between components running on different machines without altering a single line of code!

    As for the recursion thing… Again, I’m not sure, but it seems to me that the most natural thing for a π-language is to have something lambda-calculus like – so much about them is similar. So that would tend to leave me leaning towards the lambda-like constructs, which means lots of recursion.

  23. #24 Torbjörn Larsson
    March 22, 2007

    I second the “very cool” opinion. And I take it that the complete description will be less than 2k + pages. ;-)

  24. #25 Brent
    March 22, 2007

    Oh, I get it now. I guess my problem was thinking of “→” as “necessarily reduces to” rather than “can reduce to”.

  25. #26 Joshua
    April 3, 2007

    Oh, cool. For some reason, this makes way more sense to me than lambda-calculus.

The site is currently under maintenance and will be back shortly. New comments have been disabled during this time, please check back soon.