An Experiment with π-calculus and Programming Language Design

I feel like a bit of a change of pace, so I'm going to try a bit of an experiment.

Re-reading Backus's old FP work reminds me of what I was doing the last time I read it, which
was back in grad school. At the time, I was working on some stuff involving parallel computation,
and I discovered Robin Milner's π-calculus as a tool for describing parallel computation. You
can think of π-calculus as a sort of parallel (pun intended) to the λ-calculus: in
sequential, single-threaded computation, λ-calculus can be a great tool for describing what
things mean. But λ-calculus has absolutely no way of describing the concept of multiple
things happening at once. π-calculus is fundamentally built on the concept of multiple threads
which can only share information by passing messages.

There's a reason that reading Backus made me think of this, beyond just the temporal coincidence that I was working with π-calculus the last time I read Backus's
FP paper. One of the major points that Backus made was how poor the von Neumann model is
at describing many computations; that has become far more true in recent years. Even my laptop now has multiple processors; computation just isn't single-threaded anymore. But most programming languages are still deeply single-threaded. Even Haskell, for all of its functional
purity, isn't particularly good at multi-threaded execution. I've always thought it would be
a great idea to build a language around π-calculus the way that ML is built around λ-calculus.

So my experiment, such as it is, is to see if I can work through how to create an actual, useful, non-pathological programming language based on the π-calculus; and not just do that,
but do it publicly, here on the blog.

The obvious starting place is with a quick overview of what π-calculus is. Today, I'll give you a quick overview of it; and then in another post, I'll dive into a bit of semantics - but only as much semantics as is really relevant to building a language out of it.

π-calculus is really remarkably simple. The idea is that you describe a set of
processes using the syntax of the calculus, and then there's a set of reduction rules
that describe how the computation actually executes. Syntactically, there are really only seven
constructs in π-calculus; and some of those can be viewed as "compounds". (After the list,
I'll sketch roughly how these constructs might look in an existing language.) The basic constructs are:

  1. Parallel composition: if P and Q are both processes then "P|Q" is a process which runs "P" and "Q" in parallel.
  2. Non-deterministic choice: if P and Q are both processes, then "P+Q" is a process
    which will be non-deterministically transformed into either "P" or "Q", but not both.
  3. Receive a message. If "P" is a process, and "c" is the name of a communication channel, and
    "x" is an identifier, then "c(x).P" waits for a message to be received
    on channel "c", binds the contents of that message to the identifier "x",
    and then becomes process P. This is very much like the π-calculus version of
    a λ abstraction.
  4. Send a message: If "P" is a process, "x" is a value, and "c" is the name of a communication channel, then "*c(x).P" sends the value "x" as a message on channel "c" and then proceeds with process "P". (In standard π-calculus syntax, this would be written as a c with an overbar, but that's very awkward in HTML, so I'm using "*" as an alternative.)
  5. Replication: If "P" is a process, then "!P" is a process consisting of an arbitrary
    number of copies of P. "!P" is syntactically equivalent to "P|!P". You can think of "!P" as being something like an HTTP service: no matter
    how many times you send an HTTP request to a server, there's always an HTTP listener waiting
    to receive your request.
  6. Channel creation: If "P" is a process, and "x" is an unbound identifier,
    then "(νx)P" creates a new communication channel bound to the identifier "x", and
    then proceeds with process P (where the new identifier is in scope).
  7. Process termination/null process. "0" is an identifier for a null process. It's generally
    used to describe process termination: when a process P becomes 0, it halts.
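
As promised, here's a very rough transcription of these constructs into an existing language. I'm using Haskell with STM channels here purely as an illustration: all of the names below (par, recv, send, choose, bang, nu, halt) are made up for this sketch, and TChan is a buffered channel, so "send" here is asynchronous rather than the calculus's true synchronous rendezvous. It's a sketch of the shape of things, not a faithful encoding:

    import Control.Concurrent (forkIO)
    import Control.Concurrent.STM
    import Control.Monad (forever)

    -- P|Q: parallel composition - run P in its own thread, continue as Q.
    par :: IO () -> IO () -> IO ()
    par p q = forkIO p >> q

    -- c(x).P: receive a value on c, bind it, and continue as P.
    recv :: TChan a -> (a -> IO ()) -> IO ()
    recv c p = atomically (readTChan c) >>= p

    -- *c(v).P: send v on c, then continue as P.
    send :: TChan a -> a -> IO () -> IO ()
    send c v p = atomically (writeTChan c v) >> p

    -- P+Q, for the common case of a choice between two guarded receives:
    -- STM's orElse commits to whichever channel has a message available.
    choose :: TChan a -> TChan b -> (a -> IO ()) -> (b -> IO ()) -> IO ()
    choose c d p q =
      either p q =<< atomically
        (fmap Left (readTChan c) `orElse` fmap Right (readTChan d))

    -- !(c(x).P): replication - serve every message with a fresh copy of P.
    bang :: TChan a -> (a -> IO ()) -> IO ()
    bang c p = forever (atomically (readTChan c) >>= forkIO . p)

    -- (νx)P: create a fresh channel and hand it to P.
    nu :: (TChan a -> IO ()) -> IO ()
    nu p = atomically newTChan >>= p

    -- 0: the halted process.
    halt :: IO ()
    halt = return ()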

Computation in π-calculus is described in terms of a fundamental reduction rule, →, which is read "reduces in one step" (think of → as the π-calculus version of β-reduction): "*x(z).P | x(y).Q → P|(Q[z/y])", where Q[z/y] means "Q with z replacing y".

To clarify what this means in the context of the other constructs, we also usually state the following rules, although strictly speaking, they should be derivable from the syntax and the basic reduction rule:

  • If "P→Q" then "P|R→Q|R"
  • If "P→Q" then "(νx)P → (νx)Q".
  • If "(P→Q)" then "(P+R)→Q".

Suppose that "out" is a communication channel that prints to a standard output. Then
the following is a little "program" in π-calculus which prints hello world.

    (νin)( (!(in(x).*out(x).0))
            | (*in(hello).*in(world).0) )

We can walk through a quick reduction of that:

(νin)( (!(in(x).*out(x).0))
       | (*in(hello).*in(world).0) )
   = (νin)( (in(x).*out(x).0 | !(in(x).*out(x).0))
       | (*in(hello).*in(world).0) )    [! expansion]
   → (νin)( (*out(hello).0 | !(in(x).*out(x).0))
       | (*in(world).0) )    [one → step on channel "in"]
   → (νin)( (!(in(x).*out(x).0))
       | (*in(world).0) )    [one → step with the "invisible" standard-out process]
   = (νin)( (in(x).*out(x).0 | !(in(x).*out(x).0))
       | (*in(world).0) )    [another "!" expansion]
   → (νin)( (*out(world).0 | !(in(x).*out(x).0))
       | 0 )    [→ on "in"]
   → (νin)( !(in(x).*out(x).0) )    [→ with the standard-out process on "out"]
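
And to connect that back to the Haskell sketch above, here's roughly what the hello-world process looks like as a real program. Again, this is just my illustration: I'm using a plain Chan (which is asynchronous, unlike the calculus's rendezvous), and the extra "done" channel is bookkeeping so that main can wait for both lines to print - it's not part of the π-calculus term:

    import Control.Concurrent (forkIO)
    import Control.Concurrent.Chan
    import Control.Monad (forever, replicateM_)

    main :: IO ()
    main = do
      inChan <- newChan                 -- (νin): create a fresh channel
      done   <- newChan                 -- bookkeeping, not part of the term
      _ <- forkIO $ forever $ do        -- !(in(x).*out(x).0): replicated receiver
             x <- readChan inChan       --   in(x): receive a message
             putStrLn x                 --   *out(x): "send" it to standard output
             writeChan done ()
      writeChan inChan "hello"          -- *in(hello)
      writeChan inChan "world"          -- *in(world)
      replicateM_ 2 (readChan done)     -- wait until both messages are printed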


Typo: "Syntactically, there are really only six constructs in λ-calculus;" I'm guessing you mean Ï-calculus.

Mark, this is completely unrelated, but is the email address on your homepage (found through google) still current? If not, where might I find your email address?

By Xanthir, FCD (not verified) on 21 Mar 2007 #permalink

Syntactically, there are really only six constructs in λ-calculus; and some of those can be viewed as "compounds". Did you mean pi calculus there?

So what is the difference between pi calculus and the actors model? Erlang is supposed to be based on the actor model, how would it be different if it was based on pi calculus?

Does pi calculus also require something like monads to do IO, State, etc.?

Does pi calculus also have its counterpart in some logic (Curry-Howard isomorphism)?

Alice ML people have done a paper on implementing futures in lambda calculus... so perhaps that shows one kind of concurrency, although, I guess, the idea there is that the concurrency is deterministic and can be considered 'hidden' from a user's perspective.

Pierce did some work on a pi-calculus language a while ago: http://citeseer.ist.psu.edu/pierce97pict.html

I have this blog on my rss feed but the project you are discussing now makes me even more interested :)

This is very cool.

Do you see any connection with:

Samson Abramsky's "What are the fundamental structures of concurrency? We still don't know!"

as discussed on March 5, 2007
at n-Category Cafe,
Computer Science and Physics
Posted by David Corfield

What do you mean Haskell isn't particularly good at parallelism? Despite Control.Concurrent.Chan and Control.Concurrent.STM?

The key to creating a good programming language (that people will actually use) will be to make the syntax i) intuitive, ii) readable, iii) scalable.

Could you show how parallelised matrix multiplication can be represented by pi-calculus?

My mind is envisioning a language to ease parallelisation for physical simulations, say the n-body gravitational experiment. I have been corrupted by c++, so you'll have to excuse the syntax here.

body bodyarray[n];
//set initial positions
while(time < tmax){
//stuff
in_parallel(moveprocess(bodyarray));
//stuff
}

Maybe:

(νbody)( (!(body(x).move(x).0))
| (*body(delimit(bodyarray)).0) )

Or something?

By ObsessiveMathsFreak (not verified) on 22 Mar 2007 #permalink

Xanthir:

My email address is markcc@gmail.com (I get so much spam that just putting it in plaintext like this isn't going to matter.)

It's also up on the "contact" tab at the top of the web page.

What's the difference between concurrent, distributed and parallel programming? Seems like these are all synonyms, or, at least, if a language supported one, it could support them all.

Falcon:

There are two main differences between the Actor model and the π-calculus.

(1) The π-calculus is based on synchronous rendezvous communication: communication occurs when two processes execute matching send/receive pairs. The Actor model is totally asynchronous: a sender sends a message, and immediately goes on to its next step, without waiting for the receiver to "accept" it. The message just gets dropped into a queue (the actor's mailbox), and the receiver will grab the message from its mailbox at some point.

(2) In π-calculus, communication occurs on named channels. The channel name is not a process name. You can have multiple processes listening on a single channel. When you send a message, you can specify what channel you want the message sent to, but you can't specify which process will receive it. In actors, communication occurs between *specific* actors - you always say exactly who you want to send the message to.
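
If it helps to see difference (1) as code, here's a little Haskell sketch. It's purely my illustration (and it only handles a single sender and receiver): a rendezvous channel built from two MVars, where the sender doesn't return until the receiver has taken the value, contrasted with a Chan used as an actor-style mailbox:

    import Control.Concurrent (forkIO)
    import Control.Concurrent.MVar
    import Control.Concurrent.Chan

    -- π-calculus style: sendR blocks until recvR accepts the value.
    data Rendezvous a = Rendezvous (MVar a) (MVar ())

    newRendezvous :: IO (Rendezvous a)
    newRendezvous = do
      v   <- newEmptyMVar
      ack <- newEmptyMVar
      return (Rendezvous v ack)

    sendR :: Rendezvous a -> a -> IO ()
    sendR (Rendezvous v ack) x = putMVar v x >> takeMVar ack

    recvR :: Rendezvous a -> IO a
    recvR (Rendezvous v ack) = do
      x <- takeMVar v
      putMVar ack ()      -- release the waiting sender
      return x

    main :: IO ()
    main = do
      -- rendezvous: the forked sender is blocked until we receive
      r <- newRendezvous
      _ <- forkIO (sendR r "handshake")
      recvR r >>= putStrLn
      -- actor-style mailbox: the send returns immediately; the
      -- message just sits in the queue until somebody reads it
      mailbox <- newChan
      writeChan mailbox "fire and forget"
      readChan mailbox >>= putStrLn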

Koray:

When I said Haskell isn't good at parallelism, I was being a bit imprecise.

When Haskell first came out, there was a lot of hope for it being better for automatic parallelization than imperative languages. Since you didn't have any non-explicit data dependencies - or any of the things that have traditionally messed up automatic parallelization by compilers - lots of us had high hopes that once the compilation strategies got worked out, Haskell would be *the* killer language for parallel mathematical code.

What we ended up discovering over time was that laziness is actually not particularly friendly towards parallelization. Many of the things that end up blocking automatic parallelization still come up in Haskell, just in a different form. But in the end, Haskell is a very hard language for a compiler to parallelize.

There has been work in adding constructs to Haskell to support writing explicitly parallel code. I haven't looked at them, so I don't have an opinion either way - but I've heard some very positive comments about STM from people whose opinions I respect greatly.

omf:

I love your comment about simulation :-). The work that I was doing with π-calculus in grad school was part of a project to do exactly what you're talking about!

I was looking at an approach developed by a guy named John Case to representing space as a tetrahedral lattice, and having objects represented by processes that moved around the grid. When an object would move in space, in the simulation, the process would move around the lattice.

I still think it's a cool idea.

Don:

Great question. I should probably write a top-level post about that!

The short version:

- Parallelism means taking a complex computation, and finding ways to run different parts of it simultaneously on different processors/machines in order to make it run faster.

- Concurrency is close to parallelism, except that rather than one program broken into parts that run at the same time, it generally means two different things running at the same time. It can be a subtle difference. I'll elaborate more when I expand this into a top-level post.

- Distribution is having a program/system that spans multiple physical machines. It doesn't have to be parallel - but usually a distributed system does have things happening in parallel. But the key distinguishing factor is that you have multiple machines scattered around a network, and via some kind of communication mechanism, they're working together.

For example, I read an article a while back in a cheesy Java developers magazine about one of the electronic trading firms and how they worked. They had a java program that ran in the browser on the client's computer. When the client placed an order, that sent a message to the web server at the brokerage. The brokerage web-server would take that message, and figure out which of a collection of back-end database servers should handle the transaction, and then send a message to the appropriate back-end to process it; then it would send a message back to the client telling it that the transaction was in process. When the server completed the transaction, it would send a message back to the web-server, with detailed information about the transaction results, which would then translate that into HTML, and send it back to the client. So this system had three groups of machines - the client desktops, the web-servers, and the back-end transaction processors. Getting anything done involves sending messages back and forth between the different machines, which are each running different software. That's a classic example of a distributed system.

Hi Mark,

Great idea for an article, I'll look forward to the rest of the series. Regarding this quote about Haskell:

What we ended up discovering over time was that laziness is actually not particularly friendly towards parallelization. Many of the things that end up blocking automatic parallelization still come up in Haskell, just in a different form. But in the end, Haskell is a very hard language for a compiler to parallelize.

I'd like to learn more about these problems. Can you point me at some papers I can read?

Thanks
Neil

By Anonymous (not verified) on 22 Mar 2007 #permalink

I like Peyton Jones' definition of parallelism vs concurrency in his paper Tackling the awkward squad: monadic input/output, concurrency, exceptions, and foreign-language calls in Haskell:

A parallel functional program uses multiple processors to gain performance. For example, it may be faster to evaluate e1 + e2 by evaluating e1 and e2 in parallel, and then add the results. Parallelism has no semantic impact at all: the meaning of a program is unchanged whether it is executed sequentially or in parallel. Furthermore, the results are deterministic; there is no possibility that a parallel program will give one result in one run and a different result in a different run.

In contrast, a concurrent program has concurrency as part of its specification. The program must run concurrent threads, each of which can independently perform input/output. The program may be run on many processors, or on one -- that is an implementation choice. The behaviour of the program is, necessarily and by design, non-deterministic. Hence, unlike parallelism, concurrency has a substantial semantic impact.
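
For what it's worth, the e1 + e2 example from that quote is pretty much exactly what the par and pseq combinators from GHC's parallel package give you. A sketch (assuming that package is installed and a -threaded build):

    import Control.Parallel (par, pseq)

    -- Evaluate the two halves of a sum in parallel; the result is
    -- deterministic - the same on every run, however it's scheduled.
    parSum :: [Int] -> Int
    parSum xs = e1 `par` (e2 `pseq` (e1 + e2))
      where
        (ls, rs) = splitAt (length xs `div` 2) xs
        e1       = sum ls
        e2       = sum rs

    main :: IO ()
    main = print (parSum [1 .. 8])   -- always prints 36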

Neil:

One of the things I love about Haskell that isn't *really* about Haskell is the writing skill of some of the leading figures of the community.

Peyton-Jones is an extremely good writer; Wadler is even better. Wadler is responsible for some of my favorite pieces of technical writing. The guy is amazing! Just for a simple example that's easy to remember: the first paper on linear logic that I read was by Phil Wadler. The first sentence was "This paper has two purposes: to show that you can't possibly create a replacement for linear logic, and then to create one."

Michaelw:

Yes, I know about PICT. I didn't mean to suggest that I'm the first person to ever try to build a language around π-calculus - there have been a bunch. But I've wanted an excuse to spend some time implementing my own language for a while. And the work that I've done over the last couple of years before leaving my old job just reinforced my belief that the languages we're using today really suck when you need to implement a distributed system, so I'd like to see what I can do in that area.

I fully expect it to be a total flop :-) But it will hopefully be a fun, educational flop.

Peyton-Jones is an extremely good writer; Wadler is even better. Wadler is responsible for some of my favorite pieces of technical writing. The guy is amazing!

This is totally true! I also love Jeremy Gibbons's and Richard Bird's writing.

I'm having trouble understanding the last reduction rule. I look forward to the rest of this series.

Hi Mark,

This is cool -- I look forward to reading more about your language as it develops. I've encountered pi-calculus a couple times before but I think this is the first time I feel like it makes sense to me on an intuitive level. One question, though: I don't understand 'If "(P→Q)" then "(P+R)→Q".' I don't see why this should be true. If all we know is that P→Q, then wouldn't (P+R)→Q only if (P+R) transforms to (chooses) P? If P+R chooses R then it wouldn't necessarily reduce to Q. Am I missing something?

Brent, Craig:

Try reading "→" as "can reduce to in one step", and things should become clearer. Each step of execution in a π-calculus computation consists of selecting one *possible* reduction and performing it.

Read that way, it should be clearer what the second rule means. If you've got a choice (P+Q), and there's a possible reduction P→R, then (P+Q) can be reduced to R by selecting choice P.

I think I grok this a little more now.
For the above example, with the n-body problem, you could do the following (I think)

(νbody)( (!(body(x).move(x).0))
| (*bodyarray(bodies).0) )

(νbodyarray)(
(!(bodyarray(x).split(x,first,rest).(body(first)|bodyarray(rest)).0)

I think it would be best to have a distinction between messages and functions in your language. Say maybe having > and < denote messages, like ">body" for an incoming message. So you'd write:

(νbody)( (!(>body(x).move(x).0))
| (

By ObsessiveMathsFreak (not verified) on 22 Mar 2007 #permalink

OMF:

Who said anything about differentiating messaging and functions? :-)

Seriously - I haven't really decided about that yet. I've seen one language (Rob Strom's Hermes) that totally eliminated the line between a procedure call and a process invocation in the language semantics. Transforming a process invocation into a local procedure call was just a matter of optimization.

In some ways, that sounds silly until you actually try using it. Having used Hermes, I fell in love with the idea. Because what it meant in terms of programming was that the distinction between what was local and what was remote wasn't in the language - it was in the execution configuration. So you could decide to shift the boundary between components running on different machines without altering a single line of code!

As for the recursion thing... Again, I'm not sure, but it seems to me that the most natural thing for a π-language is to have something lambda-calculus like - so much about them is similar. So that would tend to leave me leaning towards the lambda-like constructs, which means lots of recursion.

I second the "very cool" opinion. And I take it that the complete description will be less than 2k + pages. ;-)

By Torbjörn Larsson (not verified) on 22 Mar 2007 #permalink

Oh, I get it now. I guess my problem was thinking of "→" as "necessarily reduces to" rather than "can reduce to".