The Go I Forgot: Concurrency and Go-Routines

A couple of people pointed out that in my Wednesday post about Go, I
completely left out the concurrency stuff! That's what I get for rushing the
post - I managed to leave out one of the most interesting subjects! Go
provides very strong support for communicating processes.

I haven't done a lot of hacking with the concurrency stuff yet - so my
impressions of it are still very preliminary. But my early impressions are
very good.

A lot of people have been talking about Go as a language for distributed
computation. I don't think that's correct: Go programs run in a single, shared
address space. You can have multiple concurrent threads; if you have multiple
CPUs, those threads can be run in parallel. But it's only concurrency within a
single OS process and a single address space. To me at least, distributed
means that a program runs on multiple machines. Go is pretty good
for writing distributed programs, but that's not what the concurrency
support in Go is. What it offers is really solid support for
coroutine-style threading, with explicit communication between the threads.

The flavor of the communication system is very π-calculus-like. You
can, at any time, start a new thread by invoking a function: just prefix the
invocation with the word "go". So "f(x)" is an ordinary function call;
"go f(x)" starts a goroutine running f(x), which executes
concurrently with the code that called it.

Once you've created a go-routine, you can only talk to it through
channels - you don't have any handle or identifier for the
executing goroutine, unless you create one by using a channel.
A channel is very much like the π-calculus concept of a
channel name: it's a first-class value which can be passed around. Channels
support two operations: output and input. By default, a channel operation blocks
until a matching operation is executed by a different goroutine. So writing to a
channel blocks until someone reads it; reading from a channel blocks until
someone writes something for you to read. The channels are strongly
typed - you can only pass a single type of value over a channel.

So, for example, to write a program where you launch a thread to
do a computation, and then read the result when it's done, you could
do:

func ComputeAValue(c chan float64) {
   // Do the computation.
   x := 2.3423 / 534.23423;
   c <- x;
}

func UseAGoroutine() {
  channel := make(chan float64);
  go ComputeAValue(channel);
  // do something else for a while
  value := <- channel;
  fmt.Printf("Result was: %v", value);
}

It's very simple, and very convenient. And since Go has anonymous
functions, you could also write it as:

func UseAGoroutine() {
  channel := make(chan float64);
  go func(c chan float64) {
    // Do the computation.
    x := 2.3423 / 534.23423;
    c <- x;
  }(channel);
  // do something else for a while
  value := <- channel;
  fmt.Printf("Result was: %v", value); 
}

Even better, Go has lexical scope - so you don't actually
need to pass the channel:

func UseAGoroutine() {
  channel := make(chan float64);
  go func() {
    // Do the computation.
    x := 2.3423 / 534.23423;
    channel <- x;
  }();
  // do something else for a while
  value := <- channel;
  fmt.Printf("Result was: %v", value); 
}

Goroutines all run in the same address space, so you can also do implicit
communication via shared variables. The standard libraries include things like
mutexes so that you can implement other forms of communication and sharing
between goroutines, but communication via channels is intended to be the main
way that goroutines interact.
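
To make that concrete, here's a minimal sketch (mine, not from any real
program) of a handful of goroutines incrementing one shared counter. A
sync.Mutex guards the variable, and a channel is used just to wait for all
of the goroutines to finish:

package main

import (
  "fmt"
  "sync"
)

func main() {
  var mu sync.Mutex
  counter := 0
  done := make(chan bool)

  // Ten goroutines all increment the same shared variable.
  for i := 0; i < 10; i++ {
    go func() {
      mu.Lock()
      counter++
      mu.Unlock()
      done <- true // report completion over a channel
    }()
  }

  // Wait for all ten goroutines to report back.
  for i := 0; i < 10; i++ {
    <-done
  }
  fmt.Println("counter =", counter) // always prints 10
}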

In the implementation, the goroutines are very cheap and lightweight,
and they're multiplexed onto OS threads. So you don't need to worry
(much) about creating lots of goroutines - they'll get mapped onto
a manageable set of OS threads. This allows you to do some really neat
things where you create tons of goroutines. An example that I really like
is in the Go tutorial: it's a go-routine based version of the sieve
algorithm for computing primes.

In the sieve, you start with the set of all integers greater
than or equal to 2. 2 is the first prime - so you go through
the rest of the list, and get rid of all multiples of 2. Now, the
first number remaining in the list is the next prime: 3. So you
go through and strike out all multiples of 3. Then the first element
of the list is the next prime, 5. And so on.

This translates pretty naturally into a collection of concurrent filters.
Each filter removes multiples of one prime. You just pump numbers
through the filters in sequence. Anything that passes the filter for 2 gets
sent to the filter for 3. Anything that passes the filter for 3
gets sent to the filter for 5. Anything that gets past the last filter
is the next prime - so you add another filter to the chain, and return
the new prime.

To write that in Go, you start with a go-routine that generates
the sequence of integers:

func generate_integers() chan int {
    ch := make(chan int);
    go func(){
        for i := 2; ; i++ {
            ch <- i;
        }
    }();
    return ch;
}

That's very simple: just create a channel, and then start
a goroutine that loops through the integers greater than or
equal to 2, writing them to the channel. It returns the channel
that you can read the numbers from.

Next, we need to write a filter that removes all multiples of some number:

func filter_multiples(in chan int, prime int) chan int {
   out := make(chan int);
   go func() {
      for {
         if i := <- in; i % prime != 0 {
            out <- i;
         }
      }
   }();
   return out;
}

That creates a channel, which will be used to emit the values that pass
the filter. Then it starts a goroutine which actually runs the filter: it
reads values from the input channel, and then tests them to see if they're a
multiple of the filter prime. If they're not, it writes them to the output
channel. (You can see another bit of minimalism there; all loops in Go are
written using for.)

Now for the clever part: stringing the go-routines together.

func sieve() chan int {
   out := make(chan int);
   go func() {
      ch := generate_integers();
      for {
         prime := <- ch;
         out <- prime;
         ch = filter_multiples(ch, prime);
      }
   }();
   return out;
}

So - we create an initial output channel - that's the channel from
which we'll be able to read the primes. Then we start running the sieve
in a goroutine. The goroutine starts the integer sequence
generator. Then it runs a loop. It grabs the next number in sequence,
which is the next prime. It writes that to the output channel. Then it
creates a new filter that removes multiples of that prime, and chains the
output of the previous filter to be the input of the new one. So
you wind up building a chain of filters. Each time you get a new prime,
it creates a new filter with its own goroutine, and adds it to the chain.

Then you can easily print out the primes:

func main() {
  primes := sieve();
  for {
    fmt.Println(<-primes);
  }
}

That's obviously a toy example - but it's a very cool toy
example.
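
If you want to try running it yourself, here's everything above gathered
into a single file. The only change I've made is to bound the printing
loop to the first ten primes, instead of looping forever:

package main

import "fmt"

func generate_integers() chan int {
  ch := make(chan int);
  go func(){
    for i := 2; ; i++ {
      ch <- i;
    }
  }();
  return ch;
}

func filter_multiples(in chan int, prime int) chan int {
  out := make(chan int);
  go func() {
    for {
      if i := <- in; i % prime != 0 {
        out <- i;
      }
    }
  }();
  return out;
}

func sieve() chan int {
  out := make(chan int);
  go func() {
    ch := generate_integers();
    for {
      prime := <- ch;
      out <- prime;
      ch = filter_multiples(ch, prime);
    }
  }();
  return out;
}

func main() {
  primes := sieve();
  for i := 0; i < 10; i++ { // bounded, instead of the infinite loop above
    fmt.Println(<-primes);
  }
}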

More real examples of where this could be useful? Well, in my
not-very-abundant free time, I've been working on a text editor. I'm building
it in a very decoupled way - there's a backend which has an interface that's
almost like a little programming language based on regular expressions, and a
front-end that basically translates keystrokes and UI actions into commands.
In Go, it's a pair of goroutines. The UI sends commands to the backend; the
backend sends UI updates to the front-end. It's all done with a single
pair of channels. I came up with that basic design before I knew about Go;
I started the original implementation in Java using threads, queues,
and locks to build the communication between the components. In Go, it's
much simpler, and much cleaner. And I can add other things into the
process just by chaining together channels.
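
Just to give the flavor of that shape - the names and message types here
are invented for illustration, not my editor's actual code - the skeleton
of a front-end/back-end pair wired together by a pair of channels looks
something like this:

package main

import "fmt"

// Command and Update are made-up message types, just for illustration.
type Command string
type Update string

// The backend interprets commands and sends UI updates back.
func backend(commands chan Command, updates chan Update) {
  for cmd := range commands {
    // A real backend would run its little command language here.
    updates <- Update(fmt.Sprintf("applied %q", cmd))
  }
}

func main() {
  commands := make(chan Command)
  updates := make(chan Update)
  go backend(commands, updates)

  // The front-end: translate keystrokes and UI actions into commands,
  // and render whatever updates come back.
  commands <- Command("s/foo/bar/")
  fmt.Println(<-updates)
  commands <- Command("i/hello/")
  fmt.Println(<-updates)
}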

There is, of course, a lot more to goroutines and channels than what I've
written here. The type system for channels is pretty nice: you can distinguish
between generic channels (with which you can do whatever you want), read-only
channels, and write-only channels. You can use a switch-like
select block to allow you to do things like wait for an input
from any of a group of channels. You can set buffer sizes.
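
Here's a small sketch (again, just mine) that pulls three of those
features together: a send-only channel parameter, a buffered channel, and
a select block that waits on both:

package main

import (
  "fmt"
  "time"
)

// A send-only channel parameter: this function can only write to results.
func worker(results chan<- int) {
  time.Sleep(10 * time.Millisecond)
  results <- 42
}

func main() {
  results := make(chan int)      // unbuffered: a send blocks until someone receives
  status := make(chan string, 8) // buffered: sends don't block until 8 values pile up
  go worker(results)

  status <- "worker started" // doesn't block; the buffer has room

  // select waits on several channels at once, and takes whichever is ready first.
  for i := 0; i < 2; i++ {
    select {
    case r := <-results:
      fmt.Println("result:", r)
    case msg := <-status:
      fmt.Println("status:", msg)
    }
  }
}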

Why a whole post about this? Because, these days, we're all using
multi-core computers. For the most part, the way that we're making things
faster isn't really by making individual processors faster: we've been running into
some physical barriers that make it hard to keep pushing raw CPU speed.
Instead, we've been adding CPUs: my laptop has 2 processors, and my server has
8 processors. We're getting things done faster by dividing the work up into
pieces, and doing those pieces simultaneously. But most of our programming
languages really suck at describing concurrency.

Goroutines and channels provide the best support I've seen outside
of Erlang for making use of concurrency. And frankly, I think Go is
a lot less ugly than Erlang. (Sorry Erlang fans, but I really
don't like Erlang.) Compared to Java, which I think is the main competitor
to Go in this area, Go's goroutines and channels are so much easier
to work with than Java threads and locks that there's absolutely
no comparison at all. Go pretty much destroys the competition
in this area.

Of course, some of my general complaints about Go carry over. Channels are
basically parametric types, but I can't write parametric types of my own. In
other words, I can't write a piece of code that does something like create
generic software-transactional memory, and then let people create a cell of
STM that specifically stores integers. I can create an STM cell which stores
values - but I can't do much to say what those values are.
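
To show what I mean with a made-up example: a channel keeps its element
type, but a container type that I define myself has to fall back on the
empty interface and runtime type assertions:

package main

import "fmt"

// Cell is a made-up container type, just to illustrate the complaint:
// it has no way to say what type of value it holds.
type Cell struct {
  value interface{}
}

func (c *Cell) Store(v interface{}) { c.value = v }
func (c *Cell) Load() interface{}   { return c.value }

func main() {
  // The built-in parametric types keep their element type:
  ints := make(chan int, 1)
  ints <- 3
  // ints <- "three"  // would be a compile-time error

  // My own container can't:
  var c Cell
  c.Store(3)
  c.Store("three") // compiles fine; the mistake only shows up at runtime
  n, ok := c.Load().(int)
  fmt.Println(n, ok) // prints: 0 false
}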

On the other hand, channels do make it easy to do good exception handling
of concurrent code. You add a parameter to the goroutine function which is a
channel that will be used to signal errors. If anything goes wrong in the
goroutine, you have it send an error report to that channel. That's actually
about as good as any thread-based exception handling system I've ever seen.
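
A minimal sketch of that pattern (the names here are just for
illustration), combined with the select block mentioned above so that the
caller can wait on the result and the error at the same time:

package main

import (
  "errors"
  "fmt"
)

// The goroutine gets a result channel and an error channel,
// and reports on whichever one is appropriate.
func divide(a, b float64, results chan float64, errs chan error) {
  if b == 0 {
    errs <- errors.New("division by zero")
    return
  }
  results <- a / b
}

func main() {
  results := make(chan float64)
  errs := make(chan error)
  go divide(2.3423, 0, results, errs)

  select {
  case v := <-results:
    fmt.Println("result:", v)
  case err := <-errs:
    fmt.Println("error:", err)
  }
}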


Any channel operation blocks until a matching operation is executed by a different goroutine. So writing to a channel blocks until someone reads it; reading from a channel blocks until someone writes something for you to read.

I'm a little confused about this. Suppose I've got a UI thread and a job thread, and I want the UI to show a spinning hourglass until the job thread finishes computing and returns its result. How do I check to see if the job thread has finished without blocking my UI thread?

I suppose you could create another goroutine that blocks on the job finishing and updates the mouse cursor.

If you're working with some event-driven GUI in which certain GUI operations can only be performed from the main thread/goroutine, then I suppose your GUI library would use an event channel (instead of an event queue). In that case, it is only a matter of writing an appropriate event into the event channel once the job has finished. (I am assuming here that a channel can be written to from different goroutines.)

By Nicolai Hähnle (not verified) on 13 Nov 2009 #permalink

Not that I've run it, but for the example after talking about lexical scope, I think the line "c <- x;" should be "channel <- x;", right?

(Ugh, preview mangles the &xx; html codes in the comment box!)

I think you need to change the name of the channel in the lexical scope version of UseAGoRoutine to match the name inside of func.

Nitpicking aside, this is a great introduction to a very interesting language. Thanks.

Surely in your "Even better, Go has lexical scope" code block, you mean "channel <- x" and not "c <- x".

Great Go write-ups. I'm very interested and learning a lot!

On the other hand, channels do make it easy to do good exception handling of concurrent code. You add a parameter to the goroutine function which is a channel that will be used to signal errors. If anything goes wrong in the goroutine, you have it send an error report to that channel. That's actually about as good as any thread-based exception handling system I've ever seen.

Does Go have some way to wait on multiple channels? It seems like if you have one channel for the result and another for the error, and you can only wait on one channel, then if the other channel gets written to, the thread that started the goroutine will never wake up.

You might like Scala's actor system, it is very clean as well. Not only that but it runs in the JVM, so you can still use your favourite Java libraries.

You write "So writing to a channel blocks until someone reads it; reading from a channel blocks until someone writes something for you to read."

Somehow that seems INCREDIBLY limiting. What am I missing here? From that primitive, how does one build a thread which works on the first available from several different queues? Or like Brett pointed out, how do you wait on EITHER the work OR the exception channel? Or how, from only a non-blocking primitive, do you build something which performs a read with a timeout?

The only way I can imagine is for each of these cases to have a separate thread whose sole purpose is to multiplex the incoming requests and pass them on to the receiving thread. (And I'm not sure that would allow you to build the timeout or the select.) Am I overlooking or misunderstanding something?

@How do I check to see if the job thread has finished without blocking my UI thread?

There's a select statement, sort of like a switch in C.

By Mark James (not verified) on 13 Nov 2009 #permalink

@10 and others: "You can use a switch-like select block to allow you to do things like wait for an input from any of a group of channels."

In the primes example, it seems like you can make a channel behave like a lazily evaluated list in Haskell. How far can this parallel be pushed?

By Slartibartfast (not verified) on 13 Nov 2009 #permalink

Nice stuff, amusingly enough your parallel sieve example has recently been treated in a blog article by Neil Brown with the Haskell CHP (Communicating Haskell Process) library.

Still the lack of user generics (or parametric polymorphism) is an ugly wart...

How do goroutines handle functions being closures? It would seem to have potential for causing a lot of problems with multiple goroutines accessing the same variables (from parent scopes).

I'm a bit surprised at all the buzz this is getting *now*. It looks to me like go's concurrency model was lifted lock, stock, and barrel from newsqueak, also developed by Pike--back in 1989. (See Pike's google tech-talk on it from 2007.)

Any channel operation blocks until a matching operation is executed by a different goroutine. So writing to a channel blocks until someone reads it; reading from a channel blocks until someone writes something for you to read.

I think this is only true if you use an unbuffered channel, but you can set the buffer for a channel with e.g.
out := make(chan int, 20);
This lets you write asynchronously to a channel (if I understood it correctly, haven't experimented with it yet). See here for more details.

By markmuetz (not verified) on 13 Nov 2009 #permalink

I've made a Java version of your primes example that doesn't need locks: http://pastie.org/698038

What do you think about it?

By Rörd Hinrichsen (not verified) on 13 Nov 2009 #permalink

Rörd: 41 lines of clean and straight-forward Go code vs. 85 lines of names, boilerplate, try-catch blocks, @annotations. I think the comparison is ... illustrative.

Ben: Yes, I've noticed that too. I'm not a big fan of Java either, but I did have a faint memory that the Java library has something similar to channels and I wanted to check this memory.

PS, here's an even shorter implementation of that algorithm (though without threads):

sieve (x:xs) = x : sieve [y | y <- xs , y `mod` x > 0]

primes = sieve [2..]

By Rörd Hinrichsen (not verified) on 13 Nov 2009 #permalink

I should've looked closer at the preview, I hope it'll work now.

sieve (x:xs) = x : sieve [y | y <- xs , y `mod` x > 0]

primes = sieve [2..]

By Rörd Hinrichsen (not verified) on 13 Nov 2009 #permalink

On one hand, Go provides us with things like

stepResults := vector.New(0);

on the other hand, we have to write

ch := make(chan int);

How is this New/make dichotomy minimal? Reminds me of Python's early idiosyncrasies and makes me glad to be a Ruby enthusiast. And what's with all those colons and semicolons? (Tree-less one-pass parser, I guess…) (-;

Your example goroutines all have this form:
func some_func() chan foo {
    ch := make(chan foo);
    go func() {
        // yadda yadda yadda
        // put something in ch
    }();
    return ch;
}

If that is idiomatic Go, it would be nice to rewrite it like this:
func some_func() chan foo {
    return go {
        // yadda yadda yadda
        // put something in _
    }
}

By Robert Israel (not verified) on 13 Nov 2009 #permalink

Some of the strong contenders in the concurrent programming domain are Erlang, Haskell and Clojure, and they all give a lot of credit of the success of their concurrent capabilities to immutable data structures. How does Go's imperative nature fit in the whole concurrency scheme?

We've already been doing this exact same stuff in C for decades, albeit at a slightly lower level, with pipe() and fork(). :-)

A lot of this reminds me of Lua, which has less creation/instantiation cruft but no strong typing. I'm especially reminded of it given the coroutine features they both share, though Go adds syntax for it while Lua handles them with library calls. When it comes to being a low-level high-level language, they seem analogous, with Go more suited to systems programming and Lua more suited to use as a glue language. But overall, the more I look at Go, the more I think I'm seeing Lua, repackaged.

http://www.lua.org/

Your text editor reminds me of sam.

By Chris Neumueller (not verified) on 14 Nov 2009 #permalink

My mental parser indicates that your 3rd code snippet should use "channel" instead of "c" inside anonymous function.

Speaking of channels, they seem like a mechanism, which would make distributed implementation of Go relatively straightforward (similar to MPI?).

By Paul Jurczak (not verified) on 14 Nov 2009 #permalink

@29:

And well it should; what I'm trying to do is write something that's sort-of in-between Sam and Acme. I want the command-execution behaviors of Acme, with the basic structure of Sam. I don't want the window-management parts of Acme - I think that window management is best left to a window manager. With xmonad or awesome or wmii or ion, I can get better tiled window behavior than I'd be able to implement myself in any reasonable amount of time.

How is this New/make dichotomy minimal? Reminds me of Python's early idiosyncrasies and makes me glad to be a Ruby enthusiast.

well, Python did get better. but i'm sure you also remember how long it took.

what upsets me is the same perception of hypocrisy on the designers' part as Mark mentioned upset him; that notion of "we'll use this when building the language, but you don't get to touch it when using the language" is tremendously grating. and yes, the curly braces and semicolons do make it ugly. no sane programmer will ever fail to use whitespace to render their code readable; no sensible reason to fail to make use of that in the parser too.

finally, designing a language for the specific goal of great speed of compilation might be worthwhile, and such a language might find itself a niche. but i'm pretty sure that if given a choice, i'd rather be writing my own programs in a domain where speed of execution (saving the users' time) matters more than making the development process convenient for the programmer. if that means a much slower compiler to better optimize the binary code, i'm all for building that better, smarter, slower compiler.

By Nomen Nescio (not verified) on 14 Nov 2009 #permalink

#32:

Talking about how you'd rather have a slower compiler shows that you're not working in the same kind of environment as the Go team. At Google, we spend a lot of time on compilation. We've had teams of people (including me during my first year at Google) spending years of work on finding ways to reduce build times.

A large C++ system with a complex component dependency graph can easily take 20 minutes to compile. If you sit down and profile and analyze the time it's spending, you find some very interesting and surprising things. For example, compiler authors often spend a lot of time worrying about how much time their optimizers take. But in a large C++ system, the time spent optimizing and generating code is actually not a significant factor! What *is* significant (amazingly) is parsing. But, anyone familiar with algorithms should say, parsing *should* be trivial! The catch is that you're parsing the same thing, over and over and over again. Every time you compile a C++ source file, you need to parse all of its headers. Over, and over, and over again. Back in my C++ compiler days at IBM, I saw real applications that parsed the stdio header file 3000 times. Even if it takes 1/100th of a second, that's 3 seconds of compile time on a single header file!

To a great extent, that's C++ stupidity. I don't know of *any* other languages that do anything as utterly stupid as the C++ header include thing. But lots of languages do end up with properties that complicate compilation. Language designers really should be aware of what impact various language features will have on compilation times of real
programs.

Talking about how you'd rather have a slower compiler shows that you're not working in the same kind of environment as the Go team.

indeed i'm not, and that's what my one-liner about how Go might well find a niche was meant to indicate. maybe i should have stressed that more; i do realize that there may very well be applications for such a language (why else would somebody have gone to the trouble of expressly designing one?), it's just that i would much prefer not to work in such domains. merely personal preference, though.

i'll definitely be following along in your reworking of those old Haskell posts, though. i've been wanting to learn a type-inferred language, and Haskell has good things said of it. Go seems like it might be a little too low-level for my own tastes, though at least it has memory management, i guess.

By Nomen Nescio (not verified) on 14 Nov 2009 #permalink

Mark, have you checked out Microsoft's Task Parallel Library? http://msdn.microsoft.com/en-us/library/dd460717(VS.100).aspx

BlockingCollection seems to have pretty much the same semantics as a channel.

TaskFactory.StartNew seems to have the same semantics as the 'go' keyword.

Of course, there's also a lot more features (Parallel.Do, Parallel.For, futures, concurrent collections, etc) which can be good or bad depending on how you look at it. :)

In a couple of minutes, I wrote and ran a Python version of your toy example, mostly from your text description rather than the code.

def plurals():
    i = 2
    while True:
        yield i
        i += 1

def primefilter(src, prime):
    for i in src:
        if i % prime:
            yield i

src = plurals()
while True:
    i = next(src)
    print(i)
    src = primefilter(src, i)

It stopped at 7877 when it hit the default recursion limit of 1000, which could easily be increased to get out-of-memory error instead.

I think Google is making a blunder if it moves to another old-fashioned language whose code is bloated with junky boilerplate that doubles the size. It would be much better, for instance, to tweak Python, which it has had great success with, to better run on multiple cores.

By Terry Jan Reedy (not verified) on 14 Nov 2009 #permalink

> So writing to a channel blocks until someone reads it; reading from a channel blocks until someone writes something for you to read.

Um, no. There are buffered channels that don't block unless the buffer size is exceeded.

By Bob Foster (not verified) on 14 Nov 2009 #permalink

You say: "Even better, Go has lexical scope" and then provide example code; however, the example code uses the name "c" for "channel", like in the previous example where "c" was the parameter name; in the enclosing scope there is only "channel".

Quoting the article

"Goroutines all run in the same thread, so you can also do implicit communication via shared variables"

and

"goroutines are very cheap and lightweight, and they're multiplexed onto OS threads"

Mmmh... so, which is it?

if i understand threads correctly, Cedric, both can be true. threads within a single O/S process share address space, i believe, so can communicate via shared variables --- [cg]oroutines within one thread certainly could, then.

why you'd want to multiplex N goroutines onto M threads (for N > M) i don't really know, but presumably there's some reason. i'd think if you wanted to get maximal utility out of multiple processors (or cores) you'd want to use multiple processes, likely not just threads. then again, this is all far from my area of expertise.

(geez, i feel old. i still remember when talk on the linux-kernel list was all, "processes are too heavy, they take too long to spawn, we need threads like windoze has got" and the eventual result was to both introduce pthreads and speed up fork() until it competed with MS' thread spawning. now even threads are considered too slow? kids these days, mumble grumble, when i was learning to code we sometimes ran out of ones...)

By Nomen Nescio (not verified) on 15 Nov 2009 #permalink

>The catch is that you're parsing the same thing, over and over and over again

Ah, haven't we had pre-compiled header support now for 15 years?

@Rörd: The Haskell Sieve is not a good example, it has the wrong complexity. See Melissa E. O'Neill, The Genuine Sieve of Eratosthenes, for a full and very nice discussion. On the other hand, the Go version falls into the same trap (crossing out numbers in a table vs. testing numbers in a sequence).

Nomen Nescio
> why you'd want to multiplex N goroutines onto M threads (for
> N > M) i don't really know, but presumably there's some reason

Threads are quite expensive in the amount of memory they allocate, while goroutines (or light-weight-threads, or erlang processes, or fibers, or whatever is the name in context X) typically allocate very little memory initially. Like Erlang, it appears that a Go-program can run hundreds of thousands of goroutines on a single machine, which is not possible to do with standard threads. When it is that cheap to put computation in a separate goroutine, it enables a different style of programming.

@36: your code completely misses the point, it doesn't take advantage of multiple processors.

Somehow that seems INCREDIBLY limiting. What am I missing here? From that primitive, how does one build a thread which works on the first available from several different queues?

MarkCC mentioned some kind of select statement that would let you get input from any of a number of different channels...

I also have questions about how you would manage things like a limited-size readahead buffer, but Mark alluded to a lot more syntax not shown here.

Fascinating shit. Too bad I'll probably never have an opportunity to really do anything with it... :/

I agree, but I'm not so sure about what you said at the beginning. Where are you getting your information? I'm not disagreeing, but I'm just wondering how you came to that conclusion.

Justin Davis
Author does not represent the legal position of the Darpa Challenge 2009 and expresses opinion only.

By Justin Davis (not verified) on 22 Nov 2009 #permalink

Go's Channels are Datachannels from Functional Reactive Programming. They are powerful and have some nice properties about them, but they are not particularly original.

As a side note, that implementation is not a Sieve of Eratosthenes. To understand why, one just has to notice that the filter for 5 will be applied to 7, 11, 13, 17 and 19, while in the true sieve it will be applied to 10, 15 and 20. As the prime numbers get bigger, the difference gets exacerbated.