Good Math, Bad Math

As promised, I’m finally going to get to the theory behind monads. As a quick
review, the basic idea of the monad in Haskell is a hidden transition function – a
monad is, basically, a state transition function.

The theory of monads comes from category theory. I’m going to assume you know
a little bit about category theory – if you have trouble with it, go take a look
at my introductory posts here.

Fundamentally, in category theory a monad is a category with a particular kind of
structure. It’s a category with one object. That category has a collection of
arrows which (obviously) are from the single object to itself. That one-object
category has a functor from the category to itself. (As a reminder, a functor
is an arrow between categories in the category of (small) categories.)

The first trick to the monad, in terms of theory, is that it’s
fundamentally about the functor: since the functor maps from a category to
the same category, so you can almost ignore the category; it’s implicit in
the definition of the functor. So we can almost treat the monad as if it were just
the functor – that is, a kind of transition function.

The other big trick is closely related to that. For the programming language
application of monads, we can think of the single object in the category as
the set of all possible states. So we have a category object, which is essentially
the collection of all possible states; and there are arrows between the states
representing possible state transitions. So the monad’s functor is really just a mapping from arrows to different arrows – which basically represents the way that
changing the state causes a change in the possible transitions to other states.

So what a monad gives us, in terms of category theory, in a conceptual framework that captures the concept of a state transition system, in terms of transition functions that invisibly carry a state. When that’s translated into
programming languages, that becomes a value that implicitly takes an
input state, possibly updates it, and returns an output state. Sound familiar?

Let’s take a moment and get formal. As usual for category theory, first there are
some preliminary definitions.

  1. Given a category, C, 1C is the identity functor from C to C.
  2. Given a category C with a functor T : CC,
    T2 = T º T.
  3. Given a functor T, 1T : TT is the natural transformation from T to T.

Now, with that out of the way, we can give the complete formal definition
of a monad. Given a category C, a monad on C is
a triple: (T:CC,
η:1CT,
μ:T2T), where T is
a functor, and η and μ are natural transformations. The members of the
triple must make the following two diagrams commute.


i-3708df09409a7a561b25a0597ce652f3-monad-prop1.jpg

Commutativity of composition with μ



i-7446dbf8a04d1331f612c90223413813-monad-prop2.jpg

Commutativity of composition with η

What these two diagrams mean is that successive applications of the
state-transition functor over C behave associatively – that any sequence of
composing monadic functors will result in a functor with full monadic structure; and
that the monadic structure will always preserve. Together, these mean that any sequence
of operations (that is, applications of the monad functor) are themselves monad
functors – so the building a sequence of monadic state transformers is guaranteed to
behave as a proper monadic state transition – so what happens inside of the
monadic functor is fine – to the rest of the universe, the difference between a
sequence and a single simple operation is indistinguishable: the state will be
consistently passed from application to application with the correct chaining behavior,
and to the outside world, the entire monadic chain looks like like a single atomic
monadic operation.

Now, what does this mean in terms of programming? Each element of a monadic
sequence in Haskell is an instantiation of the monadic functor – that is, it’s an
arrow between states – a function, not a simple value – which is the basic
trick to monads. They look like a sequence of statements; in fact, each
statement in a monad is actually a function from state to state. And it looks like
we’re writing sequence code – when what we’re actually doing is writing function
compositions
– so that when we’re done writing a monadic sequence, what we’ve actually done is written a function definition in terms of a sequence of function compositions.

Understanding that, we can now clearly understand why we need the return function to use a non-monad expression inside of a monadic sequence – because each step in the sequence needs to be an instance of the monadic functor; an expression that
isn’t an instance of the monadic functor couldn’t be composed with the functions in the sequence. The return function is really nothing but a function that
combines a non-monadic expression with the id functor.

In light of this, let’s go back and look at the definition of Monad in
the Haskell standard prelude.

class  Functor f  where
    fmap              :: (a -> b) -> f a -> f b

class  Monad m  where
    (>>=)  :: m a -> (a -> m b) -> m b
    (>>)   :: m a -> m b -> m b
    return :: a -> m a
    fail   :: String -> m a
	        -- Minimal complete definition:
       --      (>>=), return
    m >> k  =  m >>= \_ -> k
    fail s  = error s	

The declaration of monad is connected with the definition of functor – if you look,
you can see the connection. The fundamental operation of Monad is
>>=” – the chaining operation, which is type m a -> (a -> m b) -> m b is deeply connected with the fmap operation from Functor‘s fmap operation – the a in m a is generally going to be a type which can be a Functor.

So the value type wrapped in the monad is a functor – in fact, the functor
from the category definition! And the “>>=” operation is just the functor composition operation from the monad definition.

A proper implementation of a monad needs to follow some fundamental rules – the
rules are, basically, just Haskell translations of the structure-preserving rules about
functors and natural transformations in the category-theoretic monad. There are two
groups of laws – laws about the Functor class, which should hold for the
transition function wrapped in the monad class; and laws about the monadic operations
in the Monad class. One important thing to realize about the functor and
monad laws is that they are not enforced – in fact, cannot be
enforced! – but monad-based code using monad implementations that do not follow them
may not work correctly. (A compile-time method for correctly verifying the enforcement
of these rules can be shown to be equivalent to the halting problem.)

There are two simple laws for Functor, and it’s pretty obvious why
they’re fundamentally just strucure-preservation requirements. The functor class only
has one operation, called fmap, and the two functor laws are about how it
must behave.

  1. fmap id = id
    (Mapping ID over any structured sequence
    results in an unmodified sequence)
  2. fmap (f . g) = (fmap f) . (fmap g)
    (“.” is the function composition operation; this just says that fmap preserves the structure to ensure that that mapping is associative with composition.)

The monad laws are a bit harder, but not much. The mainly govern how monadic
operations interact with non-monadic operations in terms of the “return” and “>>=” operations of the Monad class.

  1. return x >>= f = f x
    (injecting a value into the monad
    is basically the same as passing
    it as a parameter down the chain – return is really just the identity functor passing its result on to the next step.)
  2. f >>= return = f
    (If you don’t specify a value for a return, it’s the same as just returning the result of the previous step in the sequence – again, return is just identity, so passing something into return shouldn’t affect it.)
  3. seq >>= return . f = fmap f seq
    (composing return with a function is equivalent to invoking that function on the result of the monad sequence to that point, and wrapping the result in the monad – in other words, it’s just composition with identity.)
  4. seq >>= (\x -> f x >>= g) = (seq >>= f) >>= g
    (Your
    implementation of “>>=” needs to be semantically equivalent
    to the usual translation; that is, it must behave like a functor composition.)

Comments

  1. #1 prunes
    January 31, 2007

    Oh, thanks for this post! I’ve been reading a lot about comonads the last week or so, and while I can see how easy the monad laws are, and how I can use them to functionally program certain things, I could not understand the motivation to invent the monad in the first place. The fact you stated that a sequence of monadic operations is itself a monadic operation somehow made things click! I had not seen it put quite like that before.

    A compile-time method for correctly verifying the enforcement of these rules can be shown to be equivalent to the halting problem.

    Erm, by this I assume you mean that there is no recursive procedure that can prove a given set of ostensibly monadic operations in fact form a monad? Because surely a monadic library writer could provide his own computer-checkable validity proof for his set of operations.

    I love your blog!

  2. #2 Pseudonym
    January 31, 2007

    Just a couple of comments.

    First off, the >>= operator is usually pronounced “bind”. This is just a disclaimer in case I call it “bind” in what follows.

    Secondly, you can connect the category theory definition and the Haskell definition if you introduce the join operation:

      join :: (Monad m) => m (m a) -> m a
      join m = m >>= return

    A bit of calculation shows that you can represent bind:

      m >>= k = join (fmap k m)

    “Join” is sometimes easier to reason about than “bind”. For example, “join” on linked lists concatenates a list-of-lists into a list.

    Now look at the category theory definition again: (T:C→C, η:1C→T, μ:T2 → T)

    T is a functor. That’s “fmap”.

    η is a natural transformation from 1C to T. That is, for all objects A, it is an arrow from 1(A) (which is just A) to T(A), which satisfies an additional mapping property that I’ll get to in a moment. This is just return, which has the type a -> m a.

    μ is a natural transformation from T2 to T. That is, for all objects A, it is an arrow from T (T A) to T A, with an additional mapping property. Compare this with join, with type m (m a) -> m a.

    The additional mapping property for a natural transformation is, basically, that natural transformations respect the structure of a functor. That is, if ν:F→G is a natural transformation, then for all arrows f, G(f)⋅ν = ν⋅F(f)

    For join, this means that for all f:

      fmap f . join = join . fmap (fmap f)

    Here we use the fact that T2 applied to f is T (T f).

    And for return, this means that for all f:

      fmap f . return = return . f

    Here, we use the fact that 1C applied to f is always f; it’s the identity, after all.

    Similarly, we can express commutativity of composition with μ as (assuming the types are correct):

      join . join = join . fmap join

    And commutativity of composition with η as:

      join . fmap return = join . return = id

    You will recall that id :: a -> a is the identity function in haskell.

    Once you’ve got this machinery, it’s not difficult to see where the monad laws come from. The laws for fmap, as Mark says, are just the category theory definition of a functor. But example, for example, a law like this:

    f >>= return = f

    Expanding bind:

      join (fmap return f) = f

    This is true for all f, so:

      join . fmap return = id

    Which you will recognise as one of the category theory properties above.

    Most interestingly of all is that there’s a deep connection between polymorphism and natural transformations. Consider the type of return for some monad:

      return :: a -> M a

    This is true for all possible types that can be used in place of a. It’s true for types like functions, for example, which can’t be tested for equality. It’s even true for types like void, which don’t have any “elements” in it at all. That means that “return” can’t really look inside its argument; all it can do is pass it along.

    So return must, in a sense, be completely oblivious to the types that it operates on. If A and B are types, then return :: A -> M A should behave exactly like return :: B -> M B, regardless of what A and B are. More formally, return should commute with any substitution of type B for type A.

    We can express this in Haskell. If you have a function f :: A -> B for some types A and B, then return should commute with f:

      return . f = fmap f . return

    Does this look familiar? We’re just saying that return is a natural transformation. Remarkably, the same argument also holds true of join; the fact that join is polymorphic means that it must be a natural transformation. And similar arguments hold for any polymorphic function in Haskell.

    This remarkable property of polymorphic functions known as the parametricity theorem. It states that a type which uses parametric polymorphism can be mechanically turned into a theorem, such that any function with that type must satisfy the associated theorem. In other words, a function, by virtue of being well-typed, has a theorem which you don’t have to prove. And often (as is the case with join and return), these “free theorems” are actually quite useful.

    The gory details are mentioned in Phil Wadler’s classic paper with the wonderful title “Theorems for Free!

    Now a few nits: The “remarkable” property of, say, return, is actually only true in a language without the fixpoint operator and without facilities for forcing strict evaluation, both of which Haskell has. Recent research has shown that there are still “free theorems” even in the presence of these operations, but they’re not nearly so succinct. For most well-behaved Haskell programs, however, the free theorems are still true.

  3. #3 Ørjan Johansen
    February 1, 2007

    It may seem that all I do here is nitpick, but:

    Fundamentally, in category theory a monad is a category with a particular kind of structure. It’s a category with one object.

    I think you confused monads with monoids in that paragraph. Certainly the category C defined further below does not need to have just one object.

    Although someone said the word monad is from monoid + triad, so I would like to understand what monads really have to do with monoids. Are they maybe the 2-categorical generalization, replacing objects with categories? After all the definition does have just one category in it.

  4. #4 Pseudonym
    February 1, 2007

    Ørjan: You’re right to nitpick.

    First off, you’re right that a monad is not “a category with one object” (plus structure). A monad over a category C can support lots of objects in C.

    The obvious example is Haskell itself. You can think of Haskell as a category whose objects are types and whose morphisms are functions. The category Haskell clearly has more than one object, but as you can see, monads are fairly straightforward to define.

    A monad is a generalisation of a monoid. You can think of a monoid as a category with a single object, but you can also think of it as an algebra over some carrier.

    Think of it this way: A monoid is a triple (A,⊗,e) consisting of:

    1. a carrier set, A,
    2. a binary associative operation, ⊗, and
    3. a specified left and right identity for ⊗, called e.

    You can describe this as the set A plus two functions, μ:A×A→A and η:1→A, where 1 is the singleton set (or the terminal object in the category Set), with the following laws satisfied, for all x, y, z ∈ A and for all e ∈ 1:

    1. μ(x,η(e)) = μ(η(e),x) = x
    2. μ(x,μ(y,z)) = μ(μ(x,y),z)

    Of course, in category theory, we don’t use elements, so instead we say (using ⋅ as category composition and ≡ to denote equality up to isomorphism):

    1. μ ⋅ idA×η ≡ μ ⋅ η×idA ≡ idA
    2. μ ⋅ idA×μ = μ ⋅ μ×idA

    If you draw a diagram (sorry, don’t know how to do it in comments), the comparison with monads should be obvious. But for completeness, in Haskell, you can see that this looks a bit like an associative law:

      (p >>= q) >>= r = p >>= (λx. q x >>= r)

    And this looks like an identity law:

      m >>= return = return a >>= λx. m = m

    So in a sense, bind is associative and return is its identity. Just like a monoid.

  5. #5 Ørjan Johansen
    February 2, 2007

    Ah yes. I remember now the curious fact that you can generalize the usual proofs that identity in a monoid is unique if it exists to a proof that return is determined from bind:

    If return1 x >>= f = f x for all f and x, then

     return1 x = return1 x >>= return = return x

    Alternatively, if x >>= return1 = x for all x, then

     return x = return x >>= return1 = return1 x

    Unfortunately, like with monoids this does not seem to turn into an actual expression for return, so you still need to supply both in the instance definition.

  6. #6 Ørjan Johansen
    February 2, 2007

    Actually my last comment and the associative law for monads are probably better summed up in the Kleisli category whose arrows/morphisms are functions of type a -> T b, identity arrows are return and which has composition

    (f >>> g) x = f x >>= g

    (For Haskell, see the Kleisli type in the Control.Arrow module.)

    Off topic, I’ll try once again to ask if there is a way to list more than the last ten recent comments in the blog without checking each article manually? Although the fact noone noticed the last time I asked probably means there isn’t.

  7. #7 Cale Gibbard
    March 5, 2007

    On what monads and monoids have to do with each other, there’s a nice formal way to see that they’re the “same thing”. A monoid is a monoid-object in Set (with tensor being Cartesian product), and a monad over a category C is a monoid-object in the category of endofunctors on C (with tensor being composition of functors).

    For the details, see http://en.wikipedia.org/wiki/Monoid_(category_theory) and http://en.wikipedia.org/wiki/Monoidal_category

  8. #8 cratylus
    June 29, 2007

    This was the best “connect the dots” style post on haskell and category theory I’ve read – thanks.
    The “data-hiding” ( or object-hiding) aspect
    of monads makes me think of Leibniz’s “windowless monads”.

The site is currently under maintenance and will be back shortly. New comments have been disabled during this time, please check back soon.