As promised, I’m finally going to get to the theory behind monads. As a quick
review, the basic idea of the monad in Haskell is a hidden transition function – a
monad is, basically, a state transition function.
The theory of monads comes from category theory. I’m going to assume you know
a little bit about category theory – if you have trouble with it, go take a look
at my introductory posts here.
Fundamentally, in category theory a monad is a category with a particular kind of
structure. It’s a category with one object. That category has a collection of
arrows which (obviously) are from the single object to itself. That one-object
category has a functor from the category to itself. (As a reminder, a functor
is an arrow between categories in the category of (small) categories.)
The first trick to the monad, in terms of theory, is that it’s
fundamentally about the functor: since the functor maps from a category to
the same category, you can almost ignore the category; it’s implicit in
the definition of the functor. So we can almost treat the monad as if it were just
the functor – that is, a kind of transition function.
The other big trick is closely related to that. For the programming language
application of monads, we can think of the single object in the category as
the set of all possible states. So we have a category object, which is essentially
the collection of all possible states; and there are arrows between the states
representing possible state transitions. So the monad’s functor is really just a mapping from arrows to different arrows – which basically represents the way that
changing the state causes a change in the possible transitions to other states.
So what a monad gives us, in terms of category theory, is a conceptual framework that captures the concept of a state transition system, in terms of transition functions that invisibly carry a state. When that’s translated into
programming languages, it becomes a value that implicitly takes an
input state, possibly updates it, and returns an output state. Sound familiar?
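To make the "invisibly carried state" concrete, here is a minimal sketch of a hand-rolled state-transition monad. The names State, runState, tick, and twoTicks are chosen for this illustration only (this is not the library type), and the Functor and Applicative instances are there because current GHC versions require them before a Monad instance can be declared.

```haskell
-- A hand-rolled state-transition monad (illustrative; not the library type).
-- A value of type State s a is a function from an input state to a result
-- paired with an output state.
newtype State s a = State { runState :: s -> (a, s) }

instance Functor (State s) where
  fmap f (State g) = State $ \s ->
    let (a, s') = g s in (f a, s')

instance Applicative (State s) where
  pure a = State $ \s -> (a, s)
  State f <*> State g = State $ \s ->
    let (h, s')  = f s
        (a, s'') = g s'
    in (h a, s'')

instance Monad (State s) where
  State g >>= f = State $ \s ->
    let (a, s') = g s
    in runState (f a) s'

-- Hypothetical helper: return the current counter and increment the state.
tick :: State Int Int
tick = State $ \n -> (n, n + 1)

-- Two ticks in sequence; the state is threaded from one to the next
-- without ever being mentioned.
twoTicks :: State Int (Int, Int)
twoTicks = do
  a <- tick
  b <- tick
  return (a, b)
```

Running runState twoTicks 10 feeds 10 in as the initial state; each tick sees the state left behind by the one before it.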
Let’s take a moment and get formal. As usual for category theory, first there are
some preliminary definitions.
 Given a category C, 1_{C} is the identity functor from C to C.
 Given a category C with a functor T : C → C, T^{2} = T ∘ T.
 Given a functor T, 1_{T} : T → T is the identity natural transformation from T to T.
Now, with that out of the way, we can give the complete formal definition
of a monad. Given a category C, a monad on C is a triple
(T : C → C, η : 1_{C} → T, μ : T^{2} → T), where T is
a functor, and η and μ are natural transformations. The members of the
triple must make the following two diagrams commute.
[Diagram: commutativity of composition with μ]
[Diagram: commutativity of composition with η]
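In equational form, the two conditions are usually stated as follows. This is the standard textbook presentation rather than a transcription of the diagrams; T^{3} stands for T ∘ T ∘ T, and Tμ, μT, Tη, ηT denote μ and η composed with T on either side.

```latex
% Associativity: the two ways of collapsing T^3 down to T agree.
\mu \circ T\mu = \mu \circ \mu T \;:\; T^{3} \to T

% Unit: wrapping with \eta on either side and then collapsing does nothing.
\mu \circ T\eta = \mu \circ \eta T = 1_{T} \;:\; T \to T
```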
What these two diagrams mean is that successive applications of the
state-transition functor over C behave associatively: any sequence of
compositions of the monadic functor results in a functor with full monadic
structure, and that monadic structure is always preserved. Together, these mean
that any sequence of operations (that is, applications of the monad functor) is
itself a monad functor, so building a sequence of monadic state transformers is
guaranteed to behave as a proper monadic state transition. What happens inside
the monadic functor is fine: to the rest of the universe, a sequence and a single
simple operation are indistinguishable. The state will be consistently passed
from application to application with the correct chaining behavior, and to the
outside world, the entire monadic chain looks like a single atomic
monadic operation.
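One way to see the "grouping doesn't matter" claim in Haskell is through Kleisli composition, the >=> operator from Control.Monad. The sketch below uses the Maybe monad and three made-up steps (safeRecip, safeSqrt, halve); it only spot-checks a few inputs and is not a proof.

```haskell
import Control.Monad ((>=>))

-- Hypothetical steps in the Maybe monad, each of which may fail.
safeRecip :: Double -> Maybe Double
safeRecip 0 = Nothing
safeRecip x = Just (1 / x)

safeSqrt :: Double -> Maybe Double
safeSqrt x
  | x < 0     = Nothing
  | otherwise = Just (sqrt x)

halve :: Double -> Maybe Double
halve x = Just (x / 2)

-- Two groupings of the same three-step chain. From the outside, each is
-- just one function of type Double -> Maybe Double.
leftGrouped, rightGrouped :: Double -> Maybe Double
leftGrouped  = (safeRecip >=> safeSqrt) >=> halve
rightGrouped = safeRecip >=> (safeSqrt >=> halve)
```

For any input, leftGrouped and rightGrouped should agree, including on the inputs where one of the steps fails.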
Now, what does this mean in terms of programming? Each element of a monadic
sequence in Haskell is an instantiation of the monadic functor – that is, it’s an
arrow between states – a function, not a simple value – which is the basic
trick to monads. They look like a sequence of statements; in fact, each
statement in a monad is actually a function from state to state. And it looks like
we’re writing sequential code, when what we’re actually doing is writing function
compositions. So when we’re done writing a monadic sequence, what we’ve actually written is a function definition in terms of a sequence of function compositions.
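As a small illustration of statements really being chained functions, here is the same computation written twice in the Maybe monad: once with do-notation and once in the desugared form using >>= and lambdas. The prices table and both function names are invented for the example.

```haskell
-- Hypothetical data: a small association list of prices.
prices :: [(String, Int)]
prices = [("apple", 3), ("pear", 5)]

-- Written with do-notation: looks like a sequence of statements.
totalDo :: Maybe Int
totalDo = do
  a <- lookup "apple" prices
  p <- lookup "pear" prices
  return (a + p)

-- The same computation after desugaring: each "statement" is a function
-- that receives the previous result and is chained on with (>>=).
totalBind :: Maybe Int
totalBind =
  lookup "apple" prices >>= \a ->
  lookup "pear" prices  >>= \p ->
  return (a + p)
```

The two definitions denote the same value; the do block is only a more readable spelling of the chain of binds.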
Understanding that, we can now see clearly why we need the return function to use a non-monadic expression inside of a monadic sequence: each step in the sequence needs to be an instance of the monadic functor, and an expression that
isn’t an instance of the monadic functor couldn’t be composed with the functions in the sequence. The return function is really nothing but a function that
combines a non-monadic expression with the identity functor.
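A short sketch of why return is needed, again in the Maybe monad. The digit parser and the doubled function are hypothetical names used only here.

```haskell
-- Hypothetical step that can fail: parse a single decimal digit.
digit :: Char -> Maybe Int
digit c
  | c >= '0' && c <= '9' = Just (fromEnum c - fromEnum '0')
  | otherwise            = Nothing

-- `d * 2` has type Int, not Maybe Int, so it cannot be a step in the chain
-- on its own; `return` lifts it into the monad so the types line up.
doubled :: Char -> Maybe Int
doubled c = digit c >>= \d -> return (d * 2)
```

Dropping the return from doubled gives a type error, because the right-hand side of >>= must produce a value in the monad.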
In light of this, let’s go back and look at the definition of Monad in
the Haskell standard prelude.
    class Functor f where
      fmap :: (a -> b) -> f a -> f b

    class Monad m where
      (>>=)  :: m a -> (a -> m b) -> m b
      (>>)   :: m a -> m b -> m b
      return :: a -> m a
      fail   :: String -> m a

      -- Minimal complete definition:
      --   (>>=), return
      m >> k = m >>= \_ -> k
      fail s = error s
The declaration of Monad is connected with the definition of Functor – if you look,
you can see the connection. The fundamental operation of Monad is “>>=”, the
chaining operation, whose type is m a -> (a -> m b) -> m b. It is deeply
connected with the fmap operation from Functor: the m in m a is a type
constructor that is also a Functor.
So the type constructor that wraps the value is a functor – in fact, it is the functor T
from the category definition! And the “>>=” operation is built from the composition
in the monad definition: it maps a function over the monadic value with fmap, then
collapses the two resulting layers of m into one with μ.
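The correspondence with the categorical triple can be sketched directly. In Haskell terms, η is return and μ is the library function join from Control.Monad; the names eta, mu, and bindViaJoin below are made up for this illustration.

```haskell
import Control.Monad (join)

-- η from the categorical definition corresponds to `return`.
eta :: Monad m => a -> m a
eta = return

-- μ corresponds to `join`, which collapses two layers of m into one.
mu :: Monad m => m (m a) -> m a
mu = join

-- (>>=) rebuilt from the functor action and μ: map, then flatten.
bindViaJoin :: Monad m => m a -> (a -> m b) -> m b
bindViaJoin m f = mu (fmap f m)
```

For a lawful monad, bindViaJoin should behave the same as the built-in >>=; for example, in the list monad both produce the same list for the same arguments.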
A proper implementation of a monad needs to follow some fundamental rules – the
rules are, basically, just Haskell translations of the structure-preserving rules about
functors and natural transformations in the category-theoretic monad. There are two
groups of laws – laws about the Functor class, which should hold for the
transition function wrapped in the monad class; and laws about the monadic operations
in the Monad class. One important thing to realize about the functor and
monad laws is that they are not enforced by the compiler – in fact, cannot be
enforced! – but monad-based code using monad implementations that do not follow them
may not work correctly. (A compile-time method for correctly verifying that an
implementation obeys these rules can be shown to be equivalent to the halting problem.)
There are two simple laws for Functor, and it’s pretty obvious why
they’re fundamentally just structure-preservation requirements. The functor class only
has one operation, called fmap, and the two functor laws are about how it
must behave.

fmap id = id
(Mapping id over any structured sequence results in an unmodified sequence.)
fmap (f . g) = (fmap f) . (fmap g)
(“.” is the function composition operation; this just says that fmap preserves composition: mapping a composed function gives the same result as mapping each function in turn.)
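Since the compiler will not check these laws, a quick spot check is often the practical option. The sketch below defines a made-up Pair container with a Functor instance and tests both laws at one sample value each; that is evidence, not a proof.

```haskell
-- Hypothetical container used only for this illustration.
data Pair a = Pair a a
  deriving (Eq, Show)

instance Functor Pair where
  fmap f (Pair x y) = Pair (f x) (f y)

-- First law: fmap id = id, checked at one sample value.
lawIdentity :: Bool
lawIdentity = fmap id (Pair 1 2) == (Pair 1 2 :: Pair Int)

-- Second law: fmap (f . g) = fmap f . fmap g, checked at one sample value.
lawComposition :: Bool
lawComposition =
  fmap ((+ 1) . (* 2)) p == (fmap (+ 1) . fmap (* 2)) p
  where
    p = Pair 3 4 :: Pair Int
```

An instance that, say, swapped the two components in fmap would still type-check but would fail lawIdentity.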
The monad laws are a bit harder, but not much. They mainly govern how monadic
operations interact with non-monadic operations in terms of the “return” and
“>>=” operations of the Monad class.

return x >>= f = f x
(Injecting a value into the monad is basically the same as passing it as a parameter down the chain – return is really just the identity functor passing its result on to the next step.)
f >>= return = f
(Ending a chain with return is the same as just returning the result of the previous step in the sequence – again, return is just identity, so passing something into return shouldn’t affect it.)
seq >>= return . f = fmap f seq
(Composing return with a function is equivalent to invoking that function on the result of the monad sequence to that point, and wrapping the result in the monad – in other words, it’s just composition with identity.)
seq >>= (\x -> f x >>= g) = (seq >>= f) >>= g
(Your implementation of “>>=” needs to be semantically equivalent to the usual translation; that is, it must behave like a functor composition.)
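The four laws above can be spot-checked the same way as the functor laws. This sketch uses the built-in Maybe monad with two invented functions f and g; it tests a handful of sample inputs and does not prove the laws in general.

```haskell
-- Hypothetical Kleisli arrows in the Maybe monad.
f :: Int -> Maybe Int
f x = if x > 0 then Just (x * 2) else Nothing

g :: Int -> Maybe Int
g x = if even x then Just (x + 1) else Nothing

-- Left identity: return x >>= f = f x
leftIdentity :: Int -> Bool
leftIdentity x = (return x >>= f) == f x

-- Right identity: m >>= return = m
rightIdentity :: Maybe Int -> Bool
rightIdentity m = (m >>= return) == m

-- Relationship with fmap: m >>= return . h = fmap h m
fmapLaw :: Maybe Int -> Bool
fmapLaw m = (m >>= return . (+ 1)) == fmap (+ 1) m

-- Associativity: m >>= (\x -> f x >>= g) = (m >>= f) >>= g
associativity :: Maybe Int -> Bool
associativity m = (m >>= (\x -> f x >>= g)) == ((m >>= f) >>= g)

-- Spot-check all four laws on a handful of sample inputs.
allLawsHold :: Bool
allLawsHold =
  all leftIdentity [-1, 0, 3]
    && all rightIdentity samples
    && all fmapLaw samples
    && all associativity samples
  where
    samples = [Nothing, Just (-1), Just 0, Just 3]
```

The same four checks can be pointed at any other Monad instance with an Eq instance on its values, which is a cheap first test for a hand-written monad.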