Now on ScienceBlogs: Charles Darwin February 12, 1809 - April 19, 1882

ScienceBlogs Book Club: Inside the Outbreaks

Good Math, Bad Math

Finding the fun in good math; Shredding bad math and squashing the crackpots who espouse it.

Search

Profile

markcc.jpg
Mark Chu-Carroll (aka MarkCC) is a PhD Computer Scientist, who works for Google as a Software Engineer. My professional interests center on programming languages and tools, and how to improve the languages and tools that are used for building complex software systems.

Donors Choose

Other Information

Add this blog to my Technorati Favorites!

Recent Posts

Recent Comments

Categories

Blogroll

Old Topic Indices

Great Online Books

« Dembski Stoops Even Lower: Legal Threats to Silence a Critic | Main | Shameful Innumeracy in the New York Times »

Types in Haskell: Types are Propositions, Programs are Proofs

Category: Haskell
Posted on: November 17, 2009 9:27 AM, by Mark C. Chu-Carroll

(This is a revised repost of an earlier part of my Haskell tutorial.)

Haskell is a strongly typed language. In fact, the type system in Haskell is both stricter and more expressive than any type system I've seen for any non-functional language. The moment we get beyond writing trivial integer-based functions, the type system inevitably becomes visible, so we need to take the time now to talk about it a little bit, in order to understand how it works.

One of the most important things to recognize about Haskell's type system is that it's based on type inference. What that means is that in general, you don't need to provide type declarations. Based on how you use a value, the compiler can usually figure out what type it is. The net effect is that in many Haskell programs, you don't write any type declarations, but your program is still carefully type-checked. You can declare a type for any expression you want, but the only time you must is when something about your code is ambiguous enough that type inference can't properly select one type for an expression.

I'm not going to go into a lot of detail, but the basic idea behind Haskell's type system is that there's a basic propositional logic of types. Every type is represented as a statement in the type-logic. So as you'll see below, an integer is written as type Integer. A function from integers to integers has a type which is an implication statement: "Integer -> Integer", which you can read in the logic as "If the input type is Integer then the output type is Integer". The way that type inference works is that values with known types become propositions; the basic type inference process s just logical inference over the type logic.

The type logic is designed so that the inference process, when it analyzes an expression, it will generate the most general type; that is, when it's inferring a type for an expression, it will always pick the most general, inclusive type that can match the expression. And that leads to one complication for beginners: Haskell's type system is almost two different type systems - a basic type system, and a meta-type system. The meta-type system is based on something called type classes, which group related families of types together. The most general types that come up as a result of type inference are frequently based on type classes, rather than on specific concrete types.

A type-class in code is basically a predicate over a type variable. For example, the type of the parameters to the "+" operator must be a numeric type. But since "+" can be used on integers, floats, complex, rationals, and so on, the type of the "+" operator's parameters needs to be something that includes all of those. The way that Haskell says that is the type is "Num a => a" - that is, that the type a is a member of the type-class Num. The way to understand that is "Some type 'a' such that 'a' is a numeric type.".

The thing to remember is that essentially, a type-class is a type for types. A type can be thought of as a predicate which is only true for members of that type; a type-class is essentially a second-order predicate over types, which is only true for types that are members of the type-class. What type-classes do is allow you to define a general concept for grouping together a family of conceptually related types. Then you can write functions whose parameter or return types are formed using a type-class; the type class defines a predicate which describes the properties of types that could be used in the function.

So if we write a function whose parameters need to be numbers, but don't need to be a specific kind of number, we would write it to use a type-class that described the basic properties of numbers. Then you'd be able to use any type that satisfied the type-predicate defined by the type-class "Num".

If you write a function that uses numeric operations, but you don't write a type declaration, then Haskell's type-inference system will infer types for it based on the "Num" type-class: "Num" is the most general type-class of numbers; the more things we actually do with numbers, the more constrained the type becomes. For example, if we use the "/" operator, instead of inferring that the type of parameter must be an instance of the "Fractional" type-class.

With that little diversion out of the way, we can get back to talking about how we'll use types in Haskell. Types start at the bottom with a bundle of primitive atomic types which are built in to the language: Integer, Char, String, Boolean, and quite a few more. Using those types, we can construct more interesting types. For now, the most important constructed type is a function type. In Haskell, functions are just values like anything else, and so they need types. The basic form of a simple single-parameter function is "a -> b", where "a" is the type of the parameter, and "b" is the type of the value returned by the function.

So now, let's go back and look at our factorial function. What's the type of our basic "fact" function? According to Hugs, it's "Num a => a -> a".

Definitely not what you might expect. What's happening is that the system is looking at the expression, and picking the most general type. In fact, the only things that are done with the parameter are comparison, subtraction, and multiplication: so the system infers that the the parameter must be a number, but it can't infer anything more than that. So it says that the type of the function parameter is a numeric type; that is, a member of the type-class "Num"; and that the return type of the function is the same as the type of the parameter. So the statement "Num a => a -> a" basically says that "fact" is a function that takes a parameter of some type represented by a type variable "a" and returns a value of the same type; and it also says that the type variable "a" must be a member of the meta-type "Num", which is a type-class which includes all of the numeric types. So according to Haskell, as written the factorial function is a function which takes a parameter of any numeric type, and returns a value of the same numeric type as its parameter.

If we look at that type, and think about what the factorial function actually does, there's a problem. That type isn't correct, because factorial is only defined for integers, and if we pass it a non-integer value as a parameter, it will never terminate! But Haskell can't figure that out for itself - all it knows is that we do three things with the parameter to our function: we compare it to zero, we subtract from it, and we multiply by it. So Haskell's most general type for that is a general numeric type. So since we'd like to prevent anyone from mis-calling factorial by passing it a fraction (which will result in it never terminating), we should put in a type declaration to force it to take the more specific type "Integer -> Integer" - that is, a function from an integer to an integer. The way that we'd do that is to add an explicit type declaration before the function:

	
> fact :: Integer -> Integer
> fact 0 = 1 
> fact n = n*fact(n-1)

That does it; the compiler accepts the stronger constraint provided by our type declaration.

So what we've seen so far is that a function type for a single parameter function is created from two other types, joined by the "->" type-constructor. With type-classes mixed in, that can be prefixed by type-class constraints, which specify the type-classes of any type variables in the function type.

Before moving on to multiple parameter functions, it's useful to introduce a little bit of syntax. Suppose we have a function like the following:

poly x y = x*x + 2*x*y + y*y 

That definition is actually a shorthand. Haskell is a lambda calculus based language, so semantically, functions are really just lambda expressions: that definition is really just a binding from a name to a lambda expression.

In lambda calculus, we'd write a definition like that as:

  poly ≡ λ x y . x*x + 2*x*y + y*y

Haskell's syntax is very close to that. The definition in Haskell syntax using a lambda expression would be:

 poly = (\ x y -> x*x + 2*x*y + y*y)

The λ in the lambda expression is replaced by a backslash, which is the character on most keyboards that most resembles lambda; the "." becomes an arrow.

Now, with that out of the way, let's get back to multi-parameter functions. Suppose we take the poly function, and see what Hugs says about the type:

    poly x y = x*x + 2*x*y + y*y 

hugs>  Main> :type poly
hugs> poly :: Num a => a -> a -> a

This answer is very surprising to most people: it's a two parameter function. So intuitively, there should by some grouping operator for the two parameters, to make the type say "a function that takes two a's, and returns an a"; something like "(a,a) -> a".

But functions in Haskell are automatically curried. Currying is a term from mathematical logic; it's based on the idea that if a function is a value, then you don't ever need to be able to take more than one parameter. Instead of a two parameter function, you can write a one-parameter function that returns another one-parameter function. Since that sounds really confusing, let's take a moment and look again at our "poly" example:

    poly x y = x*x + 2*x*y + y*y 

Now, suppose we knew that "x" was going to be three. Then we could write a special one-parameter function:

	poly3 y = 3*3 + 2*3*y + y*y

But we don't know "x". But what we can do is write a function that takes a parameter "x", and returns a function where all of the references to x are filled in, and given a y value, will return the value of the polynomial for x and y:

	polyn x = (\y -> x*x + 2*x*y + y*y)

If we call "polyn 3", the result is exactly what we wrote for "poly3". If we call "polyn a b", it's semantically exactly the same thing as "poly a b". (That doesn't mean that the compiler actually implements multi-parameter functions by generating single-parameter functions; it generates multi-parameters functions the way you'd expect. But everything behaves as if it did.) So what's the type of polyn? Well, it's a function that takes a parameter of type a; and returns a function of type "Num a => a -> a". So, the type of polyn is "Num a => a -> (a -> a)"; and since the precedence and associativity rules are set up to make currying convenient, the parents in that aren't necessary - that's the same as "Num a => a -> a -> a". Since "poly" and "polyn" are supposed to be semantically equivalent, that means that "poly"'s type must also be "Num a => a -> a -> a'".

Share on Facebook
Share on StumbleUpon
Share on Facebook
Find more posts in: Technology

Comments

1

Haskell accepts Unicode characters for certain symbols like lambdas, so you can actually write

poly = λ x y → x*x + 2*x*y + y*y

which is even closer to the classic lambda syntax.

Posted by: Luke Zapart | November 17, 2009 10:15 AM

2

Very fascinating post! This is definitely a language I'm going to have to spend some time playing with.

Near the end of your article, I was about to complain with something along the lines of "Well, won't currying cause the type definitions of functions which accept many parameters to rapidly become unreadable?". Then, I remembered that I'm currently working with C++ STL and have to stare at 200 characters of gobbeldygook every time I have a syntax problem involving a list of strings. Compared to that, having to count on my fingers to figure out "a => a -> a -> a -> a -> a -> a" is downright elegant. :-)

Posted by: DSimon | November 17, 2009 10:19 AM

3

I'm not sure I'd say that types "are" propositions and programs "are" proofs. It's more accurate to say that types are the objects and proofs are the morphisms in a cartesian closed category Hask, and that to any CCC we can associate an "internal logic" to discuss its semantics.

To say that a program "is" a proof immediately raises the questions to a casual reader "what does it prove?" and "what does it mean to 'prove' a type?"

Posted by: John Armstrong | November 17, 2009 11:00 AM

4

@John Armstrong - i think the "Types are Propositions, Programs are Proofs" statement comes from Intuitionistic Type Theory. If a Type is interpreted as a Proposition, then a Program of that Type can be interpreted as a Proof of that Proposition. This doesn't mean you're proving a Type as such.
It doesn't seem to be relevant to the article though; i guess it's just a catchy title.

Posted by: jon hanson | November 17, 2009 12:10 PM

5

John, please see the Curry-Howard isomorphism. However, do also note that --- even in a type system as comparatively expressive as Haskell's --- this correspondence is useful for proving metatheorems about the type system but the proofs that correspond to programs are typically not particularly interesting at the object level. For example, you can't prove that a list-reverse function actually reverses its argument by its type without an even more expressive type system. (To see what such a type system would look like, see the Coq tool from INRIA.)

Posted by: wb | November 17, 2009 1:06 PM

6

A more appropriate type for fact would be:

fact :: Integral a => a -> a

Posted by: Anonymous | November 17, 2009 7:12 PM

7

wb: but see "theorems for free!" for example which demonstrates how parametric types can give rise to quite useful proofs at the object level. And related papers about free theorems in the presence of typeclass constraints, etc. which yield some even more useful (if less elegant) results. also see "faking it" and other work on how types can be made significantly more expressive vis a vis proofs and invariants, and okasaki and others' work on data structures and recursive types which shows how even plain h98 types can be made to encode fairly interesting properties. "i am not a number, i am a free variable" is pretty cool in this regard too.

Posted by: sclv | November 17, 2009 8:33 PM

8

One point that has been missed is that Haskell, unlike say Coq, has unbounded recursion. This means that all propositions have proofs --- for example,

pf :: a = let x = x in x

is a proof of any 'a'.

Posted by: Simon | November 18, 2009 8:14 AM

9

Great post, it's so good to see someone explaining clearly these concepts. Thanks

Posted by: chato | November 18, 2009 8:34 AM

10

@John Armstrong

The proposition associated with a type is the intuitionistic statement 'There is an object of type T'. The program proves it by generating an example. Note that there is no other way of proving such an intuitionistic statement; the statement 'The assumption that there are no objects of type T leads to a contradiction' is not intuitionistically equivalent to the first statement.

Most programs that programmers write are rather trivial as proofs (they usually don't aim to generate objects of a type when they don't know whether or not the type is inhabited). However it tends to be more useful when run in the other direction. (If I prove a proposition intuitionistically, I get an algorithm that generates witnesses to the proposition 'for free')

Note that under my interpretation Simon@8 is wrong. If the program never terminates, then it never produces an object of it's result type, what type has been proven to be inhabited?

Posted by: Roger Witte | November 18, 2009 12:37 PM

11

I guess it wasn't as clear as I thought it was from the text.

What I find fascinating about the idea of types as logical statements, and inference as proof, is that the process of performing type inference on a language like Haskell is the process of performing logical proofs in the type logic. By finding the fundamental mapping between types and logical statements, Hindley-Milner type inference is really logical inference in a particular intuitionistic logic.

When you compare that to the kinds of type inference that you do when you're compiling languages like C++, the logical structure of the type system in Haskell is amazingly beautiful. It makes sense in a deep way, because the type system is a logic, and type inference is a proof in that logic.

Posted by: Mark C. Chu-Carroll | November 18, 2009 1:24 PM

12

Haskell is becoming interesting and I am glad of having chosen it to improve my functional programming...

Posted by: Giorgio Sironi | November 18, 2009 2:32 PM

13

So then what would be the type signature for something like map which ought to take some type of set, a function that accepts members of the set, and returns a set whose members of the type produced by the function.

Posted by: John F. Miller | November 18, 2009 5:03 PM

14

@13:

A set-map function would have a signature like:

setmap :: Set a -> (a -> b) -> Set b

But the exact type would depend on the set implementation. For example, if the set was represented as a binary search tree, you'd need to put a type-class constraint on a to ensure that it supported the comparison operations needed by the set, like:

setmap :: (Ord a, Ord b) => Set a -> (a -> b) -> Set b

Posted by: Mark C. Chu-Carroll | November 18, 2009 6:39 PM

15

This means that all propositions have proofs

Posted by: kilo aldırıcı | November 19, 2009 5:21 AM

16

It's more accurate to say that types are the objects and proofs are the morphisms in a cartesian closed category Hask, and that to any CCC we can associate an "internal logic" to discuss its semantics.

Posted by: iştah açıcı | November 19, 2009 5:24 AM

17

I don't know if it's entirely accurate to say that type *inference* is inference in the logic corresponding to the type system, although it may be related. What inference does is walk through sub-terms, and figure out what typing rules correspond to each sub expression. So, in the case where it sees:

f x

it knows that f must have a type of the form 'a -> b', and x must have type 'a', and the combination has type 'b'. The inference algorithm uses this by collecting a bunch of equations between the variables like 'a' and 'b', and then uses a unification algorithm to figure out the most general type for everything.

For a more logical-inference type application, it might be good to look at djinn. It uses the fact that the completeness (I think that's the right one) theorem for the logic corresponding to (a somewhat restricted subset of) the Haskell type system can be proved constructively, which translates to an algorithm for deciding whether a given proposition/type is inhabited, which if it is, also yields a term with the relevant type. So djinn takes a type and writes code with that type, or tells you that it's impossible to do so (and it even gives you the correct implementation of callCC for the continuation monad, which I remember finding impressive when I first saw it).

Posted by: Dan Doel | November 19, 2009 9:16 PM

ScienceBlogs

Search ScienceBlogs:

Go to:

Advertisement
Follow ScienceBlogs on Twitter

© 2006-2011 ScienceBlogs LLC. ScienceBlogs is a registered trademark of ScienceBlogs LLC. All rights reserved.