A Tree Grows Up in Haskell: Building a Dictionary Type

By goodmath on December 6, 2006.

Last time around, I walked through the implementation of
a very simple binary search tree. The implementation that I showed
was reasonable, if not great, but it had two major limitations. First,
it uses equality testing for the search, so that the implementation is only really suitable for use as a set; and second, because it's such
a trivial tree implementation, it's very easy for the tree to become
highly unbalanced, resulting in poor performance.

Today, we'll look at how to extend the implementation so that our BST
becomes useful as a key/value dictionary type. We'll take two different
approaches to that, each of which demonstrates some common Haskell
techniques: one based on on using higher order functions, and one based on
using tuples. Balancing the tree will be a task for another day.

As an aside: in a real Haskell program, you would probably never write a simple type like this. Haskell has a great library, and it's got plenty of library types that do a much better job than this. This
is really only for tutorial purposes.

Here's what we'd like to do. Instead of just being able to answer the
question "Is the value X in this tree?", we want to be able to ask "Is
there a value in this tree is associated with X, and if so, what is it?".

The way that we'll do that is to use second-order functions.
Instead of using the built in equality operator for comparing search keys
and nodes, we'll use a function, which we'll pass as a parameter to the
search operations.

First, we need to define a type for the comparison function. Given two
values - a key, and a value - we need to be able to determine whether the
key is greater than, equal to, or less than the
value. Second, given a value stored in the tree, we need to be able to
extract a key from it.


>data Comparison = Greater | Equal | Less
>
>type CompareFunc a = a -> a -> Comparison
> 
>type KeyExtractor a b = a -> b

Now, we can build a new version of the search tree. We'll start
with a type declaration, and the implementation of the insert operation.


>data KeyedSearchTree a = KSTNode a (KeyedSearchTree a) (KeyedSearchTree a)
>                       | KSTEmpty deriving (Show)
>
>kstInsert :: Ord b => (KeyExtractor a b) -> CompareFunc b -> (KeyedSearchTree a) -> a ->  (KeyedSearchTree a)
>kstInsert kextract comp (KSTNode k left right) v  =
>           case (comp (kextract k) (kextract v)) of
>               Greater -> KSTNode k (kstInsert kextract comp left v) right
>               Equal -> KSTNode v left right
>               Less -> KSTNode k left (kstInsert kextract comp right v)
>kstInsert kextract comp KSTEmpty v = KSTNode v KSTEmpty KSTEmpty
>

There are two main differences between this and our original version.

In this version, instead of using the builtin ">" comparison operator, we use a comparison function passed as a parameter to the
insert.
Instead of performing comparisons using the entire
values, we perform comparison on keys using an extraction function passed as a parameter to the insert.

For searching the tree, we want a different kind of operator than
we used in the generic BST. We don't just want to know if there's
a value with a particular key in the tree; we want to get
that value. And that leads us to a problem: What if there is no value
with the specified key in the tree? What do we return?

As it happens, Haskell has a helpful solution for that. There's a type
called Maybe designed for exactly this purpose: it's used for
both parameters and return values where a value is optional. You can think
of it as being something like a metatype that allows you to distinguish
between a type that *must* have a value, and a type that allows null
values. A value of type Maybe a can be either Just x, where x is of type a, or Nothing. Maybe also happens to be a monad, which gives it some really neat properties that we'll see later.

We're going to put the two function parameters first in the implementation; you'll see why in a moment.


>kstSearch :: (KeyExtractor a b) -> (CompareFunc b) -> (KeyedSearchTree a) -> b -> Maybe a
>kstSearch  keyextract compare (KSTNode v left right) key =
>         case (compare (keyextract v) key) of
>             Greater -> kstSearch keyextract compare left key
>             Equal -> Just v
>             Less -> kstSearch  keyextract compare right key
>kstSearch _ _ KSTEmpty _ = Nothing
>

Before moving on, there's one new thing in that implementation: the _ pattern. In Haskell, that's a pattern that means "I don't care". It matches any value, but doesn't bind it to anything. So you can't access the value of a parameter matched by an "_", which makes sense, since by using the "_" you've said that you don't care what value
is used there. Since searching an empty tree will always fail, we don't
care what functions are provided for doing compares, and we don't care what key is being searched for. We know the result of the function without
ever looking at them: nothing can be fund in an empty tree.

To be able to use this new tree, we need to provide a type for its
elements, and the comparison and key-extraction functions for that type.
We'll start with a tuple implementation. In Haskell, tuples are a kind of
fixed-length sequence where the tuple type specifies a distinct type for each of values in the sequence. For example, (1, 3.14, "foo") is a tuple with three elements (a 3-tuple), and the first element has type Integer, the second has type Float, and the third element has type String. Tuple types are written in the same way as tuples, but with types instead of values, so our example tuple as a whole has the type (Integer,Float,String).

For a search tree that looks like a typical dictionary, we can use an element type that's a simple 2-tuple; the first element of the tuple will be a key, and the second will be the value. We don't need to completely specify the types of the two elements of the tuple; we'll have a more flexible implementation if we use type variables. For the search tree to work, the key type will need to be an ordered type, so we'll constrain it with a type-class clause.


>trivialCompare a b | a > b = Greater
>            | a  b = Less
>            | otherwise = Equal
>
>tupleGetkey :: Ord k => (k,v) -> k
>tupleGetkey (k,v) = k

Now we get to the reason for putting the function parameters first. Remember how we talked about currying the other day? Here's a great example of how you might use it. We'll create two new custom insert and search functions for our tuple based binary search tree, by providing
only the first two parameters (the key extractor and compare functions) to the generic keyed search tree insert and search functions; the result of doing that will be to return a two-parameter function with the key extract and compare functions internally bound, so that users of our tree don't need to worry about the details. To them, the search tree just knows how to work with tuples.


>
>tupleKSTInsert :: Ord k => KeyedSearchTree (k,v) -> (k,v) -> KeyedSearchTree (k,v) 
>tupleKSTInsert = kstInsert tupleGetkey trivialCompare
>
>tupleKSTSearch :: Ord k => KeyedSearchTree (k,v) -> k -> Maybe (k,v)
>tupleKSTSearch = kstSearch tupleGetkey trivialCompare

We can get even a bit fancier: we can use another form of type definition to define a tuple tree type, and hide the fact that the
implementation is even based on our KeyedSearchTree. In fact, we can even mask out the fact that it's tuples!


>type DictionaryTree k v = KeyedSearchTree (k,v)
> 
>dtInsert :: (Ord k) => DictionaryTree k v -> k -> v -> DictionaryTree k v
>dtInsert tree k v = tupleKSTInsert tree (k,v)
>
>dtSearch :: (Ord k) => DictionaryTree k v -> k -> Maybe v
>dtSearch tree key =
>          case (tupleKSTSearch tree key) of 
>             Just (k,v) -> Just v
>             Nothing -> Nothing
>
>dtEmpty = KSTEmpty

Just for the sake of being twisted, here's a little bugger to populate
a DictionaryTree using a list of key/value tuples. Since a user of a DictionaryTree doesn't actually know that it's got tuples inside, this actually decomposes the tuples from the list in order to call dtInsert. (Hey, I said it was twisted!)


>populateDict :: Ord k => [(k,v)] -> DictionaryTree k v
>populateDict pairs = foldr (\ (k,v) t -> dtInsert t k v) dtEmpty pairs
>
>exampleDict = populateDict  [("a",4),("b",17),("c",21), ("abc",13), ("bd", 18)]

And here's a quick demonstration of using it in GHCI:

kokopelli:~/Documents/Blog markcc$ ghci tree2.lhs 
   ___         ___ _
  / _ \ /\  /\/ __(_)
 / /_\// /_/ / /  | |      GHC Interactive, version 6.6, for Haskell 98.
/ /_\\/ __  / /___| |      http://www.haskell.org/ghc/
\____/\/ /_/\____/|_|      Type :? for help.

Loading package base ... linking ... done.
[1 of 1] Compiling Main             ( tree2.lhs, interpreted )
Ok, modules loaded: Main.
*Main> 
*Main> exampleDict
KSTNode ("bd",18) (KSTNode ("abc",13) (KSTNode ("a",4) KSTEmpty KSTEmpty) (KSTNode ("b",17) KSTEmpty KSTEmpty)) (KSTNode ("c",21) KSTEmpty KSTEmpty)
*Main> dtSearch exampleDict "c"
Just 21
*Main> dtSearch exampleDict "q"
Nothing
*Main>

More like this

In general, to maximise currying opportunities, you should put the most "const" parameters to the left of the most "variable" ones, with the "induction" argument, if it exists, being on the very right. (It's called the induction argument because you can think of recursion as proof by induction; in this case, it's structural induction over a binary tree.) The one exception is that if there's an "accumulator" argument, then that could arguably go to the right of the induction argument.

In the case of kstSearch, the function arguments don't change in the recursive loop at all. They're effectively "const". So they should go first.

Similarly, the value to insert should really go before the tree that the value is being inserted into. The value is "const" in the loop, and the tree is the induction argument.

To see why, take a look at the above code:

kstInsert kextract comp (KSTNode k left right) v =
case (comp (kextract k) (kextract v)) of
Greater -> KSTNode k (kstInsert kextract comp left v) right
Equal -> KSTNode v left right
Less -> KSTNode k left (kstInsert kextract comp right v)
kstInsert kextract comp KSTEmpty v = KSTNode v KSTEmpty KSTEmpty

Remember that function application associates to the left, so f x y z = ((f x) y) z. There's logically a common subexpression here that can be eliminated. First, do a worker-wrapper transformation:

kstInsert kextract comp tree v
= kstInsert1 kextract comp tree v
where
kstInsert1 kextract comp (KSTNode k left right) v =
case (comp (kextract k) (kextract v)) of
Greater -> KSTNode k (kstInsert1 kextract comp left v) right
Equal -> KSTNode v left right
Less -> KSTNode k left (kstInsert1 kextract comp right v)
kstInsert1 kextract comp KSTEmpty v = KSTNode v KSTEmpty KSTEmpty

Now you can see that (kstInsert1 kextract comp) is a common subexpression. So eliminate it:

kstInsert kextract comp tree v
= kstInsert' tree v
where
kstInsert' (KSTNode k left right) v =
case (comp (kextract k) (kextract v)) of
Greater -> KSTNode k (kstInsert' left v) right
Equal -> KSTNode v left right
Less -> KSTNode k left (kstInsert' right v)
kstInsert' KSTEmpty v = KSTNode v KSTEmpty KSTEmpty

Much more readable.

But v is also common in the loop, so you could have written this:

kstInsert kextract comp tree v
= kstInsert' tree
where
kstInsert' (KSTNode k left right) =
case (comp (kextract k) (kextract v)) of
Greater -> KSTNode k (kstInsert' left) right
Equal -> KSTNode v left right
Less -> KSTNode k left (kstInsert' right)
kstInsert' KSTEmpty = KSTNode v KSTEmpty KSTEmpty

But it's much harder to spot if you put v last.

Thanks MarkCC. Keep 'em coming.

In case there are other beginners following these series of postings, I was a bit confused by this until I looked it up:

>data Comparison = Greater | Equal | Less
>
>type CompareFunc a = a -> a -> Comparison
>
>type KeyExtractor a b = a' -> b

'data' is used for declaring new types, and 'type' is used for type synonyms - not the other way round like you'd expect.

http://www.haskell.org/tutorial/index.html is a good place to look up stuff like that.

I got the idea behind data and type, but could anyone explain what is newtype useful for? I'm not quite sure when I need to declare my stuff with newtype.

newtype is useful for creating a new type for the type checker, that has
the same representation as an existing type.

> import Data.List

For example, say I wanted to write my own instance of Show for lists,
different to the default. Overriding the default Show instance is a bit
icky, so instead we declare a newtype:

> newtype T a = T [a]

which the compiler can happily distinguish as a distinct type to that of
lists. Since its a new type, we can write a new show instance:

> instance Show a => Show (T a) where
> show (T []) = "Empty"
> show (T xs) = concat . intersperse " " . map show $ xs

Running this:

*M> show (T [])
"Empty"

*M> show (T "haskell")
"'h' 'a' 's' 'k' 'e' 'l' 'l'"

So this type has the same representation as normal lists at runtime (the
T constructor is erased), yet it invokes a different show instance, as
it is treated as having a distinct type to that of lists.

One nifty feature GHC supports is generalised derived instances for
newtypes. The compiler can derive new class instances for newtypes,
using the underlying types' instance, saving boilerplate.

Say you define a custom money type:

> newtype Dollars = Dollars Int

now, this should be an instance of the Num class, but as it is a new,
distinct type to that of Int, we'd have to write tedious Num class
boilerplate:

> instance Num Dollars where
> Dollars a + Dollars b = Dollars (a+b)
> ... blah blah

which is annoying, since all it does is unwrap the Dollars constructor.

GHC however is smart enough to derive that the type Dollars should use
exactly the same Num instance as that of Int:

> newtype Dollars = Dollars Int deriving (Eq,Show,Num)

meaning you get a distinct type Dollars, with free inherited
implementations for common classes. This newtype deriving is
particularly useful for avoiding boilerplate code for large classes like
Num, or tricky classes like Monad (yes, you can deriving Monad for
newtypes!).

Hope that helps hint at what newtype is good for.

Your type Comparison, and function trivialCompare already exist in the standard library, under different names.

The type is called Ordering, and has values {LT,EQ,GT}. The function is called "compare", and is a standard method of the type class Ord.

lightstep:

I know that, but for the purposes of the tutorial, for people who've never seen Haskell before, it seemed like a good opportunity to introduce the idea of an enumerated type with some functions that use it. For that matter, the Haskell library already includes better versions of everything you'd want to use a binary search tree for. But for a tutorial
in a language whose basic constructs, and basic approach are so different from what most people are familiar with, I though it was better to start with explanatory examples, even when they're redundant, rather than pulling things right and left out of the prelude and libraries before all of the readers understand enough Haskell to be able to jump in to the library documentation, and understand how everything works.

Advertisment

Donate

ScienceBlogs is where scientists communicate directly with the public. We are part of Science 2.0, a science education nonprofit operating under Section 501(c)(3) of the Internal Revenue Code. Please make a tax-deductible donation if you value independent science communication, collaboration, participation, and open access.

You can also shop using Amazon Smile and though you pay nothing more we get a tiny something.

Science 2.0

Science Codex

More by this author

Moving on

August 2, 2010

Finally, at long last, I can tell you what I've been up to with finding a new home for this blog. I've created a new, community-based science blogging site, called Scientopia. With the help of many wonderful people, we're ready. We launched this morning. So to continue following GM/BM - along with…

Goodbye, Scienceblogs

July 7, 2010

So my decision is made. I'm closing up around here. I'm in the process of working out exactly where I'm going to go. With any luck, Seed will leave this blog here long enough for me to post an update with the new location. But I'm through with Seed and ScienceBlogs.

Seed, Conflicts of Interest, and Sleaze

July 6, 2010

As my friend Pal wrote about, Seed Media Group, the corporate overlords of the ScienceBlogs network that this blog belongs to, have apparently decided that blog space in these parts is now up for sale to advertisers. We've been advertiser supported since I joined up with SB. I've never minded…

Searching for Topics

June 28, 2010

As regular readers have no doubt noticed by now, posting on the blog has been slow lately. I've been trying to come back up to speed, but so far, that's been mainly in the form of bad math posts. I'd like to get back to the good stuff. Unfortunately, the chaos theory stuff that I was…

Saturday Recipe: Ginger Scallion Sauce

June 26, 2010

Today's recipe is something I made this week for the first time, and trying it was like a revelation. It's simple to make, it's got an absolutely spectacularly wonderful flavor - light and fresh - and it's incredibly versatile. It's damned near perfect. It's scallion ginger sauce, and once you try…