Finger Tree Update: I forgot something

As an alert commenter pointed out, I left out one really important thing in
my earlier post about finger trees. That's what I get for trying to write when
I'm sick :-). Seriously, this is a point that's implied by the post as it stands, but never explicitly stated - and since it's really important, it
should be stated explicitly.

The monoid-annotated tree structure doesn't replace the original
data structure: it's superimposed on it.

So, as I said: cons-cell style lists are ubiquitous and beloved by
functional and non-functional programmers. In finger trees, you're
not getting rid of them. The point of finger trees is to let
you keep the convenient data structure, with its basic operations
and properties intact, and to augment it with a tree that lets you search it efficiently.

To illustrate: here's the number list example from yesterday's post. The
original list is at the bottom, with green pointers representing the
original cons-list next-pointers. The monoid-annotated tree is on top,
with red pointers. The combination of the original list with the
monoid annotated tree is a finger tree.

[Image: counted-finger-tree-list.png - the number list with green cons-list next-pointers on the bottom, and the monoid-annotated tree with red pointers superimposed on top]

The point of this is that you've still got your cons list. All of the
beautiful recursive/iterative algorithms that walk the list continue to
work exactly the way that they would in the traditional cons-list: for code that walks the list, the fact that there are finger-tree pointers
sitting on top of the list is irrelevant - and, in fact, completely invisible. For algorithms that want to search the list, the tree structure is there,
and allows the searches to be performed much more quickly than they could be on
the traditional list. The superposition of those two structures is the
genius of the finger tree.
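The superposition can be sketched in a few lines of Haskell. This is an illustrative count-annotated tree over a plain list (the names CountTree, overlay, and nth are made up for this sketch; it is not the real finger-tree structure), just to show how cached counts let you find a position in O(log n) while the list itself stays untouched:

```haskell
-- A count-annotated tree built over a plain (non-empty) list.
-- The list is not modified; the tree is a separate overlay.
data CountTree a
  = Leaf a                                  -- one list cell
  | Branch Int (CountTree a) (CountTree a)  -- cached element count

count :: CountTree a -> Int
count (Leaf _)       = 1
count (Branch n _ _) = n

-- Build the overlay from the untouched list.
overlay :: [a] -> CountTree a
overlay [x] = Leaf x
overlay xs  = Branch (length xs) (overlay l) (overlay r)
  where (l, r) = splitAt (length xs `div` 2) xs

-- Use the cached counts to find the i-th element in O(log n),
-- instead of walking the list in O(n).
nth :: CountTree a -> Int -> a
nth (Leaf x) _ = x
nth (Branch _ l r) i
  | i < count l = nth l i
  | otherwise   = nth r (i - count l)
```

Any code that walks the original list never sees the overlay; code that wants fast positional search uses `nth` on the tree instead.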


But this breaks the "quick updating" property in functional languages -- you have to rebuild the entire left side of whatever update you do.

By Anonymous (not verified) on 28 May 2009 #permalink

Re #2:

As in pretty much everything in data structures, there's a tradeoff. In this case, you need to add the cost of maintaining the tree to the inserts, making inserts O(lg n) in the worst case (although there are workarounds to improve head and tail inserts in the average case). But in exchange for the increased cost of inserts, you get a vastly decreased search cost.

As always when you're programming, you need to understand your application, and figure out what set of tradeoffs will work best. There are definitely applications where using a finger tree is absolutely the wrong thing. But for other applications, it's amazing. (For a real tour-de-force of beautiful data structure work, try looking at the Yi editor written in Haskell. It implements its buffers using finger-tree ropes.)

Er, this still isn't a finger tree.

The 'fingers' in a finger tree are the leftmost and rightmost nodes in your 2-3 tree. You grab the leftmost and rightmost nodes of the tree and hold onto them, while you let the center of the tree go, so it 'dangles' down, yielding a bat-wing-like structure, where the pointers that previously pointed down to your leaves have been inverted to instead point 'down' towards the center of your tree!

By holding onto the two fingers you can quickly access the leftmost and rightmost edges of your finger tree, but you pay more to access the 'body' of your 'bat': the former root node, which you now have to find by descending the 'v'-shaped spine of the tree. Note that you can get there by climbing down from either the left or the right finger. This is important, because it enables finger tree concatenation to be done by having your two finger trees 'hold hands' and then just 'pushing down' on the hands in the middle until you get that nice 'v' shape again.

Your previous monoidally annotated 2-3 tree lacks that property completely. It is a useful structure, and it's related to a finger tree, but it has no fingers! It's also a different beast in another way: you do not maintain both the tree and some separate list structure; a finger tree is symmetric.

The tradeoff for insertions is rather low, especially when you consider that an insertion is done once, but access is normally done at least once.

In my previous description I said the shape of a finger tree is like a bat-wing structure. The common implementation in Haskell, used in the Hinze/Paterson paper, stylizes this by using non-uniform recursion.

If we take a Haskell-style finger tree with monoidal annotation v and entries of type a:

data FingerTree v a = Deep v (Digit a) (FingerTree v (Node v a)) (Digit a) | Single a | Empty
data Digit a = One a | Two a a | Three a a a | Four a a a a
data Node v a = Node2 v a a | Node3 v a a a

I'll fix the monoid to be the 'size' monoid, which just adds up the element count, mapping each element to 1, as is used inside Data.Sequence.
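For concreteness, the size monoid can be written out on its own. This is a sketch of the idea behind what Data.Sequence caches internally; the names Size and measure are illustrative:

```haskell
-- The 'size' monoid: combining annotations just adds element counts.
newtype Size = Size Int deriving (Eq, Show)

instance Semigroup Size where
  Size a <> Size b = Size (a + b)

instance Monoid Size where
  mempty = Size 0

-- Every individual element measures 1, so a cached annotation
-- on a subtree is simply the number of elements under it.
measure :: a -> Size
measure _ = Size 1
```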

The empty finger tree is just

Empty

As we add elements a and b we quickly get to where we have two 'fingers' that we can access quickly.

Single a

Deep 2 (One a) Empty (One b)

As we add more elements (c and d) they can be added to the digits on the left or the right

Deep 3 (Two a b) Empty (One c) -- or -- Deep 3 (One a) Empty (Two b c)
Deep 4 (Two a b) Empty (Two c d) -- or -- Deep 4 (One a) Empty (Three b c d) -- or -- Deep 4 (Three a b c) Empty (One d)

but eventually they fill up. Once there are enough of them, you can open another level through the middle link in the Deep node, and the non-uniform recursion adds a 'Node' layer to everything hanging off the tree at this depth. So once we add 'e', it could go in as:

Deep 5 (Four a b c d) Empty (One e) -- or -- Deep 5 (One a) (Single (Node3 3 b c d)) (One e) -- etc...

But once we exceed 8 elements, we HAVE to start using nested nodes, because we can only represent 8 'fingers' with the two 'Digit' slots available to us.

Deep 12 (Three a b c) (Deep 6 (One (Node3 3 d e f)) Empty (One (Node3 3 g h i))) (Three j k l)

The monoid annotation is cached in the Deep entries of our FingerTree, and in the Nodes that we use for the 2-3 trees of known depth, which hang off the sides of our trees - but we don't bother to cache it for the Digits.

The tricky thing that you get out of this structure is that the non-uniform recursion enforces that the two trees dangling off a Deep node have the same height. The Digits give you the flexibility needed to efficiently push or pop entries on either end, deque-style, uniformly from either side.
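The digit overflow described above can be sketched as code. This is a simplified version of the structure from the comment, with the annotation v specialized to a plain Int size (the Sized class and Elem wrapper are illustrative names, loosely modeled on what Data.Sequence does internally, not the exact library types):

```haskell
data FingerTree a = Empty
                  | Single a
                  | Deep Int (Digit a) (FingerTree (Node a)) (Digit a)
data Digit a = One a | Two a a | Three a a a | Four a a a a
data Node a  = Node2 Int a a | Node3 Int a a a

class Sized a where sizeOf :: a -> Int

newtype Elem a = Elem a                 -- leaf elements each measure 1
instance Sized (Elem a) where sizeOf _ = 1
instance Sized (Node a) where
  sizeOf (Node2 n _ _)   = n
  sizeOf (Node3 n _ _ _) = n

treeSize :: Sized a => FingerTree a -> Int
treeSize Empty          = 0
treeSize (Single x)     = sizeOf x
treeSize (Deep n _ _ _) = n

node3 :: Sized a => a -> a -> a -> Node a
node3 a b c = Node3 (sizeOf a + sizeOf b + sizeOf c) a b c

cons :: Sized a => a -> FingerTree a -> FingerTree a
cons a Empty      = Single a
cons a (Single b) = Deep (sizeOf a + sizeOf b) (One a) Empty (One b)
cons a (Deep n (Four b c d e) m sf) =
  -- left digit is full: keep two fingers up front, pack the rest
  -- into a Node3 and push it down into the middle tree
  Deep (sizeOf a + n) (Two a b) (cons (node3 c d e) m) sf
cons a (Deep n pr m sf) =
  Deep (sizeOf a + n) (consDigit a pr) m sf
  where
    consDigit x (One b)       = Two x b
    consDigit x (Two b c)     = Three x b c
    consDigit x (Three b c d) = Four x b c d
    consDigit _ _             = error "Four is handled above"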

The top-level 'Deep' node here represents the two fingers, where we've 'grabbed' our tree and let go of the root. The nested Deep node is the body of our 'bat' hanging from those fingers; since the nodes closer to the original root were 'higher' up the balanced 2-3 tree, they have bigger trees hanging off of them, containing their other children.

I forgot to mention that I was drawing things on the chalkboard next to the slides, so if you want an easy way to draw the tree:

The top level is a horizontal line with a dot in the middle and one to four of the items (letters, we'll say) sitting on each side of the dot. So, up to eight letters total sitting on the line.

If you want a second level, bring a vertical line down from the dot to another dot in the middle of another horizontal line. Sitting on that line are not letters, but circles with two or three lines angling down from them with a letter at the end of each of these angled lines.

If you want a third level, the circles angle down to circles that angle down to letters. In other words, the third horizontal line has roots of 2-3 trees of height two sitting on it.

And so forth, though I'm sure you'll want to stop there. (Sorry, I'm bad at drawing on a computer.)

If you want to annotate the tree, the annotations can go in the circles.

Just to echo Rui Ferrara: skip lists also address this area in a simple and, if implemented appropriately, very cache-friendly manner. I think people get put off by their probabilistic nature, but really, as the list grows and the timing matters more, the efficiency gets better.

@Markk: Actually, skip lists don't address quite the same use case. Finger trees provide O(1) access to both ends; with a skip list, you still have logarithmic access time to the far end of the list.
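For comparison, Haskell's standard Data.Sequence (a finger tree with the size measure, from the containers package) exposes exactly this property: both ends are reachable in O(1) amortized time. A tiny demonstration:

```haskell
import Data.Sequence (Seq, ViewL(..), ViewR(..))
import qualified Data.Sequence as Seq

-- Peek at both ends of a sequence; each view is O(1) amortized,
-- no matter how long the sequence is.
ends :: Seq a -> Maybe (a, a)
ends s = case (Seq.viewl s, Seq.viewr s) of
  (x :< _, _ :> y) -> Just (x, y)
  _                -> Nothing
```

For example, `ends (Seq.fromList [1..1000])` yields `Just (1, 1000)` without walking the thousand elements in between.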