Last week, I promised to continue my discussion of ropes. I’m going to
break that promise. But it’s in a good cause.
If you’re a decent engineer, one of the basic principles you should
follow is keeping this as simple as possible. One of the essential skills
for keeping things simple is realizing when you’re making something
overly complicated – even if it’s cool, and fun, and interesting.
Working out the rope code for more complex operations, I found myself
thinking that it really seemed complex for what I was doing. What I’m
doing is writing myself a text editor. I’m not writing a string
processing framework for managing gigabytes of data; I’m just writing a
text editor.
So I decided to try, for the purposes of testing, to look at one of
the simpler data structures. A classic structure for implementing editor
buffers is called a gap buffer. I’ll explain the details below.
The drawback of a gap buffer is that large cursor motions are relatively
expensive – the cost of moving the cursor is proportional to the distance
that you move it.
So I wrote a test. I implemented a basic gap buffer. Then I built a
test that did 300,000 inserts of five characters; then slid the cursor
from the end, to the beginning, back to the end, back to the beginning,
and back to the end again. Total execution time? 150 milliseconds. And
that includes resizing the buffer 12 times, because of a deliberately
stupid buffer sizing and copying strategy.
And with that, out the window went the ropes. Because in Java,
using a slapped together, sloppy piece of test code, which does things in a dumb
brute force way, it can do something much more complicated than anything an editor
will encounter in routine use, the gap buffer achieves performance that is more than
adequately fast. (Based on some experiments I did back at IBM, delays of 1/10th
of a second are roughly when people start to notice that an editor is slow. If you can respond is less than 1/10th of a second, people don’t perceive a troublesome delay.)
If the gap buffer can achieve the needed perfomance that easily, then there’s
absolutely no reason to do anything more complicated.
Ok. So, what’s a gap buffer?
It’s a fascinatingly simple data structure – exactly the kind of
thing that appeals to my sense of software aesthetics. It’s a great big
buffer – an array of characters. It’s got two segments of text – one at
the beginning of the buffer, which grows upwards; and one at the end of
the buffer, which grows downwards. The space between those two segments
is called the gap. The gap is the cursor in the
buffer. To move the cursor forwards, you just move the gap: move one character from the
beginning of the end segment to the end of the beginning segment. To
insert a character at the cursor, you add a character to the beginning
segment. To delete the character before the cursor, you just subtract one
from the length of the beginning segment.
For example, below is a diagram of a gap buffer. In the first version, it contains the text “Hello there readers”, with the cursor
after the “r” in “readers”.

The next version
moves the cursor backwards one step. So we copy one character from
the end of the pre-gap region to the beginning of the post-gap region,
and then adjust the pointers. We don’t even bother to erase the buffer
position within the gap that used to contain the “r” – we just move
the pre-gap pointer.
In the third version, we insert the string “my” – just write the characters “m” and “y” into the next positions in the pre-gap region,
and advance the pre pointer. The resulting buffer contains the text
“Hello there myreaders”.
That’s all there is to it. A gap buffer is just a character array, with two pointers/indices to track the position of the beginning and ending segments. It’s incredibly simple – and yet, it’s also
quite surprising. I certainly wouldn’t have thought of it on my own!
For its simplicity, it still manages to achieve outstanding performance on all of the most common actions in an edit buffer:
- Local cursor motions are extremely fast; it’s always exactly
O(1) cost for a single character cursor move, O(n) for arbitrary
cursor motions. - Character inserts are really fast. They’ve got an amortized
average cost of O(1). (The reason for that amortized cost is
because you sometimes need to expand the buffer, which
requires a lot of copying.) - Character deletes are super fast – just increment or decrement
a pointer – O(1). - Search/scan is reasonable. Stream oriented operations like
search have an absolute linear-time lower bound, which is
obviously achieved by the gap buffer. And as important,
implementing stream-oriented operations is extremely
easy to do correctly on a gap buffer.
Finally, I’ll close with some code. This is a basic gap buffer
implementation in Scala, my new favorite programming language. (I’m
long been a huge fan of OCaml. Scala is very much like OCaml adapted
for the JVM, with a better syntax.)
// Copyright 2009, Mark C. Chu-Carroll
package com.scienceblogs.goodmath.apex.buffer
class GapBuffer(size : Int) {
var _buf : Array[Char] = new Array[Char](size)
var _size = size
var _presize = 0;
var _postsize = 0;
/**
* Move the cursor one character forward.
*/
def stepCursorForward() = {
if (_postsize > 0) {
_buf(_presize) = _buf(_size - _postsize)
_presize = _presize + 1
_postsize = _postsize + 1
}
}
/**
* Move the cursor one character backward.
*/
def stepCursorBackward() = {
if (_presize > 0) {
val c = _buf(_presize - 1)
_buf(_size - _postsize - 1) = c
_postsize = _postsize + 1
_presize = _presize - 1
}
}
/**
* Move the cursor.
* @param distance the distance to move the cursor.
*/
def moveCursor(distance : Int) = {
if (distance > 0) {
for (i <- 0 until distance) {
stepCursorForward()
}
} else {
val d = - distance
for (i <- 0 until d) {
stepCursorBackward()
}
}
}
/**
* Move the cursor to a specific position.
* @param loc the desired cursor position.
*/
def moveCursorTo(loc : Int) = {
moveCursor(loc - _presize)
}
/**
* Insert a character immediately after the cursor position.
* @param c the character to insert.
*/
def insertChar(c : Char) = {
if (_presize + _postsize == _size) {
expand()
}
_buf(_presize) = c
_presize = _presize + 1
}
/**
* Insert a string after the cursor position, and advance
* the cursor to the position after the inserted string.
* @param s the string to insert
*/
def insertString(s : String) = {
for (i <- 0 until s.size) {
insertChar(s.charAt(i))
}
}
/**
* Double the size of the character storage buffer.
*/
def expand() = {
val newsize = _size * 2
var newbuf = new Array[Char](newsize)
for (i <- 0 until _presize) {
newbuf(i) = _buf(i)
}
for (i <- 0 until _postsize) {
newbuf(newsize - i - 1) = _buf(_size - i - 1)
}
_buf = newbuf
_size = newsize
}
}