Basics: Binary Search

For the basics, I wrote a bunch of stuff about sorting. It seems worth taking a moment
to talk about something related: binary search. Binary search is one of the most important
and fundamental algorithms, and it shows up in all sorts of places.

It also has the amazing property that despite being simple and ubiquitous, it's virtually
always written wrong. There's a bit of subtlety in implementing it correctly, and virtually
everyone manages to introduce off-by-one indexing errors into their implementation. (Including me; last time I implemented a binary search, the first version included one of the classic errors.) The errors are so ubiquitous that even in a textbook that discusses the fact that most programmers get it wrong, they got it wrong in their example code!

In its simplest form, you have a list of values L = {l1, ..., ln} in sorted order, and you have a value x. You want to find out which, if any, of the li is equal to x.

The algorithm is a simple recursive one:

binary_search(L=[L1,...,Ln], x)
   if L=[] then return not_found
   let a = ceiling(n/2)
   if La == x then
      return found(a)
   else if La > x then
      return binary_search([Li | i < a], x)
   else
      return binary_search([Li | i > a], x)

You pick a position, i, in the middle of the list. If Li is equal to x, then you've found x at position i. If Li is greater than x, then you know that if x is in the list, it must be in the half of the list that's smaller than Li. If Li is less than x, then you know that if x is in the list, then it must be in the half of the list that's larger than Li. If in any of the recursions, you wind up with an empty list, then you know that x wasn't in the list.
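To make that concrete, here's a minimal sketch of the same idea in Python, written iteratively over a pair of index bounds instead of slicing out sublists (slicing would copy the list, and it also makes it awkward to report the position in the original list). It's just an illustration, not anyone's production code:

def binary_search(lst, x):
    """Return the index of x in the sorted list lst, or None if it's absent."""
    lo, hi = 0, len(lst)          # x, if present, is somewhere in [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2      # midpoint of the current range
        if lst[mid] == x:
            return mid            # found x at position mid
        elif lst[mid] > x:
            hi = mid              # x can only be in the lower half
        else:
            lo = mid + 1          # x can only be in the upper half
    return None                   # the range is empty: x isn't in the list

print(binary_search([2, 3, 5, 7, 11, 13], 7))    # prints 3
print(binary_search([2, 3, 5, 7, 11, 13], 6))    # prints None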

Before moving on to point out some of the interesting places this pops up, let's take a moment to think about how complex it is. At each step of the algorithm, we either find the value we're looking for, or we eliminate half of the list. So if in the first step we're searching a list of N elements, then in the second we'll be searching a list of N/2 elements; in the third, a list of N/4 elements; and so on. After k steps, only about N/2^k elements are left, so after roughly lg N steps we're down to an empty list. So it takes O(lg N) steps in the worst case.

Is this optimal in the number of comparisons? Yes - and it's easy to prove, using Kolmogorov-Chaitin complexity theory. If x is in the list, it could be in any one of N positions in a list of size N. How much information do we need to identify exactly one of those positions? We need an integer with at least lg(N+1) bits. (Why N+1? Because we also need a way to say "it wasn't there".) So if we're using comparisons, which generate one bit of information per comparison, and the algorithm needs to generate lg(N+1) bits, then in the worst case, any comparison-based algorithm will need to do lg(N+1) comparisons - which is O(lg N).
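If you want to see those numbers concretely, here's a small throwaway sketch (mine, just for illustration) that follows the longest path through the binary search - probing and always going left - and lines the probe count up against lg(N+1):

import math

def worst_case_probes(n):
    """How many elements a binary search examines, in the worst case, when
    looking through a sorted list of n values for something that isn't there."""
    lst = list(range(n))
    lo, hi, probes = 0, n, 0
    while lo < hi:
        mid = (lo + hi) // 2
        probes += 1           # one three-way comparison against lst[mid]
        if lst[mid] > -1:     # -1 is smaller than everything in the list,
            hi = mid          # so this always takes the "go left" branch,
        else:                 # which is the longest path through the search
            lo = mid + 1
    return probes

for n in (10, 100, 1000, 10**6):
    print(n, worst_case_probes(n), math.ceil(math.log2(n + 1)))
# 10 -> 4 probes, 100 -> 7, 1000 -> 10, 1000000 -> 20: each equal to ceil(lg(N+1))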

Aside from searching sorted lists, variants of binary search pop up all over the place. For example, think about Newton's method for finding the roots of an equation. You start with a guess, check how close it is to zero, and use the result to prune off part of the search space - it's a variant of binary search, constantly narrowing down the range in which the root can be found; the only real change is that you can cleverly use the derivative to narrow down the answer more quickly than a direct binary pruning would. But the basic concept is the same as binary search.
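Here's a bare-bones sketch of that interval-narrowing idea for root finding, done with plain bisection rather than the derivative trick, so it really is just binary search over an interval; the function and the tolerance are arbitrary example choices:

def bisect_root(f, lo, hi, tol=1e-12):
    """Find a root of f in [lo, hi], assuming f(lo) and f(hi) have opposite signs."""
    assert f(lo) * f(hi) <= 0, "the interval must bracket a root"
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(lo) * f(mid) <= 0:   # the sign change (and hence a root) is in [lo, mid]
            hi = mid
        else:                     # otherwise it must be in [mid, hi]
            lo = mid
    return (lo + hi) / 2

# Example: the positive root of x^2 - 2 is sqrt(2).
print(bisect_root(lambda x: x * x - 2, 0.0, 2.0))   # ~1.4142135623...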

There are a few neat examples from my research area. I do work with SCM systems - which
normal people sometimes call "version control systems". A version control system is a
system that programmers use to keep a complete record of every change they've made to
their program - so using it, they can go back to old versions, check the history of the
development of their systems, etc. One thing that comes up in SCM systems is what we call
"blame identification". The idea is, we know that at some time, one particular piece of
code was added, and we later discovered that that piece of code introduced a bug to the
system, and we want to know exactly when it was introduced, and who was
responsible for it. How do we figure out when it got added?

We look at the history of the system, pick a point halfway through the history, and look to see if it contains that change. If it doesn't, then we know the change was added later than that point, so we go on to search the later half of the history; if it does, we search the earlier half. It's exactly binary search over the history of the program.
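In code, the search over the history might look something like this rough sketch; the revision list and the is_bad predicate are made-up stand-ins for whatever the SCM actually gives you:

def first_bad_revision(revisions, is_bad):
    """Return the index of the earliest revision for which is_bad is true.

    Assumes revisions are in chronological order, and that once the bad
    change appears it stays in every later revision."""
    lo, hi = 0, len(revisions)      # the first bad revision is somewhere in [lo, hi)
    while lo < hi:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):  # bad here, so the culprit is mid or earlier
            hi = mid
        else:                       # still good here, so the culprit came later
            lo = mid + 1
    return lo                       # equals len(revisions) if nothing was ever bad

# Hypothetical usage: pretend the offending change landed in revision 137.
revisions = list(range(200))
print(first_bad_revision(revisions, lambda r: r >= 137))   # prints 137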

Jon Bentley's brilliant book "Programming Pearls" has a great discussion of some of the other places that binary search comes up. It's really an astonishingly common algorithm; I was totally blown away when I read Bentley's book, and realized how many problems could be understood as variants of binary search.


This is one of my favorites to teach my electrical students. In our case, we use this to perform analog to digital conversion via a successive approximation register (SAR). You basically resolve the value one bit at a time. To get their interest I explain a simple bar bet: "Think of a number from 1 to 1000. I'll bet I can guess your number within 10 guesses and all you have to do is tell me whether each guess is correct, too high, or too low." To the simplistic, it sounds like you only have a 1:100 shot, so you can get great odds. The whole 10 guess/1 through 1000 thing sounds so very natural, but then I explain that 2 to the 10th is 1024 and off we go, knocking a bit at a time, from MSB to LSB.

Of course, I caution them not to try the bet if either A) They are not good at arithmetic in their head, or B) They've been drinking heavily.
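Here's roughly what that bit-at-a-time recovery looks like in Python; the secret number and the feedback function are only there for illustration:

def guess_number(too_high, bits=10):
    """Recover a secret in [0, 2**bits - 1], one bit per guess, most
    significant bit first, using only "is this guess too high?" answers."""
    value = 0
    for bit in reversed(range(bits)):   # MSB down to LSB
        trial = value | (1 << bit)      # tentatively set this bit
        if not too_high(trial):         # guess <= secret, so the bit stays set
            value = trial
    return value

secret = 731                                # anything from 0 to 1023
print(guess_number(lambda g: g > secret))   # recovers 731 in 10 guesses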

I'm not sure if Newton's method can really be viewed as a variant on binary searching. You really don't narrow down the search space, since the orbit of the Newton iteration can actually be quite complicated and move very far from the attractor (the root) before converging. Sure, if you start with a good estimate to begin with, convergence is more or less monotonic, but in general it's not quite as nice as most people seem to think.

I once had a teacher who made a bet he could guess the number I was thinking of in 10 guesses (between 0 and 1000). He lost since I picked pi+1.....

I would have to agree with John on this one. I always thought of binary search as more analogous to the bisection method than Newton's method.

By madsocialscientist (not verified) on 07 Apr 2007 #permalink

As a person who was always pretty close to computer HW design, I have never felt binary search is optimal (on real hardware). Consider the following algorithms for searching a list of order N.

(1) Linear search, complexity O(N).
(2) Linear search on an interval of size approx sqrt(N), followed by linear search in the revealed interval. O(N**.5)
(3) Linear search on an interval of size approx N**(2/3), followed
by linear search on intervals of size approx N**(1/3), followed by linear search. O(N**(1/3))
(4) continue this far enough and you should get close to order ln(N)

So we don't do fewer comparisons than BS, but the fact that we know the indices of several elements to search allows us to use some low-level parallelism that we wouldn't have in BS. The result is that the constant of proportionality can be considerably smaller than for BS.
Because of the disruptive effect of sequential branching, and the lack of predictability of addresses for prefetching, algorithms like BS can only run at very low instructions per clock. Traditional computer science tends to assume that only the operation count matters, but in fact the relative cost per operation can vary by a huge amount in modern computer architectures. Designing algorithms that support high instructions per clock, as well as (relatively) low expected op counts, can pay handsomely.
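To make scheme (2) concrete, here's a rough Python sketch of the two-level version; the point isn't the comparison count (which is worse than lg N) but that both passes are predictable, prefetch-friendly linear scans:

import math

def two_level_search(lst, x):
    """Find x in the sorted list lst with two linear passes: first over every
    sqrt(n)-th element to pick the right block, then within that block."""
    n = len(lst)
    step = max(1, math.isqrt(n))
    start = 0
    for i in range(0, n, step):        # pass 1: scan the "fence posts"
        if lst[i] > x:
            break
        start = i
    for i in range(start, min(start + step, n)):   # pass 2: scan one block
        if lst[i] == x:
            return i
    return None

print(two_level_search(list(range(0, 200, 2)), 86))   # prints 43
print(two_level_search(list(range(0, 200, 2)), 87))   # prints None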

As Bentley points out in his book, you can avoid getting binary search wrong by using invariants.
Let's suppose, for example, that you have a sequence A indexed by the range [0..n-1], and you want to find the lowest index i such that A[i] >= x. (This is a fairly typical binary search scenario.)
So you want to maintain two positions lo and hi such that A[lo] < x and A[hi] >= x. (Be careful with the less thans and greater thans; the idea is to ensure that the position that you're interested in is always bracketed.)
Now that you're done with that, you can write the skeleton:

binary_search
  lo = 0
  hi = n+1
  repeat
    assert(A[lo] < x && A[hi] >= x)

    let mid = (lo + hi)/2
    if A[mid] < x then
      ?
    else if A[mid] > x then
      ?
    else if A[mid] = x then
      ?
    endif

    assert(A[lo] < x && A[hi] >= x)
  until lo >= hi
  return ?

Now we just fill in the question marks.
At each stage, we want to move either hi or lo to the tightest bound possible that still maintains the invariant.
If A[mid] < x, then setting lo = mid will maintain the invariant. Verify that lo = mid-1 or lo = mid+1 may either break the invariant or not be the tightest bound possible.
If A[mid] > x, then setting hi = mid will maintain the invariant. Verify that hi = mid-1 or hi = mid+1 may either break the invariant or not be the tightest bound possible.
If A[mid] = x, then setting hi = mid will maintain the invariant. Verify that hi = mid-1 or hi = mid+1 may either break the invariant or not be the tightest bound possible. Verify that moving lo won't help.
Finally, consider the return value. We wanted to find the smallest index i such that A[i] >= x. We've maintained the invariant that A[hi] >=x, so just return hi.
The final code:

binary_search
  lo = 0
  hi = n+1
  repeat
    assert(A[lo] < x && A[hi] >= x)

    let mid = (lo + hi)/2
    if A[mid] < x then
      set lo = mid
    else if A[mid] > x then
      set hi = mid
    else if A[mid] = x then
      set hi = mid
    endif

    assert(A[lo] < x && A[hi] >= x)
  until lo >= hi
  return hi

By Pseudonym (not verified) on 07 Apr 2007 #permalink

I believe that Mark was using "Newton's Method" to describe not what is traditionally called Newton's method, but to describe a variant of the bisection method that uses Newton's method to narrow the interval more efficiently than simply cutting it in half each time.

I'll see if I can dig up a reference.

The part about using binary search to find a specific code change in the revision history is amusing. Save yourself some pain, upgrade to the latest version of P4 and use the time lapse view instead.

Hi Mark, you have a gift for explaining math to people with various backgrounds. There are indeed quite a few traps in implementing binary search. For example, the pseudo code in Pseudonym's comment is subject to integer overflow.

I'm not sure if Kolmogorov complexity theory is applicable to the "optimal number of comparisons" for searching an ordered, randomly addressable sequence. But I can show that using golden section search (the 0.618 method) the upper bound is natural log. And we know loge(N) < log2(N) for N > 1.

basic question:
what will happen if length(L) == 1?

By Anonymous (not verified) on 09 Apr 2007 #permalink

@Bill Mill. I meant if the algorithm is implemented in major programming languages the way the pseudo code is specified.

I have to agree with big Tom that the difference between theory and practice is that, in theory, they are the same.

While in theory BS is the fastest, it assumes you know nothing about the search set. But generally speaking, you do. So hashing can help you out a lot, for example. And big Tom is saying that in practice, order log N doesn't describe the problem's cost. To do a compare, you have indexing cost and compare cost, and the cost of getting to index + 1 is less than the cost of getting to index (a + b) / 2. So if that difference is worth, say, six compares, then when (b - a) gets down to six you should just search the list linearly from a to b. If six is a small enough number, you might just say it's easier to continue with BS. But often six is huge.
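Something like this sketch, say; the cutoff of six just echoes the number above, and the right value would have to be measured on the actual hardware:

def hybrid_search(lst, x, cutoff=6):
    """Binary search that switches to a linear scan once the range is small."""
    lo, hi = 0, len(lst)             # x, if present, is somewhere in [lo, hi)
    while hi - lo > cutoff:
        mid = (lo + hi) // 2
        if lst[mid] > x:
            hi = mid
        else:
            lo = mid                 # keep mid: lst[mid] might equal x
    for i in range(lo, hi):          # finish with a cache-friendly linear scan
        if lst[i] == x:
            return i
    return None

print(hybrid_search(list(range(0, 100, 3)), 51))   # 51 = 3*17, so prints 17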

As Spock said: "Logic is the beginning of wisdom" (ST2?).

A post by Pseudonym is a great example of how everybody gets it wrong even when they try to explain how not to get it wrong.