Moving on from simple graphs, one of the things you can do to make things more interesting

is to associate numeric values with the edges, called the *weights* of those edges. This variation

of a graph is called a *weighted graph*.

Weighted graphs are extremely useful buggers: many real-world optimization problems ultimately

reduce to some kind of weighted graph problem. A few examples include:

- The traveling salesman problem (TSP): given a set of cities, and information about travel-times

between pairs of cities, find the shortest route that visits all cities. This is actually an

incredibly common problem in practice; for example, in large data centers which manage backups,

a set of backup-restore requests is served by computing the seek time between pairs

of requests (where seek time includes finding the correct tape, loading it into a drive, and

scanning to the correct point on tape), and then finding the optimum order in which to fulfill

the requests by using a TSP algorithm.
- Shortest path problem: a relative of the TSP; given a set of cities and information about

travel-time between cities with direct transportation routes, find the shortest path between

two cities.
- The minimum spanning tree problem: given a weighted graph, find the spanning tree

with minimum total edge-weight. Airlines use minimum spanning trees to work out their basic

route system: flights between hubs are low-cost; other flights have varying prices depending on how many people fly them, what airplanes the airports can support, fuel transport costs, etc. The best

set of routes is the minimum spanning tree.

Formally, a weighted graph is defined by a triple: (V, E, W), where:

- V is a set of vertices;
- E is a set of edges {v,w}, where v,w ∈ V;
- W is a map from edges to numbers. (Depending on the specifics of a problem,
the numbers might be restricted to positive numbers, natural numbers,
integers, ...)
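As a concrete sketch, the (V, E, W) triple can be represented almost literally in Python. The vertex names and weights below are made up for illustration:

```python
# A small weighted graph represented directly as the triple (V, E, W).
# frozenset is used for edges so that {v,w} is unordered and hashable.
V = {"a", "b", "c", "d"}
E = {frozenset({"a", "b"}), frozenset({"b", "c"}),
     frozenset({"c", "d"}), frozenset({"a", "d"})}
W = {frozenset({"a", "b"}): 3,
     frozenset({"b", "c"}): 1,
     frozenset({"c", "d"}): 4,
     frozenset({"a", "d"}): 2}

# Summing the weights of all edges gives the weight of the whole graph.
total_weight = sum(W[e] for e in E)  # 3 + 1 + 4 + 2 = 10
```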

For today, we'll look at the last of those in a bit more detail: finding the minimum spanning tree. This is one of the problems that constantly comes up in infrastructure design - from building layouts to

network structure to warehouse location. One great example of the use of this is in telecommunications. When AT&T and Sprint were laying fiber optic cable to replace their copper-wire backbone, they

took their primary routing stations, and arranged them in a graph. Then they worked out the cost of

connecting each pair of routing stations by fiber, and applied those costs as edge-weights. Then they

computed the *minimum spanning tree* of the resulting graph - and that defined the least-expensive

way to lay the fiber for a new backbone.

So let's start by making the definition of the minimum spanning tree problem more precise.

- Suppose we have a simple graph, G=(V,E). A *spanning tree* S_{G}=(V,E') is a subtree of G - an acyclic, connected subgraph that includes every vertex of G - where E'⊆E.
- If G is a weighted graph G=(V,E,W), then the *weight* of G is Σ_{e∈E}W(e) - that is, the sum of the weights of all of the edges in the graph.
- If G is a weighted graph, then the minimum spanning tree Span(G) is the spanning tree over G with minimum weight.

Given that just about every interesting graph problem that we've seen has turned out to be NP-complete, you might expect finding a minimum spanning tree to be NP-complete as well. But graphs are often counterintuitive: things that seem like they should be simple turn out to be very difficult; things that look like they'll be difficult sometimes turn out to be simple. In fact, there's a very simple algorithm which is O(n lg n) where n is the number of edges.

It's called Kruskal's algorithm. And it's a really beautiful algorithm - an extraordinarily clear,

simple, easy way of solving the problem. It's based on what's known as a *greedy* approach. In a

greedy approach, you aggressively grab the thing that looks best, try it out, and see if it fits.

In Kruskal's algorithm, you greedily grab edges - you always grab the lowest weight edge, and see if it

helps build a spanning tree; if so, you add it; if not, you discard it; and you keep doing that until

you have the spanning tree. (From that description, you might think that it's really O(N) in the number of edges - because you consider each edge once. But the edges have to be in sorted order, and sorting the edges is O(N lg N).)

So here's the algorithm more precisely:

FindMST(G=(V,E,W))
    let Result = a set of trees, each consisting of one node from G
    let SortedEdges = sort(E, W)
    for each MinEdge in SortedEdges:
        if MinEdge connects two disconnected trees in Result:
            merge the trees in Result by adding MinEdge
        else:
            discard MinEdge
        end if
    end for
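The pseudocode translates fairly directly into Python. Here's a sketch that tracks the "set of trees" with a simple parent-pointer (union-find) structure; the example graph at the end is made up for illustration, not the one from the figure below:

```python
def find_mst(vertices, weighted_edges):
    """Kruskal's algorithm. weighted_edges is a list of (weight, u, v)
    tuples; returns the list of edges in a minimum spanning tree."""
    # Each vertex starts as its own single-node tree; parent pointers
    # implement a simple union-find structure over the trees.
    parent = {v: v for v in vertices}

    def find(v):
        while parent[v] != v:
            parent[v] = parent[parent[v]]  # path compression
            v = parent[v]
        return v

    result = []
    for weight, u, v in sorted(weighted_edges):  # lowest-weight edge first
        root_u, root_v = find(u), find(v)
        if root_u != root_v:          # edge connects two disconnected trees
            parent[root_u] = root_v   # merge the trees
            result.append((weight, u, v))
        # otherwise: the edge would create a cycle, so discard it
    return result

# A small hypothetical graph: the MST keeps edges of weight 1, 2, and 3.
edges = [(3, "a", "b"), (1, "b", "c"), (4, "c", "d"),
         (2, "a", "d"), (5, "b", "d")]
mst = find_mst({"a", "b", "c", "d"}, edges)  # total weight 6
```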

Let's walk through an example. Here's a graph with weighted edges:

Now we'll go through Kruskal's algorithm to generate the MST:

So the final weight of the minimum spanning tree for this graph is 31.


Don't forget Prim (Jarnik) and Boruvka.

Wonder why the old East bloc is so big on algorithms.

FWIW, since this blog so often has occasion to criticize the bad math of creationists, I was curious to see whether spanning trees pop up in biology.

Googling indicates that they do, in descriptions of population development (evolutionary and/or biogeographical). Cladistics seems to be one obvious application, where, for example, the descent tree (on population characteristics) with the greatest Bayesian likelihood can be searched for.

The important bit to remember with Kruskal's is that you have to be careful about the data structure you use to represent connected trees. If you have to check for connectivity between trees every time, your algorithm is O(|E|²) (I might have to check every edge for each new edge I add). With Union-Find you get that down to O(N log* N) over the run of your algorithm, thus allowing the sorting to dominate the running time.
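For the curious, here's a sketch of the Union-Find structure the commenter mentions, with both path compression and union by rank (the amortized cost per operation is then O(α(N)), effectively constant):

```python
class UnionFind:
    """Disjoint-set forest with union by rank and path compression."""

    def __init__(self, items):
        self.parent = {x: x for x in items}
        self.rank = {x: 0 for x in items}

    def find(self, x):
        if self.parent[x] != x:
            # Path compression: point x directly at its root.
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def union(self, x, y):
        rx, ry = self.find(x), self.find(y)
        if rx == ry:
            return False  # already in the same tree: adding would cycle
        if self.rank[rx] < self.rank[ry]:
            rx, ry = ry, rx
        self.parent[ry] = rx  # attach the shorter tree under the taller
        if self.rank[rx] == self.rank[ry]:
            self.rank[rx] += 1
        return True
```

In Kruskal's algorithm, `union(u, v)` returning False is exactly the "discard MinEdge" case.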

It is for this reason that I would tend to recommend Prim's algorithm to people who aren't picky, since it has time complexity O(|V|²) (worst case, using an adjacency matrix to search for the lowest-weight edge connected to the tree).
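Here's a sketch of the O(|V|²) adjacency-matrix variant of Prim's algorithm the commenter describes; the matrix at the end encodes a made-up example graph, with None marking "no edge":

```python
def prim_mst_weight(adj):
    """Prim's algorithm on an adjacency matrix. adj[i][j] is the weight
    of edge {i,j}, or None if there is no edge. O(|V|^2) overall."""
    n = len(adj)
    in_tree = [False] * n
    best = [float("inf")] * n  # cheapest known edge into the growing tree
    best[0] = 0                # start the tree at vertex 0
    total = 0
    for _ in range(n):
        # O(|V|) scan for the cheapest vertex not yet in the tree.
        u = min((v for v in range(n) if not in_tree[v]),
                key=lambda v: best[v])
        in_tree[u] = True
        total += best[u]
        # Relax: u's edges may offer cheaper ways into the tree.
        for v in range(n):
            if adj[u][v] is not None and not in_tree[v] and adj[u][v] < best[v]:
                best[v] = adj[u][v]
    return total

N = None
adj = [  # vertices 0..3; MST uses the edges of weight 1, 2, 3
    [N, 3, N, 2],
    [3, N, 1, 5],
    [N, 1, N, 4],
    [2, 5, 4, N],
]
mst_weight = prim_mst_weight(adj)  # 6
```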

Guess I have very little intuition for the real world :-)

I don't understand why this should be so -- while it's obvious that the strictly least-expensive way is a tree, wouldn't it make sense to directly connect some far-apart but often-used airports, if the distance between them (according to the MST) is high? And similarly with the communications network...

Perhaps the answer is that because the graph is actually laid out on a plane and costs are roughly proportional to the Euclidean distance, the "direct" edge between two vertices won't be so much cheaper than the distance given by the MST that it's worth adding the extra (redundant) edge. Is this correct?

Still I find it hard to believe that MSTs would be used... what if one crucial node went down?

Keep in mind, ugstu, that the "weight" of an edge may not be as simple as the distance between the nodes. It could be derived from a complicated algorithm that takes into account distance, money, frequency of use, and all sorts of things.

ugstu:

Remember the JetBlue chaos this past winter at JFK when it snowed? JFK is one of JetBlue's hubs - the major branch-point

in their tree.

Most airlines have adopted what they call a hub-and-spoke

routing system. That means that they have a relatively small number of major airports that they work out of, and most of their flights are routed primarily through the hubs. For example, my family is going on vacation to Yellowstone this summer. The only way we can get there is by taking a flight from Newark to Chicago, and then Chicago to some dinky airport near the park. Chicago and Newark are Continental hubs.

The airline's full route isn't a tree - they do augment the route with extra edges between hubs. But the basic routing is an MST, with manual tweaks afterwards.

How is this different than Linear Programming as described by George Dantzig? As to the objective function being non-linear, that is called in industry *parametric Linear Programming*. How is your algorithm different than a simplex?

pls mail me collection on weighted shortest graph.