There’s a kind of graph which is very commonly used by people like me for analysis applications, called a lattice. A lattice is a graph with special properties that make it

extremely useful for representing information in an analysis system.

I’ve mentioned before that you can use a graph G=(V,E) to describe a partial order over its vertices, where given two vertices a,b∈V, a<b if and only if either (a,b) ∈E, or there exists

some vertex c∈V such that a<c and c<b. If this is true, G is called a partial order graph (POG).

If we look at POGs, we can create a kind of completion property for describing certain POGs, without

causing them to become total orders, called a greatest lower bound property. Suppose we’ve got two

vertices a and b, where a is *not* less that b, and b is *not* less than a. In a graph with

the GLB property, there is some set of vertices which are the *largest* vertices according to the

graph partial order, and which are less than both a and b. This set, called the greatest lower bound set (GLB set) of a and b will consist of vertices that are not ordered relative to each other in the graph partial order. For example, if you look at the graph to the right, you can J and C are both GLBs of A and B – but J is not less that C, and C is not less than J. They’re not comparable. But they, in turn, have their own GLB – D is the GLB of C and J.

Given a POG P, if for any any pair of vertices a and b in P, there is exactly one member in GLB(a,b), then the graph is called a *semi-lattice*. The example graph above is *not* a semi-lattice: the GLB(A,B) contains two elements. The graph to the right *is* a semi-lattice.

There’s a symmetric notion – the least upper bound. You can guess it’s definition. If a graph

has both a LUB and a GLB, then it’s called a lattice.

Why are lattices so important?

A lot of the work that I’ve done has been on program analysis of one kind or another. Program

analysis is pretty typical of how lattices are used. When you’re doing an analysis of some kind on a program, at some location in the program, you have some knowledge about the properties of the program

at that point. Then you reach a conditional. At the end of one branch of the conditional, what

you know can be represented as some value T. At the end of the other branch of the conditional, what

you know can be represented as some value F. What do you know *after the two branches join*, at the first statement after the conditional? What you know is the *greatest lower bound* of T and F.

Suppose that you know that statement A and statement B are completely independent. After A, you know some set of facts F_{A}; after statement B, you know some set of facts F_{B}. What

do you know after both A and B? You know the *least upper bound* of F_{A} and

F_{B}.

For a program analysis to be sound and conservative – which is a fancy way of saying “absolutely sure that its results are correct” – the set of values that can be produced by the analysis must, at least, be a semi-lattice.

To give you a better sense, let’s look at an example.

def foo(a,b): x = 13 y = x*2 z = x * a if (b < z): z = y*z else: x = sqrt(y) q = x/11 return z

Suppose we want to see what we can replace with constants in the code above. After the first

statement, X is a constant: {Const(x,13)}. After the second statement, we can say that y is a constant,

too – it’s a constant times a constant. {Const(x,13), Const(y,26)}. After the third statement, we’re x and

y are still constant; z isn’t: {Const(x,13), Const(y,26), Var(z)}.

Now we start the conditional. In the true branch, we end up with: {Const(x,13), Const(y,26×26),

Var(z)}. In the false branch, we wind up with {Const(x,sqrt(26)), Const(y,26), Var(z)}.

After the conditional, what do we have? It’s not just something like set union, or set intersection,

or even set difference. It’s something specific to this kind of information – the lower bound of the two

branches. We lose X as a constant – because it could be one of two different constants, based on the value

of the parameter b, which we don’t know. What we end up with is {Var(x),Const(y,26),Var(z)} – the lower

bound of the two branches in the semi-lattice.

This is part of a common, general pattern in all sorts of analysis, from programs, to planning, to

knowledge representation. When we combine two sets of information, we can either know that both are true

(in which case we combine them using the upper bound), or we can know that one *or* the other is

true (in which case we combine them using the lower bound). By making sure that the values form

a lattice, and using the lower and upper bounds in the lattice to combine information, we know that our results are safe in the sense that we’ll never claim to know something that we don’t.

What that means for program analysis, in practical terms, is that we can perform optimizations – make changes to the program to make it more efficient – based on information from the analysis: if the analysis says that the optimization won’t change the results of the program, then we can be absolutely sure that it won’t. We call that being conservative: we only do things that we can be absolutely sure

won’t change the meaning of the program. That’s obviously a pretty critical thing in a compiler. Proving that the set of values used by the analysis form a lattice is by far the easiest way of showing that the analysis is conservative.