Now that we've gone through a very basic introduction to computational complexity, we're ready

to take a high-level glimpse at some of the more interesting things that arise from it. The one

that you'll hear about most often is "P vs NP", which is what I'm going to write about today.

So, what are P and NP?

P and NP are two very broad classes of computational problems. When we're talking about P and NP, we're talking about the intrinsic complexity of a *problem* - that is, a minimum complexity bound on the growth rate of the worst case performance of *any* algorithm that solves the problem.

P is the class of problems that can be solved in polynomial time on a deterministic effective

computing system (ECS). Loosely speaking, all computing machines that exist in the real world are

deterministic ECSs. So P is the class of things that can be computed in polynomial time on real computers.

NP is the class of problems that can be solved in polynomial time on *non-deterministic*

ECSs. A non-deterministic machine is a machine which can execute programs in a way where whenever there are multiple choices, rather than iterate through them one at a time, it can follow *all choices/all paths* at the same time, and the computation will succeed if *any* of those

paths succeed; if multiple paths lead to success, one of them will be selected by some unspecified mechanism; we usually say that the machine picks the first path that leads to a successful result.

The idea of non-determinism in effective/Turing-equivalent computing systems is a tough

thing to grasp. There are two ways of thinking about it that help most people understand what it

means: there's the low-level explanation, and the high level explanation.

The low level explanation is based on the basic concept of the Turing machine. (If you need

a reminder of how a Turing machine works, I wrote an explanation here.) In a Turing machine, you've got a tape full of data cells, with a head that can read and/or write data on

the cell under it, and you've got a *state*. At each step of the computation, the machine looks at the contents of the tape cell under the head, and then, using the data on that cell and the current state of the machine, decides what (if anything) to write onto the current tape cell, what state to adopt, and which direction to move the head on the tape. In a deterministic Turing machine, for a given pair (cell,state) of tape cell data and machine state, there can be only one possible (write,state,move) action for the machine. In a non-deterministic machine, there can be *multiple* actions for a given (cell,state) pair. When there are multiple choices, the machine picks *all of them*; you can think of it as the machine splitting itself into multiple copies, each of which picks one of the choices; and whichever copy succeeds first becomes the result of the

computation.
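The splitting-into-copies idea can be made concrete with a small simulation. This is my own sketch, not something from the post: it explores all branches of a non-deterministic transition table breadth-first, accepting if *any* branch reaches the accept state. The encoding (states as strings, moves as +1/-1, `'_'` as the blank symbol) is an arbitrary choice for illustration.

```python
from collections import deque

def run_ndtm(delta, tape, start, accept, max_steps=1000):
    """Breadth-first simulation of a non-deterministic Turing machine.

    delta maps (state, symbol) -> a list of (write, new_state, move)
    choices; the machine accepts if ANY branch reaches the accept state.
    The tape is a dict from position to symbol; blank cells read as '_'.
    """
    frontier = deque([(start, 0, dict(tape))])
    for _ in range(max_steps):
        if not frontier:
            return False          # every branch died out
        state, head, cells = frontier.popleft()
        if state == accept:
            return True           # some branch succeeded
        symbol = cells.get(head, '_')
        # Follow *all* applicable choices: one queued copy per choice.
        for write, new_state, move in delta.get((state, symbol), []):
            new_cells = dict(cells)
            new_cells[head] = write
            frontier.append((new_state, head + move, new_cells))
    return False                  # gave up after max_steps configurations
```

On a real (deterministic) computer this simulation has to visit the branches one at a time, so the frontier can grow exponentially; the imaginary non-deterministic machine follows them all simultaneously.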

The high level explanation is a simpler idea, and it's closer to what we use in practice for

specifying when some problem is in NP, but it sounds pretty weird. A problem whose solution can

be computed in polynomial time on a non-deterministic machine is equivalent to a problem where

a proposed solution can be verified correct in polynomial time on a deterministic machine. You can

specify a non-deterministic machine that guesses all possible solutions, following all of those paths at once, and returning a result whenever it finds a solution that it can verify is correct. So if you

can check a solution deterministically in polynomial time (not produce a solution, but just verify that the solution is correct), then it's in NP.
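As a concrete sketch of that equivalence (my own illustration, with hypothetical function names): given a deterministic polynomial-time verifier, a deterministic machine can always solve the problem by trying candidates one at a time, where a non-deterministic machine would follow every guess at once.

```python
def solve_with_verifier(candidates, verify):
    """Deterministic simulation of the non-deterministic guess-and-check
    strategy: try candidates one at a time, running the cheap
    (polynomial-time) verifier on each. The cost comes from the
    potentially exponential number of candidates, not from verification.
    """
    for candidate in candidates:
        if verify(candidate):      # the polynomial-time check
            return candidate       # the "successful path"
    return None                    # no path succeeded

# Toy instance: find a nontrivial divisor of 91. Verifying a candidate
# divisor is one modulus operation; the work is in the search.
result = solve_with_verifier(range(2, 91), lambda d: 91 % d == 0)
```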

The distinction can become much clearer with an example. There's a classic problem called the

subset-sum problem (SS). In the SS problem, you're given an arbitrary set of integers. The question is,

can you find any non-empty subset of values in the set whose sum is 0? It should be pretty obvious that *checking* a solution is in P: a solution is a list of integers whose maximum length is the size

of the entire set; to check a potential solution, sum the values in the solution, and see if the result

is 0. That's **O**(n) where n is the number of values in the set. But *finding* a solution is

hard. The solution could be *any* subset of any size larger than 0; for a set of n elements,

that's 2^{n}-1 possible subsets. Even if you use clever tricks to reduce the number of possible

solutions, you're still well into exponential territory in the worst case. So you can non-deterministically guess a solution and test it in linear time; but no one has found any way

of producing a correct solution deterministically in less than **O**(2^{n}) steps.
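A minimal sketch of both halves of that asymmetry (the function names are my own, not from the post): the linear-time check, and the brute-force search over all 2^{n}-1 non-empty subsets.

```python
from itertools import combinations

def check_subset_sum(candidate):
    """Verification is easy: sum the proposed non-empty subset and test
    for zero. This O(n) check is what puts subset-sum in NP."""
    return len(candidate) > 0 and sum(candidate) == 0

def find_zero_subset(values):
    """Finding is hard: no known general method avoids examining an
    exponential number of the 2^n - 1 non-empty subsets in the worst
    case. This brute-force version just tries them all, smallest first."""
    for size in range(1, len(values) + 1):
        for subset in combinations(values, size):
            if check_subset_sum(subset):
                return subset
    return None
```

For example, `find_zero_subset([3, -2, 5, -1])` finds the subset (3, -2, -1), after checking every smaller subset first; for a set with no zero-sum subset it checks all 2^{n}-1 possibilities before giving up.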

One of the great unsolved problems in theoretical computer science is: does P = NP? That is, is

the set of problems that *can* be solved in polynomial time on a non-deterministic machine the same as the set of problems that can be solved in polynomial time on a deterministic machine? We know for certain that P ⊆ NP - that is, that all problems that can be solved in polynomial time on a deterministic machine can also be solved in polynomial time on a non-deterministic machine. But whether every problem in NP is also in P, we simply do not know.

For computing devices simpler than Turing-equivalent, we know the relationships between the deterministic and non-deterministic problem families. For example, for finite state machines,

non-deterministic FSMs are no faster or more capable than deterministic ones (so DFA = NFA); for pushdown automata, the deterministic versions are strictly weaker than the non-deterministic ones (D-PDA ⊂ N-PDA). But what about Turing machines, or any other Turing-equivalent ECS? Not a clue. I know people who study computational complexity who think that P=NP, and that we will find a polynomial time deterministic solution to an NP-complete problem any year now. Personally, I suspect that P ≠ NP,

based on intuition from the PDA case; as computing devices get more complex, we start to see more

differences between the deterministic and non-deterministic machines, and I personally think that that

will hold true all the way up to full ECS's. But I have no proof to back that up, just intuition.

Within NP, there is a set of particularly interesting problems which are called *NP-complete*. An NP-complete problem is one where we can prove that if there were a

P-time algorithm that solved that problem, it would mean that there was a P-time solution

for *every* problem in NP, and thus that P=NP. We've identified hundreds (thousands?) of

NP-complete problems, but no one has ever found a P-time solution for any of them.

The SS problem mentioned above is NP-complete. So are a lot of other interesting problems:

graph coloring, the traveling salesman, clique detection in graphs, and many more.

So how do we show that something is NP-complete? It sounds like a very difficult thing to do. And doing it from first principles *is* extremely difficult. Fortunately, once it's been done for one problem, we can piggyback on that solution. The way to prove NP-completeness is based on

the idea of *problem reduction*. If we have two problems S and T, and we can show that

any instance of S can be transformed into an instance of T in polynomial time, then we say that S is polynomial-time reducible to T. If T is NP-complete, then *every* problem in NP is polynomial-time reducible to T.

The trick is: once we know a problem T which is NP-complete, then for any other problem U, if we can show that T is polynomial-time reducible to U, then U must be NP-complete as well. If we can reduce T to U, but we don't know how to reduce U to T, then we don't just say that U is NP-complete; we say that it's NP-hard. Many people think that if we could show whether or not NP=NP-Hard, then we'd also know whether P=NP; others disagree.

Anyway, there are a lot of problems which have been proven NP-complete. So given a new problem,

there are a lot of different NP-complete problems that you can use as a springboard for proving that

your new problem is also NP-complete.

If you trace back, you'll find that most NP-completeness proofs are ultimately built on top of the NP-completeness proof of one fundamental problem, whose nature makes it particularly appropriate as

a universal reduction target - and it's also the first problem that was proven to be NP-complete. It's called the propositional satisfiability problem (aka SAT), which was shown to be NP-complete via a rather difficult proof. For any other problem, if we can show that we can translate any instance of a SAT problem *to* an instance of some other problem *in polynomial time*, then that other problem must also be NP-complete. And SAT (or one of its simpler variations, 3-SAT) is particularly easy to work with, which makes it relatively straightforward to show how to translate instances of SAT to instances of other problems.

The SAT problem is a neat one. Take a set of boolean variables of size *n*,

b_{1},...,b_{n}, and their complements, not(b_{1}),...,not(b_{n});

these are called *atoms*. Now, form an arbitrary logical statement using atoms joined by logical ANDs and ORs. The problem is: can you find an assignment of the boolean variables to true and

false so that the statement is true (satisfied)?
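To make the "verification is easy" point concrete for SAT, here's a small sketch (the tuple encoding and function name are my own, not from the post) that checks a proposed truth assignment against a formula:

```python
def eval_formula(formula, assignment):
    """Evaluate a propositional formula under a truth assignment.

    Formulas are nested tuples: ('and', f, g, ...), ('or', f, g, ...),
    ('not', f), or a bare variable name. Checking an assignment takes
    time linear in the size of the formula - which is exactly why SAT
    is in NP, even though *finding* a satisfying assignment may require
    searching all 2^n possibilities.
    """
    if isinstance(formula, str):
        return assignment[formula]
    op = formula[0]
    if op == 'not':
        return not eval_formula(formula[1], assignment)
    if op == 'and':
        return all(eval_formula(f, assignment) for f in formula[1:])
    if op == 'or':
        return any(eval_formula(f, assignment) for f in formula[1:])
    raise ValueError(f"unknown operator {op!r}")
```

For instance, the statement (b1 OR not(b2)) AND (b2 OR b3) becomes `('and', ('or', 'b1', ('not', 'b2')), ('or', 'b2', 'b3'))`, and checking any proposed assignment is a single linear-time pass.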

The only problem with SAT is that it's so general. The set of all logical equations consisting of

an arbitrary set of atoms joined together in any syntactically valid form - it's an enormous set; but worse, it's got a huge amount of structural variation. To work with it, and show a P-time reduction from SAT to some other problem, and be absolutely sure that you've covered every possible case - that's a tough job. Too tough: it's awfully easy to miss just one little case, and missing one case can

give you an apparently valid reduction that is actually flawed.

What we typically do, then, is use a constrained version of SAT. Any arbitrary SAT equation can be

P-time transformed into an equivalent equation in a particular form consisting of groups of 3 atoms

(triples), where the members of a triple are joined by logical "OR"s, and triples are joined by logical

"AND"s. Solving the SAT problem for equations in the constrained triple form is called 3-SAT. 3-SAT is NP-complete, just like the general SAT problem; and because it is so constrained, 3-SAT is a much easier basis for constructing reduction-based NP-completeness proofs.


After this nice theoretical exposition, one can

practice the concepts in this quiz:

You've got your troubles, I've got mine.

People preferring the topology posts might try:

Farmer Ted and his sheep.

Concerning SAT

the first problem that was proven to be NP-complete. It's called the propositional satisfiability problem (aka SAT), which was shown to be NP-complete via a rather difficult proof.

I think the first point is arguable, though it depends on whether you consider "Does a given non-deterministic Turing machine have an accepting computation within a polynomial time bound on given input" a "problem", or simply a restatement of the definition!

And having taught the reduction a number of times, I'd like to take issue with "difficult". The main difficulties with most of the proofs (up to and including current textbooks) are:

the mathematician's choice of variable names for the propositional variables (and I'm allowed to say that being one myself) as opposed to naming them for what they stand for; and

the failure to mention that the idea of the proof is to show that we can specify "snapshots" of a TM using propositional variables.

No, there are definitely problems that are NP-hard but not in NP, in particular many problems requiring super-exponential time. NP-complete = NP-hard and in NP.

Mark, I find these two sentences contradictory. In one you refer to checking a solution; in the other to finding a solution.

I think this article never makes clear if you are talking about checking or finding solutions. I found it confusing in that regard.

MiguelB:

I'm not sure if I wrote unclearly, or if you're misinterpreting it; when I read it, it seems clear to me, so if you have some idea of how I can clarify, or point out what's specifically making it hard to follow, let me know, and I'll fix it.

In the article, I'm talking about

*finding* solutions to the problems. But any problem that can be *solved* in polynomial time on a non-deterministic machine (i.e., a problem that is in NP) is also a problem whose solutions can be *checked* in polynomial time on a deterministic machine. That's why the verification/checking thing comes up: an NP problem is a problem that can be *solved* in polynomial time on a non-deterministic machine, and *checked* in polynomial time on a deterministic machine.

Does that help?

That helps. I guess I found this phrase a bit tough to parse (I needed to read it thrice):

I would probably rephrase it something along the lines of "If verifying the solution to a problem takes polynomial time on a deterministic machine, then that problem is in NP". It might seem similar but it makes clear what "it" is in "..., then it's in NP."

BTW, thinking more about this, if that implication is true, then what about P problems? How long does it take a deterministic machine to verify a solution to a P problem? Making a quick guess I'd say it'd probably take O(1) but I'm not sure.

Maybe I'm wrong, but I believe NP-completeness relies on 1) showing that your problem is no easier than an NP-complete problem (reduction of SAT/whatever into your problem), and 2) showing that your problem is no more difficult than an NP-complete problem (reduction of your problem to SAT/whatever).

With that said, I'm not quite sure that it will be as easy to show that P=NP or that P!=NP as some claim. Why?

If we think of an algorithm to solve an NP-complete problem as being a tree of decisions, similar to the way we think of the N! orderings of a list (and the comparison-based lower bound as O(log(n!)) ), then we can come upon an obvious EXPTIME solution to the NP-complete problem: traverse the decision tree until you come to a solution.

Modern techniques for solving NP-complete problems (at least that I've seen), tend to rely on heuristic prunings of the decision tree in order to skip over portions that may not lead to a solution (depending on the algorithm, this may be more or less strict).

Showing that P=NP would be, from what I understand, equivalent to saying that there exists a universal heuristic for pruning. If this is correct, then it doesn't seem difficult to construct an NP-complete problem for which the "universal heuristic" fails in an inconsistent manner (it's not always right OR wrong).

Regardless, P=NP would imply that inverting any cryptosystem is arbitrarily easy; from prime factorization in RSA, to key inversion in AES. All become easy. Such a simple solution being claimed as correct for the entire universe of problems smells to me a bit like ID (no offense to your friend). Would it be convenient? Of course. But so would ID.

As for P!=NP, I think that is also going to be difficult, for similar reasons.

Miguel:

A problem that can be

*verified* in P-time on a deterministic machine can also be verified in P-time on a non-deterministic machine. There's no general reduction in complexity class between deterministic and non-deterministic; if there were, we'd probably have the answer to whether or not P=NP.

Mark,

I guess what I'm asking is this:

You said that "If solution verification is P, then problem is NP".

Is the converse true? In other words, is it true that:

"solution verification is P iff problem is NP"

If the converse is not true, then verification of a solution to a non-NP problem can't be in P. I guessed it'd be O(1).

I hope I'm making sense.

Miguel:

Yes, part of the definition of NP is basically that a problem is in NP iff its solution is verifiable in polynomial time. The implication works both ways; if you know you can verify the solution in polynomial time, then you know that the problem is in NP; if you know that the problem is in NP, then you know that you can verify a solution in polynomial time.

Also, a problem with a complexity of O(1) is in P. (1 is considered a polynomial: it's x^{0}.)

Mark,

OK, that's interesting. Thanks for the explanations.

I didn't know that, but it makes sense. One question remains, though: how long does it take to verify the solution to a P problem? If it's not polynomial time, what's left?

Just pointing out the obvious: if verifying the solution to a P problem is P too, and Mark is right, then P=NP.

Sorry for replying to myself here... I just realised every problem in P is also in NP, so the time to verify the solution to a P problem can be in P too, without any consequence.

Maybe it'd be worth it to point out in the article that P is a subset of NP? For those of us who can't immediately figure it out...

MiguelB: The article does point out that, "We know for certain that P ⊆ NP - that is, that all problems that can be solved in polynomial time on a deterministic machine can also be solved in polynomial time on a non-deterministic machine."

Rory,

Yeah, thanks for pointing that out. When dealing with a complex subject, I only start to fully comprehend it when I turn it around in my head and repeat by myself what has just been explained (and by then I've forgotten it was already explained to me). It's hard to describe the process. I'm sorry for doing it in public :) -- on the other hand, this is the first time I feel I have a grasp on the subject. Thanks.

Verifying the solution to a P problem is in fact only guaranteed to be in P.

And P is a subset of NP, but it is not known if P is a proper subset of NP. (That's the whole P=NP/P!=NP question.)

Also, a minor note.

As it's known that any problem in NP can be reduced in polynomial time to an NP-Complete problem, proofs of NP-completeness typically skip showing that it's possible to reduce problem X to a known NP-Complete problem, and simply show it's in NP.

It's usually much easier to "reduce" a simple problem to a complex one than it is to go the other way around... but it's very easy to show something's in NP. (Just display a polynomial-time algorithm to verify solutions. This is usually pretty trivial for problems we care about.)

Now, the fun part is when you consider that we know

P ⊆ NP ⊆ PSPACE ⊆ EXPTIME

We know P != EXPTIME, so AT LEAST ONE of these ⊆'s in fact denotes a proper subset.

But we don't know which one!

"Many people think that if we could show whether or not NP=NP-Hard..."

NP is not equal to NP-Hard. The Halting problem is NP-Hard. It is definitely not in NP.

Also, to prove NP-Completeness, you don't need to do two reductions (although you can). You can reduce SAT or whatever to your problem, then show that your problem is in NP. That's often pretty easy: just show that the answer is polynomial in the length of the input.

"Many people think that if we could show whether or not NP=NP-Hard, then we'd also know whether P=NP; others disagree."

It occurred to me that maybe you meant NP = NP-Complete. In that case, if P != NP, then there are problems that are in NP but not NP-Complete, so NP != NP-Complete (Richard Ladner proved this in 1975 in the Journal of the ACM). Therefore if NP = NP-Complete, then P = NP. No disagreements allowed!

One interesting thing that is hidden in the NFA = DFA equality is the complexity of the machine description.

For any regular expression, there is an NFA whose description is linear in the size of the regular expression. (In fact, there's one with exactly the same number of states as there are symbols in the original regular expression, plus one.) However, there are regular expressions such that the number of states of the minimal DFA is exponential in the size of the regular expression.

Why is this interesting? Because while NFAs and DFAs have equal power, NFAs are not polynomially reducible to DFAs.

Mark, another comment (great series, by the way). There are fairly complicated classes for which non-determinism makes no difference. These are "space" classes, for example the class of all problems solvable in polynomial space. An old result shows that for any space class, non-determinism doesn't change the power, and so PSPACE (poly-space) = NPSPACE (non-deterministic poly-space) and so on.

It is indeed usually easy to check that a problem is in NP, but there are some well known problems for which this is not true at all. Often, these problems involve calculations with things like square roots, and so verifying that a solution is a certain value is nontrivial.

Also, optimization problems in general are not known to be in NP, because it's difficult to verify in polytime that a given solution is "optimal"; they are often NP-hard, but that's as far as we can go.

Great knowledge-in-the-gap article, reducing and SATisfying and more. Thanks!

A distinction that I've heard people talk about, which I'm not sure is exactly the same as the P vs NP distinction is the difference between finding a solution to a problem and checking a solution to a problem. For example, it's harder to figure out the solution to x^5 - x^4 - 16 = 0 than it is to verify that x=2 works. For a hard theorem of mathematics, it's much more difficult to come up with a proof than it is to check to see if the proof is correct. In chess, it's a lot harder to figure how to force a checkmate in 5 moves than it is to check that a particular sequence of 5 moves produces a checkmate.

If P=NP, then that means for many hard problems, there really is no big distinction between producing a solution and checking a solution.

I'm wondering if you're going to get into quantum complexity here, or if that's to the side of your real interest. My intuition is that NP is in QP, but there's yet to be a proof. Full disclosure: the problem that kicked off my dissertation would have had this proof as a corollary, but it turned out not to work as I'd hoped.

Suresh: Optimization problems and the definition of NP don't really map well to one another.

But almost any optimization problem has a simple restatement as a decision problem (which is what NP is really about): if you have an optimization problem that optimizes some variable V given an instance I, you can recast it as a decision problem by asking "Is there any solution to instance I that has a value of V no worse than [insert number here]?"

Mark, I'm new to commenting on this blog, but I can't understand how the solution to the Traveling Salesman problem you linked to can be checked in P. This problem could be trivially checked in P and therefore NP-complete if you specify it as "Is there a route of length less than x" rather than "what is the minimum route". Am I missing something?

Mark,

I've been regularly reading your blog for many months now, great work and thanks for writing, I always learn something new! In regards to NP-completeness, I've been independently studying the derivation of the class of NP-complete problems, namely via Cook's SAT reduction (mostly from textbooks like Hopcroft & Ullman, Garey & Johnson, etc). You mention that it is a rather difficult proof, and I agree - I am quite stuck on it! But I feel like the only way to really understand NP-completeness is by intuitively knowing why/how SAT is the hardest NP problem, which is really the first step to take in this subject.

Can you or any of the readers here recommend any papers, sites or books that provide an unusually lucid or novel walkthrough of Cook's "SAT is NP-complete proof"? Everything I've found so far, even online lectures from various math depts., don't do a good job of presenting the proof and the step by step reasoning to get through it. I would even request for you to do an upcoming post on taking this proof apart.

Thanks for the explanation. It's been a while since college, and we never did go into NP-completeness.

I suppose one trivial corollary is that if we have a problem M known to be (at worst) NP, and can in polynomial time enumerate all possible solutions to M, then we can, in polynomial time, iterate over the set of all possible solutions and verify them. So M is really P.

From the article: Personally, I suspect that P ≠ NP, based on intuition from the PDA case; as computing devices get more complex, we start to see more differences between the deterministic and non-deterministic machines, and I personally think that that will hold true all the way up to full ECS's.

It is known that PSPACE == NPSPACE, and those classes are at least as complex as P.

----------------

From lylebot: It occurred to me that maybe you meant NP = NP-Complete. In that case, if P != NP, then there are problems that are in NP but not NP-Complete, so NP != NP-Complete (Richard Ladner proved this in 1975 in the Journal of the ACM). Therefore if NP = NP-Complete, then P = NP. No disagreements allowed!

--

NP != NP-Complete.

Consider the problem "yes", for which the answer is always yes. This set is obviously in P and NP, but cannot be NP-Complete, because no problem which can be answered "no" can be reduced to it.

P=NP. Professor Martin Michael Musatov. "If smart people always had Ph.D.'s we would not have light bulbs."

Great! Prove it.

What I've found with regards to that problem.

I've been studying the 255 main rules of cellular automata with a twin-bifurcation-based numerical system. What I mean by twin bifurcation is where you have two numbers, produced from a piecewise equation, that as a pair are unique for every value of n.

So for example we'll call this pair x and y

so when n = 1,2,3,4,5,6,7,8,9 ...

x = 1,0,1,2,1,0,1,2,3,1,0 ...

y = 1,2,2,1,3,3,4,2,1,5,4 ... (y being the occurrence count of the pyramidal sequence x)

Anyway, if I tell the computer to find the steps for a cellular automaton rule between 1 and 255, and I ask it for a range of steps starting with the initial condition being n represented in binary form, I find that the x of this outcome minus the y of this outcome produces a recurring number sequence, regardless of which rule is chosen or which n value in binary form acts as the initial value.

This is rather off the wall and odd but true.

When I take this x - y of each step's result pattern and subtract it from the true step value, that is where I find a different pattern that looks much like a variable axis in some way.

So as a result of using this twin bifurcation I am somehow splitting the problem into a repetitive sequence and something else. If, of course, I could mathematically reproduce the result of that something else, I could take the repetitive sequence and simply add it to my reproduction, and the two numbers, when represented in binary, would be equal to the result of any given step.

It's early days, but that's where I'm looking.

My recursive sequence is on YouTube (I am the mad guy wildchildplasma), and I have indeed found that recursive sequence result of the steps.