# A Solution to the 17-Clue Problem?

In honor of the publication of the BSB (that’s the Big Sudoku Book, for those not up on the local slang), my coauthor, Laura, and I hosted a session at last week’s Joint Mathematics Meetings about the mathematics of Sudoku. I gave the opening talk in the session, an overview of some interesting mathematical questions that arise naturally from thinking about Sudoku. Of course, I had a slide discussing what, until recently, was considered the biggest open problem in this area: What is the minimum number of clues a sound puzzle can have?

Of course, everyone knew the answer was 17. After all, plenty of 17-clue puzzles were known, but despite extensive computer searching no one had ever found a 16-clue puzzle. Still, since persistent failure to find a counterexample does not count as a proof, the problem was open. I had a slide in my talk saying as much.

My talk was scheduled for the early afternoon on Wednesday, last week. On Tuesday night, while I was on the train heading towards Boston (the site of the meetings), I decided to check my e-mail. And what did I find but an e-mail from a fellow named Gary McGuire, at University College, Dublin, had announced a proof that 17 is, indeed, the correct answer.

Laura had had an opportunity to skim the paper, which is coauthored with Bastian Tugemann, and Gilles Civario, and I had an e-mail from her as well telling me to take this seriously. The following morning, mere hours before my talk, I had a chance to browse through it. It was clear very early on that this was serious work, and that the method being used was very plausible. I think there’s a pretty good chance the solution will hold up, but, of course, quite a lot of checking will be necessary before it is accepted as correct.

So how did they do it? Well, by a brute force computer search, actually. That’s not the clever part, though. People had thought of doing brute force searches before, but the search space just seemed to large for that to work. If we think in the crudest possible way, then we might imagine selecting a completed Sudoku square and searching all of its 16-clue subsets looking for a valid puzzle. That would mean checking

$\binom{81}{16} \approx 3.4 \times 10^{16}$

subsets, for each of the roughly 5.4 billion Sudoku squares. (The actual number of Sudoku squares is much larger, on the order of ten to the twenty-second power, but when you take into consideration the various symmetries that convert one square into another, the number comes down into the billions.) Such large searches are simply intractable.

The authors used the idea of an “unavoidable set.” A set of cells in a Sudoku board is said to be unavoidable if any valid puzzle having that board as a solution must have a starting clue taken from that set. You can find an example at this site. Finding unavoidable sets dramatically shrinks the space you must search in looking for valid 16-clue puzzles. Instead of examining every subset of size 16, you only need to look at the ones possessing at least one cell from each of the unavoidable sets.

This leads to the “hitting set” problem in combinatorics. It comes in a number of forms, but essentially the problem is this: Imagine that you have two lists. On the left, you have a list of individual elements. On the right you have a list of sets with the property that each element appears in at least one of the sets. Your problem is to find the smallest collection of sets such that every element from the left-hand list appears in one of the sets.

The hitting set problem is known to be computationally hard. The real novelty in what McGuire, Tugemann and Civario accomplished was in devising efficient algorithms for finding unavoidable sets given Sudoku squares, and then locating the relevant hitting sets. for them. These algorithms made it possible to carry out the search in an acceptable amount of time.

Pure mathematics had made very little progress on this problem. It is trivial to see that 7 clues is too few for a sound puzzle. With only seven clues there must be two digits entirely unrepresented at the start of the puzzle. Let’s say those digits are 1 and 2. Then in any solution from that starting point, we could always interchange all of the 1s and 2s and have another valid square. So 7 is insufficient. But it is already difficult to give a purely logical reason for thinking the same about 8.

So it might be best to view the 17-clue problem as a question in computer science rather than mathematics.

At any rate, here’s a a brief squib about the proof over at Nature. With the chaos of the start of the semester around here I don’t know when I will have a chance to read the paper carefully, but it is certainly an exciting development.

1. #1 Mark Erickson
January 10, 2012

Exciting being in the eye of the Sudokur.

2. #2 Another Matt
January 11, 2012

I’ll be very happy if the study pans out. I’m a big fan of constrained brute-force searches. Lately I have been poking around a conjecture in music theory involving hamiltonian cycles that I don’t think will be solved without searching the space, but it’s a very big space that requires some clever pruning. So it is “an exciting development” when this kind of thing turns up.

3. #3 Muck Dizisi
January 18, 2012

I’m a fan of constrained brute-force searches.
Exciting….

4. #4 Living Healthy
January 18, 2012

Thanks for sharing….

5. #5 David Winfrey
January 19, 2012

That sounds like a data compression problem; you’re trying to find the smallest lossless compression of an 81-element set, given some rules on how it was created.

Are the puzzles that are most difficult for humans to solve also the most difficult for software to solve? I’m writing a solver in C, and it works on the first and last puzzles in Tom Sheldon’s “Sudoku Genius.” Is there any better source of test puzzles for machine solution?

6. #6 David Winfrey
January 21, 2012

1 2 3 4 5 6 7 8 –         1 2 3 4 5 6 7 8 9
– – – – – – – – –         9 4 5 1 7 8 2 3 6
8 – – – – – – – –         8 6 7 2 3 9 1 4 5
7 – – – – – – – –         7 1 2 3 4 5 6 9 8
6 – – – – – – – –         6 3 4 8 9 1 5 2 7
5 – – – – – – – –         5 8 9 6 2 7 3 1 4
4 – – – – – – – –         4 5 1 7 8 2 9 6 3
3 – – – – – – – –         3 7 6 9 1 4 8 5 2
– – – – – – – – –         2 9 8 5 6 3 4 7 1

Does this count as a 14, or are the three missing
clues too trivial? (Or is the solution not unique?)

7. #7 David Winfrey
January 21, 2012

It’s not unique; the solution varies when the sequence of the trial-and-error part of the program is changed.

8. #8 Bayesian Bouffant, FCD
January 23, 2012

Seventeen, wow! You should share with the Discovery Institute, they don’t have a single clue.