A minimal Sudoku solver

What you will find here:

A self-contained, half-page implementation of a Sudoku solver (try it out on Repl!)
Code to print beautiful boards such as the one shown above ;)
A rambling explanation of what I mean by "minimal".

How many strategies are required to solve sudokus?

This all started, like many things, on a whim. On a plane. By the time the flight was over, I had a basic solver that could complete all the "hard" boards from the plane's entertainment system. Back under the coverage of WiFi, I found Sudokus that my code could not solve: searching for "hardest sudoku" on Google, one typically lands on this Telegraph page for the infamous "Everest" board, from Arto Inkala.

The initial solver implements three strategies:

basic elimination (remove the value of an assigned cell from its peers' candidates)
sole candidate (if all peers in a group cover all but 1 number, you're that number)
naked twins (if two cells in a group share the same two candidate values, remove those values from peers)

but there are dozens more strategies for solving sudokus. See for example:

sudokuWiki strategy families
kristanix.com solving techniques
sudokuDragon.com strategies: basic / advanced

A single, generic elimination strategy (aka the Rule)

While adding the "hidden twins" strategy to the basic solver, I realized that naked and hidden twins are instances of a more general rule (which we will call, for the rest of this piece, the Rule) that also encompasses hidden/naked triples, quadruples and so on. Moreover, basic elimination and sole candidate are covered by that same rule when the subset of interest (a pair, a triple, etc.) has size 1 and 8 respectively. Here's code implementing that single general strategy:

from itertools import combinations, product

# board: one set of candidate values per cell; groups: tuples of cell indices (rows, columns, squares)
for subset_size, group in product(range(1, 9), groups):
    for subset in combinations(group, subset_size):
        candidates_in_subset = set().union(*(board[index] for index in subset))
        if len(candidates_in_subset) == len(subset):  # we found a constraint
            all_supersets = [g for g in groups if set(subset) <= set(g)]
            for cell in [cell for g in all_supersets for cell in g if cell not in subset]:
                board[cell] -= candidates_in_subset

In words:

For every subset of N cells in a row, column or square: if the subset holds only N candidate values in total, you can remove those values from the cells' peers in any row, column or square that contains the whole subset.

The reasoning is similar to the "naked twins" rule. Say we found two cells in a row that can only contain a 2 or a 4: we do not know which value goes where, but we do know that these two values cannot appear in any of the cells' peers in that row, nor in any other group both cells belong to.
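To make "any other group both cells belong to" concrete, here is a small sketch. It assumes a layout that is not spelled out above: cells numbered 0 to 80 row by row, `groups` as 27 tuples of cell indices, and `board` as a list of candidate sets. The pair of twins is a made-up example.

```python
# Assumed layout: cells numbered 0..80 row by row; a group is a tuple of 9 cell indices.
rows = [tuple(range(r * 9, r * 9 + 9)) for r in range(9)]
columns = [tuple(range(c, 81, 9)) for c in range(9)]
squares = [tuple((3 * br + r) * 9 + 3 * bc + c for r in range(3) for c in range(3))
           for br in range(3) for bc in range(3)]
groups = rows + columns + squares

# Every cell starts with all nine candidates, except a hypothetical pair of twins.
board = [set('123456789') for _ in range(81)]
twins = (0, 1)  # first two cells of the top row, which also share the top-left square
for index in twins:
    board[index] = set('24')

# Groups containing both twins: here the top row and the top-left square.
all_supersets = [g for g in groups if set(twins) <= set(g)]
for cell in {cell for g in all_supersets for cell in g} - set(twins):
    board[cell] -= set('24')

print(len(all_supersets))  # number of groups shared by both twins
print(sorted(board[8]))    # end of the top row
print(sorted(board[10]))   # another cell of the top-left square
print(sorted(board[27]))   # shares only a column with cell 0
```

Cells 8 and 10 lose the 2 and the 4 because they share a row or a square with both twins; cell 27 shares only a column with one of them and is left alone.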

Basic elimination corresponds to a subset size of 1: an assigned cell contains a single candidate value, which can be removed from all the other cells in the same row, column or square. Sole candidate corresponds to a subset size of 8: there are 8 cells covering 8 values, so the single cell left out must take the 9th one. Naked twins corresponds to a subset of size 2; its dual, hidden twins, to a subset of size 7.
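The correspondence between subset sizes and named strategies can be checked on a single group. The sketch below is an illustration rather than the solver's code: `apply_rule` is a hypothetical helper that applies the Rule to one group for one subset size, and the three rows are made-up examples.

```python
from itertools import combinations


def apply_rule(group, subset_size):
    """Apply the Rule to one group (a list of candidate sets) for one subset size."""
    for subset in combinations(range(len(group)), subset_size):
        values = set().union(*(group[i] for i in subset))
        if len(values) == subset_size:  # N cells hold exactly N candidates
            for i in range(len(group)):
                if i not in subset:
                    group[i] -= values
    return group


# Size 1, basic elimination: the assigned 7 is removed from every other cell.
basic = [set('7')] + [set('123456789') for _ in range(8)]
apply_rule(basic, 1)

# Size 8, sole candidate: the first eight cells cover 1..8, so the last cell must be 9.
sole = [set(s) for s in ('12', '123', '234', '45', '56', '67', '78', '18', '39')]
apply_rule(sole, 8)

# Size 7, hidden twins: the last seven cells cover 3..9, so 1 and 2 belong to the first two.
hidden = [set(s) for s in ('125', '126', '34', '45', '56', '67', '78', '89', '39')]
apply_rule(hidden, 7)

print(sorted(basic[1]), sorted(sole[8]), sorted(hidden[0]), sorted(hidden[1]))
```

In the last example the Rule never looks at the twins directly: it finds the seven other cells, and the twins are what is left over.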

This strategy alone, in 7 lines of code, generalizes all the following:

basic elimination (subset size: 1)
sole candidate (subset size: 8)
unique candidate (subset size: 8)
only square (subset size: 8)
two out of three (subset size: 8)
sub-group exclusion (iterated elimination)
pointing pairs (common supersets)
pointing triples (common supersets)
naked twins (subset size: 2)
hidden twins (subset size: 7)
naked triplets (subset size: 3)
hidden triplets (subset size: 6)
naked quads (subset size: 4)
hidden quads (subset size: 5)
general permutation
naked chains
hidden chains

So, this is awesome. This single rule suffices to solve most sudokus rated "very hard", "super fiendish", and equivalent. But is it enough to solve any and every sudoku board?

Can the Rule solve every sudoku board?

Let's look again at the Everest board. If we keep applying the Rule until no further candidate can be eliminated, we are left with the following board state:

The small numbers in each unassigned cell are the remaining candidates, often called "pencil marks". The red ones are those participating in a conjugate pair, discussed later.

So, the answer is "no".

The Rule is not enough by itself. Since it operates on a single group at a time, and propagates information within groups only if they share cells, it fails to capture the group to group dependencies exploited by more advanced strategies. Were we to implement those too, would we be capable then of solving every sudoku?

The answer is again, sadly, "no".

One easy way to convince oneself is to open the board in the SudokuWiki.com solver. There, you can click repeatedly on the "Take step" button: this applies a large collection of advanced strategies to the board, all of which still ultimately fail to pinpoint the single correct solution.

Is search/backtracking necessary?

Advanced strategies such as X-Wing, Swordfish, X-Y-Wing and extensions derive constraints from loops of conjugate pairs (pairs of cells that, for a candidate number, mutually exclude each other). In a loop with an odd number of cells, one of the two configurations leads to an inconsistent state and can be ruled out. For an even number of cells, no inconsistency can be directly detected but one can remove candidates that are eliminated in either configuration.
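As a rough illustration of what a conjugate pair looks like in data, the sketch below lists them for a set of groups. It assumes the common reading of the term (a value with exactly two possible cells left in a group, so that one cell taking it excludes the other); `conjugate_pairs` is a hypothetical helper and the row is a made-up example.

```python
def conjugate_pairs(board, groups):
    """List (value, cell_a, cell_b) for every value with exactly two possible cells in a group."""
    pairs = []
    for group in groups:
        for value in '123456789':
            places = [cell for cell in group if value in board[cell]]
            if len(places) == 2:
                pairs.append((value, places[0], places[1]))
    return pairs


# A single hypothetical row: 1 and 9 can each go only in cells 0 and 1,
# while 2 and 5 still have three possible cells and form no pair.
board = [set(s) for s in ('19', '19', '23', '234', '45', '56', '67', '78', '258')]
groups = [tuple(range(9))]
pairs = conjugate_pairs(board, groups)
print(pairs)
```

Chaining such pairs across groups is what produces the loops these strategies reason about.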

All these strategies consider two or more alternate allocations of values in a collection of linked cells, and compare the resulting state of the board in each one of the cases. They are, in essence, search and backtracking strategies in disguise, no more clever than brute forcing through all possible combinations.

Since all the strategies based on conjugate pairs are equivalent to search and backtracking, the latter is the second and final strategy required for solving any sudoku board. sudoku.py is a self-contained solver based on these two ideas.
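The combination of the two ideas can be sketched as follows. This is not the code in sudoku.py, only a minimal illustration under stated assumptions: cells are numbered 0 to 80 row by row, the board is a list of candidate sets, `parse`, `propagate` and `solve` are hypothetical names, and the puzzle is a widely circulated example board rather than Everest.

```python
from itertools import combinations, product

# Assumed layout: cells numbered 0..80 row by row; '.' or '0' marks an empty cell.
ROWS = [tuple(range(r * 9, r * 9 + 9)) for r in range(9)]
COLUMNS = [tuple(range(c, 81, 9)) for c in range(9)]
SQUARES = [tuple((3 * br + r) * 9 + 3 * bc + c for r in range(3) for c in range(3))
           for br in range(3) for bc in range(3)]
GROUPS = ROWS + COLUMNS + SQUARES


def parse(text):
    """Turn an 81-character string into a list of candidate sets."""
    return [set('123456789') if ch in '.0' else {ch} for ch in text]


def propagate(board):
    """Apply the Rule until nothing changes; return False if a cell runs out of candidates."""
    while True:
        before = sum(len(candidates) for candidates in board)
        for subset_size, group in product(range(1, 9), GROUPS):
            for subset in combinations(group, subset_size):
                values = set().union(*(board[index] for index in subset))
                if len(values) == len(subset):
                    supersets = [g for g in GROUPS if set(subset) <= set(g)]
                    for cell in [c for g in supersets for c in g if c not in subset]:
                        board[cell] -= values
        if not all(board):  # an empty candidate set means a contradiction
            return False
        if sum(len(candidates) for candidates in board) == before:
            return True


def solve(board):
    """Return a solved copy of the board, or None if the board is inconsistent."""
    board = [set(candidates) for candidates in board]
    if not propagate(board):
        return None
    open_cells = [index for index in range(81) if len(board[index]) > 1]
    if not open_cells:
        return board
    cell = min(open_cells, key=lambda index: len(board[index]))
    for value in sorted(board[cell]):
        trial = [set(candidates) for candidates in board]
        trial[cell] = {value}  # assume this value; a contradiction rules it out
        solved = solve(trial)
        if solved is not None:
            return solved
    return None


puzzle = ''.join([
    '53..7....', '6..195...', '.98....6.',
    '8...6...3', '4..8.3..1', '7...2...6',
    '.6....28.', '...419..5', '....8..79',
])
solution = solve(parse(puzzle))
for row in ROWS:
    print(''.join(min(solution[index]) for index in row))
```

Branching on the cell with the fewest candidates keeps the search tree small; each failed branch is exactly a proof by contradiction that the assumed value was wrong.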

Peter Norvig's implementation justifies the introduction of search to avoid the tedious exercise of implementing dozens of rules, but it turns out adding search is a necessary evil if we are to solve any possible sudoku instance.

Ultimately the question "can every sudoku instance be solved logically?" is debatable. On one hand, there exist sudoku boards that are impervious to all currently known human-applicable strategies. On the other, the search and backtracking strategy, oftentimes deemed inelegant by practitioners, is an instance of proof by contradiction, which is an integral part of the logic arsenal.

Refer to Wikipedia to read more about the mathematics of sudoku and the available solving algorithms.

r1cc4rdo / sudoku
