Built on Facts

Hash Week! (Part 3)

Over the last two days we’ve talked about hash functions and their uses in cryptography and elsewhere. Remember that an ideal hash function is basically what cryptographers call a random oracle – given an input, it produces a random number in some range. (In practice this range is always [0,2^(2^n)], with n usually equal to 5 or 6 for non-cryptographic hashes or n equal to 7, 8, or 9 for cryptographic hashes.) This random number is deterministic, in that the same input always produces the same output. But the output is otherwise unpredictable. Given an output, it should not be possible to find a corresponding input except by brute-force calculation of every possible input. (And the other conditions we discussed yesterday.)

So how does a hash function actually work? One possibility might be just to add up the numbers in the input, divide by some chosen constant, and output the remainder as the hash value. In ASCII code, the letters of my first name, added together, give

“Matthew” = 77 + 97 + 116 + 116 + 104 + 101 + 119 = 730

If I want the hash to output a value between 0 and 32, I can divide by 32 and take the remainder as my hash. 32 goes into 730 a total of 22 times, with 26 left over. So the hash of my name with this very simple system is 26.

Some of the non-cryptographic hashes like CRC-32 are actually almost this simple. They’re very useful for error-checking and the like, but they fail the “random oracle” criteria completely. It’s easy to engineer a string with any given hash output.

The winner of NIST’s competition to develop the new SHA-3 hash standard is an algorithm named Keccak. Internally it’s pretty complicated, but conceptually it’s simple. In cryptographic terms it follows the sponge construction, which describes the way that it “soaks up” the input and “squeezes out” the output. It works like this: Keccak breaks up the input into blocks of about a hundred bytes. It takes the first block and dumps it into the hash’s internal memory. It scrambles the internal memory around with what amounts to a complicated shuffle. Then it takes the next block and combines it with the internal memory (via an XOR operation) and shuffles the internal memory again. Then it takes the next block, combines it with the internal memory, and shuffles again. This repeats until the hash has processed the entire input. When that happens, the output of the hash is just the internal state at the end.

That glosses over a lot of subtitles and completely ignore the details of the shuffle. I’m not going to try to explain those subtitles or those details because I don’t understand them very well – I’m not at all a cryptographer, just an interested amateur. Still, those details are interesting and I encourage you to take a look at the Wikipedia article.