You’re a member of the French Resistance in the height of WWII. You’re part of a network of resistance members who have to work with other resistance members they’ve never met before. For instance, an agent from Paris might have to meet up with an agent in Normandy to work together on sabotage before the invasion. But resistance is not something the Nazis take lightly, and they’ve deployed double agents who attempt to infiltrate the resistance. The agent in Normandy needs some way to verify that this person from Paris is in fact a resistance member rather than an enemy double agent.
One solution would be the use of passwords. The resistance HQ distributes a list which contains the resistance agents’ code names and their passwords. If the Paris agent (let’s call her Alice) goes to meet the Normandy agent (let’s call him Bob), Bob can say “If you’re really Alice, tell me your password so I can verify it against my list.” If she knows the password, this demonstrates to Bob that Alice is really Alice, since the Nazi double agents won’t have been given passwords.
But this method is woefully insecure. If any resistance member gets captured, the Nazis have all the passwords and can can infiltrate the ranks of the resistance with impunity. The resistance needs a better authentication scheme, one that’s not so vulnerable to capture of an agent’s list. In short, the resistance needs a method of authenticating a password without knowing what the password is.
This can be done with a hash function. A hash function is designed in a way that’s downright perverse to aficionados of calculus and continuous functions – it’s designed so that its output varies wildly for inputs that vary only slightly. Let me give a contrived example of a hash function before describing how the resistance would use it.
Take an integer – one of at least several digits – and multiply it by itself twenty times. The result is going to be some really gargantuan number. Take the last 10 digits of that number. That’s the output of our function, which we’ll call h(n).
So if your password is 1234, the hash of your password is h(1234) = 9696306176. If your password is 1235, the hash of your password is h(1235) = 5244140625. It takes a little bit of math to get the hash, but it’s not intractable. Notice that the hash function is one-way. There’s no way to recover the password from the hash because you can’t invert the operation “throw away all but the last 10 digits”. A side effect of this is the fact that many different numbers could have the same hash, but in practice with 10^10 possible hashes a “collision” of two passwords having the same hash by accident is unlikely. (Although watch out for birthday attacks!)
The resistance can use this property of hash functions to make their resistance network more secure. Instead of distributing a list of all the agent’s passwords, the resistance can distribute a list of the hashes of their passwords. Thus if Bob knows that Alice’s hash is 7001140801, Alice can verify her identity by saying that her password is 314159, which has that as its hash. But if a Nazi double agent (let’s call her Eve) has managed to steal the list of hashes, she still can’t impersonate Alice. Eve doesn’t know what password to use to generate that hash. She could try thousands or millions of guesses and hope that eventually she found one with a hash that matches Alice’s hash, but with all the possible hashes that would be a herculean task.
I don’t know if any resistance movements in the pre-computer era ever actually did anything like this. But it’s an illustrative example of one of the fundamental concepts of modern computer security. When you select a password for a computer system, the computer doesn’t store your password. It stores a hash of your password. When you input your password later on, the computer hashes your input and checks it against the stored hash. If for some reason the system is compromised in some other way, the attacker can’t steal your password because it was never on the computer in the first place.
Now the hash functions used by modern computers are worlds more sophisticated than the toy example I gave above. They also give much more output – while my example had a 10-digit output, the outputs of modern hashes are often hundreds of digits long. This means an attacker has to search a much larger space of possibilities to find an input that gives a pre-selected hash.
The details of hashes and how they might be attacked quickly become pretty complicated, but are usually pretty interesting. It’s an example of how some quite esoteric number theory can have a tremendously practical application. You can play around with hashes here if you’d like to see real hashes in action. Wikipedia as usual has a good writeup which is an excellent place to start if you’re interested in learning more.