So, last time, we looked at simple substitution ciphers. In a substitution
cipher, you take each letter, and pick a replacement for it. To encrypt a
message, you just substitute the replacement for each instance of each letter.
As I explained, it's typically pretty each to break that encryption - the basic
secret of the encryption is the substitution system, and it's pretty easy to
figure that out, because the underlying information being encrypted still has a
lot of structure.
There are a couple of easy improvements on a simple substitution cipher, some of which came up in the comments. For example, two
good easy improvements are:
- Instead of defining substitutions for single characters, define
substitutions for groups (pairs, triplets) of characters. This improves things,
because it allows you to work with groups that will reduce the visibility of
patterns. Still, because there's so much structure in human language, given
enough data, an encrypted message is still likely to be easy to decode. So this
is great for short messages, but not for anything bigger.
- Multiple substitutions: instead of always substituting, say, "x" for "a",
substitute each letter with a two-digit number. Then for common letters, allow
multiple possible substitutions. By assigning many codes to common letters, and
few codes to uncommon letters, you can make the coded symbols appear with
roughly equal frequency. This can seriously hamper frequency based analyses.
Both of those changes help. They work particularly well when combined. To do
a two-character version of that, you create a list of all possible two-character
sequences. Then you generate a frequency table for how often each two-character
sequence occurs in a large sample of the kind of text you're going to encode.
Then, finally, you assign a number of substitutions for each pair so that they
occur with approximately equal frequency. That gives you a pretty good
Still, it's not great. Given enough encoded text, it can be cracked with a
relatively small amount of computational power. If I know the basic idea of the
cipher, and I've got a decent amount of encoded text, I can write a program that
will figure it out pretty quickly. Plus, it's really a lot of work to generate
the cipher - you need to generate frequency tables, and work out the number of
substitions, etc. It's definitely not trivial to set up, and it's still pretty
easy to crack.
For that reason, those kinds of solutions aren't used much - there's a lot
of prep work, and the secret that you need to share with your partner is large
and complicated. You can get better quality with less effort and a
simple secret using a different scheme called a rotating cipher.
A rotating cipher is one of the simplest examples of a form of encryption
that CS geeks like me call stateful encryption. The idea is that you
start with a simple substitution. But then each time you encrypt a character,
you change the substitution. This covers an incredibly wide range of
encryption techniques - you can understand most digital encryption schemes as
being some kind of stateful cipher.
So let's look at the simplest kind of stateful cipher: a simple rotating
cipher. In a rotating cipher, you have an alphabet, consisting of the set of
characters that you can use in messages. You have a basic cipher, which consists
of the set of characters arranged in some order. You take the alphabet in order,
and line it up with the cipher. Each time you encode a character, you shift the
way that the cipher and the basic alphabet line up. The reason it's called a
rotating cipher is that the physical version of this consists of two wheels: the
inner wheel has the un-encrypted alphabet; the outer wheel has the cipher. To
encrypt a character, you look it up on the inner ring, and write down the
corresponding character on the outer ring. After that, you rotate the outer
The precise way that you rotate that outer ring varies depending on the
exact encryption technique being used. There are a variety of techniques that
you can use for rotating the cipher. Two simple examples:
- Rotate by a fixed number of steps each character. For example,
rotate the cipher wheel three steps after each character is encrypted.
- Rotate by a function of the unencrypted character. For example,
if you encrypt the character at position five in the alphabet, rotate
the cipher by five steps.
For example, here's some Python code which implements rotating ciphers. It's
implemented as an abstract superclass with two subclasses using two different
rotation methods. The first is a fixed rotator - it rotates by three characters
each time. The second is a variable rotation: it rotates by 3 times the position
of the encoded character in the alphabet.
class RotatingCipher(object): def __init__(self, alphabet, cipher): self._alphabet = alphabet self._cipher = cipher # to be overridden by subclasses for different rotation # schemes. def _RotateCipher(self, c): pass def CryptNextChar(self, c): """Encrypt a single character and rotate the cipher.""" # find position of "c" in the alphabet. index = self._alphabet.find(c) # encrypted character is that position plus the rotation, # modulo the size of the alphabet. crypt = self._cipher[(index + self._rot) % len(self._alphabet)] self._RotateCipher(c) return crypt def Encrypt(self, s): """Encrypt a message string.""" self._rot = 0 result = "" for c in s: result += self.CryptNextChar(c) return result def DecryptNextChar(self, c): """Decrypt a single character and rotate the cipher.""" # to decrypt of character, find index of the crypted character in # the cipher, and subtract the rotation from it, giving the # index of the decrypted character in the alphabet. index = self._cipher.find(c) index -= self._rot if index
The quality of a rotating cipher depends largely on quality of the
the rotation strategy. A fixed offset rotator frankly stinks - it's only marginally better than a simple substitution cipher. But with a really good, sophisticated rotator, you can make an amazingly good cipher. Up to the advent of modern computers, rotating ciphers were pretty much top of the heap of encryption - but with much, much more sophisticated rotators.
For example, the infamous German Enigma machines had three or four physical rotors. Each time a character was encrypted, one of the rotors moved, and its motion could trigger motions of the other two rotors, depending on the setting of an encryption key. The positions of the rotor wheels would create an electrical circuit which actually generated the encrypted character. In the most sophisticated version of Enigma, the rotors were replaceable - which allowed you to reconfigure the relationships between the rotors; you also could set an initial position on the rotor wheels (the passcode), and the more advanced machines included an electrical plugboard that could reconfigure the electrical
part of the machine. So the "rotation" corresponded to a rewiring of the machine by switching the positions of the rotors, which interacted in a complicated way which was dependent partly on the physical markings on the rotor disks, partly on the encryption key, and partly on the last encrypted character.
Interesting as always, Mark.
Have you read The Code Book by Simon Singh? It's more about history than algorithms, but perhaps more suited to casual reading on the subway than say Applied Cryptography...
You should probably point out two of the weaknesses of this cypher. The first is that the encryption machine (or code) constitutes a large percentage of the shared secret. One of the first steps to breaking Enigma was that the allies got their hands on one.
The second weakness comes from the amount of information about the key that is residual in the cypher text. Breaking a rotation cypher can be reduced to solving simultaneous equations. This is why the rotate by 3 is not effective. Each character serves to reduce the number of unconstrained variables. By using the properties of the encrypted text, it is possible to add a new unconstrained variable to the system for each one that is lost. A longer text no longer helps. However, using the same initial state (key) twice is deadly. The system begins to reduce quickly. This is particularly true if the messages are almost the same.
The nightmare scenario goes like this: Lt. Alice needs to send a message to Cmdr. Bob. Alice sets up the machine with the code for the day and begins here message:
ENEMY SIGHTED 2.4 KM WEST OF POINT TANGO AT 1534 ZULU.
PROCEEDING WITH ATTACK PLAN CHARLIE.
But the message is interrupted and Bob asks Alice to send it again. Alice resets her machine to the same key(?!) and sends the message again:
ENEMY SIGTHED 2.4KM WEST OF POINT TANGO AT 1534Z.
PROCEEDING WITH ATTACK PLAN CHARLIE.
Alice rushes through the message a second time and makes a few typos and abbreviations. Eve now has two messages that share 14 characters but diverge at character 15. She concludes that the messages use the same key and encode the most of the same letters. Suddenly, Each letter after the 15th reduces the number of possible keys in the search field resulting with both the message and the key being decoded.
Scenarios like this one mean that keys must be used exactly once and then recycled.
This type of code has good randomization -- it is very hard to predict how a given character will be encoded without the key, but has insufficient dispersion -- changing one character does not result in sufficiently unpredictable changes in the cypher text. I'm sure Mark is excited about fleshing out this idea in his next post.
As I pointed out in the comments last time, the Polish Cipher Bureau broke Enigma without getting their hands on one. All they had was a copy of the patent, a copy of a training manual (which had some plaintext/ciphertext pairs; very valuable stuff) and a bunch of intercepted traffic.
Pre-World War II, there were essentially two problems with Enigma that aided in breaking it. One was an arguable flaw in the design (which I'll deal with in a moment). The other was that in the lead-up to the war, the German millitary kept increasing the security of the machines in a piecemeal way. The Cipher Bureau broke each layer right before the system was upgraded. So they were playing catch-up, but they were winning.
OK, now the flaw in the design.
The rotor machines that the US used put an electrical signal in one end corresponding to the plaintext, put it through the rotors, and then the ciphertext came out the other end. To decrypt a message, you had to put the system in "decryption" mode, where the direction of the electrical signals were reversed.
Not so in Enigma. Instead, the signal went in one end, through the rotors, was reflected by a device called the "reflector ring", and then came back through the rotors. You can see the design in this diagram; the reflector ring is labelled with a "6".
This had an advantage, that encryption and decryption were completely symmetric. However, introduced two weaknesses. The first was that a letter could not "encrypt" to itself. The second was highly mathematical, and this is where it gets interesting.
A simple substitution is, mathematically speaking, a group action. You can invert substitutions, you can compose them, and you can make them "act" on an alphabet. Substitutions are isomorphic to the permutation group with the order same as the alphabet size.
Because of the way that Enigma was designed, the permutation performed by the reflector ring must be an involution with no fixed points. That gave it a very simple cycle structure.
Now, on the Enigma machine, there were multiple rotors, one of which "turned" on every symbol. This was known as the "fast wheel".
In every sufficiently long message, therefore, there were at least 25 contiguous symbols which were encrypted with everything except the fast wheel held static.
Think of system "before" the fast wheel as being some permutation A, the fast Wheel at time i as another permutation, Wi, and the reflector ring as another permutation R. Then the encryption action can be understood as:
A' Wi' R Wi A
where R must be an involution without any fixed points. (I'm using apostrophe to denote the group inverse operation.)
Now here's the cool bit: If you have enough plaintext and ciphertext, you can determine the cycle structure for Wi from the cycle structure of the encryption as a whole: It must be the same. Why? Because of a theorem which states that the permutation A' B A has the same cycle structure as B.
So knowing the cycle structure of R and the cycle structure of the encryption as a whole gives you the structure of the fast wheel. The rest is detail.
This theorem, by the way, is known amongst cryptanalysts as "the theorem that won World War II". It's not an exaggeration.
Your mention of Python code and "rotating cyphers" jumped me to a Monty Python TV show where they had "rotating knives" as part of the design of an apartment block - designed by a guy who "mainly designs slaughter houses".
I guess you had to be there. Or rather, not!
Good blog though!
I'm loving the encryption posts. Keep them coming!
I wrote a similar one that used a combination of Sin and Cos functions along with some boundary conditions to map it onto an ascii table. When you encrypted the text, you fed it a key - any floating point value you wanted, and it would start encrypting, but it would use the value of each encrypted letter as a key for encrypting the next character, including whitespace and punctuation.
To break the cypher, you would need to figure out what math function was used and what the initial key was. I would have liked to try it out on someone with know-how and computing power to see if they could crack it.
Two interesting posts. The one explaining a bit on the basic principle behind the enigma and then one explaining why the enigma was using a flawed algorithm due to the fixed rotations used.
I'm enjoying the cryptography posts as well. I happened upon an interesting site about the history of cryptography: Frode Weierud's CryptoCellar. He's even got a windows simulator of the Japanese PURPLE machine. I'm interested in cryptography in general, and the posts here help those of us without a master's in mathematics get a better grip on the state of things.
One other link that I found both quite cool and somewhat disturbing: For those of us with children who might express an interest in cryptography, check out CryptoKids - America's Future Codemakers and Codebreakers, brought to you by America's very own National Security Agency.