Cryptography 101

By jfiore on July 25, 2007.

This week's NOVA Science NOW on PBS has an interesting piece on the Kryptos sculpture in front of CIA headquarters. In just a few minutes, the segment does a decent job of showing some of the basic techniques used, such as substitution and transposition.

I am not a cryptographer, but cryptography is an area I have studied a little. It's a great topic to introduce to my first- and second-year programming students. Some of them really perk up when we start talking about it. Invariably, someone will ask if I can show them how to "crack" protected software. I always tell them that, although I have the knowledge, it would not be ethical. Some of them give me strange looks at this point.

I usually introduce a basic substitution scheme first, pretty much what the NOVA piece explains. I tend to approach this from the ASCII code angle rather than from the idea of arbitrary abstract symbols in order to get into some code fairly quickly. So, we'll enter a string to be encoded and then add one to each character. Thus, an "a" turns into a "b", a "b" turns into a "c" and so on. For example, suppose we start with the string "cat". A simple +1 substitution would yield "dbu" (i.e., add one to the numeric value of each letter to get the replacement letter). We then move on to any arbitrary shift.

This is all well and good but it is clearly limited: there are only as many distinct shifts as there are symbols in the alphabet. From here I like to introduce the concept of a look-up table to make arbitrary swaps instead of a simple offset. This also offers the opportunity to examine the efficiency of using a look-up table instead of the neophyte's "very, very long set of if-else clauses", both in terms of code size and execution speed.

In the simple look-up table, to get the substitution for "c", you'd go to the third item in the table because "c" is the third letter in your alphabet. That table entry could be any letter in your alphabet. The only thing to remember is that the mapping must be one to one. That is, two different characters cannot map to the same encoded symbol, because when you go to decode it, you won't know which of the two it came from. Thus, each source character results in a single, distinct encoded character. The look-up table is the same size as the original alphabet; it's just that its ordering is jumbled.
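As a rough illustration of the two schemes just described, here is a minimal Python sketch. The function names and the particular jumbled table (keyboard order) are my own assumptions for the example, not anything from the NOVA piece, and the shift version simply adds to the character code without wrapping around at the end of the alphabet.

```python
def shift_encode(text, shift=1):
    # Add `shift` to the numeric (ASCII) value of each character.
    return "".join(chr(ord(ch) + shift) for ch in text)

def shift_decode(text, shift=1):
    # Undo the shift by subtracting the same amount.
    return "".join(chr(ord(ch) - shift) for ch in text)

# Hypothetical look-up table: the 26 lowercase letters in jumbled order.
# Every letter appears exactly once, so the mapping is one to one.
TABLE = "qwertyuiopasdfghjklzxcvbnm"

def table_encode(text, table=TABLE):
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            # "c" is the third letter, so it uses the third entry (index 2).
            out.append(table[ord(ch) - ord("a")])
        else:
            out.append(ch)  # leave anything outside a-z untouched
    return "".join(out)

def table_decode(text, table=TABLE):
    out = []
    for ch in text:
        if "a" <= ch <= "z":
            # Find where the encoded symbol sits in the table to recover
            # the original letter's position in the alphabet.
            out.append(chr(table.index(ch) + ord("a")))
        else:
            out.append(ch)
    return "".join(out)

print(shift_encode("cat"))   # dbu
print(shift_decode("dbu"))   # cat
print(table_encode("cat"))   # eqz
print(table_decode("eqz"))   # cat
```

A single indexed look-up replaces what would otherwise be 26 if-else branches, which is the code size and speed comparison mentioned above.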

One of the weaknesses of a simple substitution is that it won't hide symbol frequency; that is, the tendency of some symbols to be used more than others. Anyone who has ever played Scrabble knows that there's a good reason why "j" and "q" have higher point values than "e" or "a". I usually end the section showing how you can create a variable substitution to help hide this. For example, the position of the letter within the message can be used as the index into a table of offsets. Note that the table doesn't include the replacement symbols, just the numeric value used to compute the new symbol. This idea is a little more challenging. Using the "cat" example, to encode the "c", we go to the first entry in our table because "c" is the first character in the string. There, we do not find a replacement symbol, but rather, a number. We will use this number as an offset. So, if that number is 4, our "c" turns into "g" (four characters past "c"). Note that the table entries may be randomly placed; they do not have to follow some sort of pattern. When we decode, that "g" will be in the first position, so the table tells us the original offset was four. "g" minus four yields the original "c". This technique helps to hide symbol frequency because you no longer have a one to one correspondence between original and encoded symbols. Rather, the encoding is a function of both the original character and its position in the string.
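Here is one way the offset table might look in Python, assuming the table is indexed by each character's position in the string, as the last sentence above describes. The offsets themselves and the function names are made up for the example, and this sketch simply refuses messages longer than the table.

```python
from collections import Counter

# Hypothetical offsets; any values work as long as the encoder and the
# decoder share the same table.
OFFSETS = [4, 1, 5, 2, 3, 7, 6, 8]

def offset_encode(text, offsets=OFFSETS):
    if len(text) > len(offsets):
        raise ValueError("message is longer than the offset table")
    # The character at position i is moved forward by offsets[i].
    return "".join(chr(ord(ch) + offsets[i]) for i, ch in enumerate(text))

def offset_decode(text, offsets=OFFSETS):
    if len(text) > len(offsets):
        raise ValueError("message is longer than the offset table")
    # The same position gives the same offset, so subtract it back out.
    return "".join(chr(ord(ch) - offsets[i]) for i, ch in enumerate(text))

print(offset_encode("cat"))   # gby
print(offset_decode("gby"))   # cat

# Repeated letters no longer produce repeated symbols.
print(offset_encode("eee"))                          # ifj
print(max(Counter("seems").values()))                # 2
print(max(Counter(offset_encode("seems")).values())) # 1
```

In "seems" both "s" and "e" appear twice, but after encoding every symbol appears only once, which is the frequency-hiding effect in miniature.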

It should be noted that if all the numbers in this look-up table were the same, you'd be back to the simple fixed-shift substitution explained originally. I also like this technique as a way of showing the usefulness of modulo math. Without it, either your table would have to be as big as your message or you'd have to do some stupid code tricks to recompute an effective index into the table once you went beyond the table size. With it, the index is simply the character's position modulo the table size. BTW, there is nothing that says the table must be the same size as your alphabet. It could be larger or smaller.
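A short Python sketch of the modulo idea, under the same assumptions as before (position-indexed offsets, hypothetical names and table values):

```python
def offset_encode(text, offsets):
    n = len(offsets)
    # i % n wraps the index back to the start of the table, so the
    # table can be any size regardless of the message length.
    return "".join(chr(ord(ch) + offsets[i % n]) for i, ch in enumerate(text))

def offset_decode(text, offsets):
    n = len(offsets)
    return "".join(chr(ord(ch) - offsets[i % n]) for i, ch in enumerate(text))

# A three-entry table handles a twelve-character message.
short_table = [4, 1, 5]
message = "cryptography"
encoded = offset_encode(message, short_table)
print(offset_decode(encoded, short_table) == message)  # True

# If every entry is the same, this collapses to the plain +1 shift.
print(offset_encode("cat", [1, 1, 1]))  # dbu
print(offset_encode("cat", [1]))        # dbu
```

Note that a short table repeats its offsets every `len(offsets)` characters, so a smaller table hides symbol frequency less thoroughly than a larger one.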

FYI, my first year students learn Python and in their second year they learn C (not traditional C, but something that is more geared toward embedded controllers).

I usually introduce a basic substitution scheme first, pretty much what the NOVA piece explains. I tend to approach this from the ASCII code angle rather than from the idea of arbitrary abstract symbols in order to get into some code fairly quick.

What's funny there is that ASCII is itself a substitution cipher mapping letters, numbers, etc. to hex and, ultimately, binary.

I've actually made my own stego program, burying data into the LSBs of graphics. It's fun to play around with stuff like that.

fjdsjpop, dywsw. fjdisj ff, tFBMPO kfnu fjfjojk!

I used to teach assembly and computer architecture, and I always gave a morals and "classyness" lecture after studying executable formats. You can write nice viruses and such when you understand what these files look like and what is done with them. I tried to give examples of people with class vs. idiots, and to make the point that the "community" of professional programmers would look down on you if you did stupid things like writing viruses. I also pointed out that you didn't have to be really smart to do this stuff.
