Now on ScienceBlogs: Open Lab PSA

Seed Media Group

Good Math, Bad Math

Finding the fun in good math; Shredding bad math and squashing the crackpots who espouse it.

Search

Profile

markcc.jpg
Mark Chu-Carroll (aka MarkCC) is a PhD Computer Scientist, who works for Google as a Software Engineer. My professional interests center on programming languages and tools, and how to improve the languages and tools that are used for building complex software systems.

Donors Choose

Other Information

Add this blog to my Technorati Favorites!

Recent Posts

Recent Comments

Categories

Blogroll

Old Topic Indices

Great Online Books

« Friday Random Ten, October 3 | Main | Nobel Prize Blogging: Symmetry Breaking »

Cryptographic Integrity using Message Authentication Codes

Posted on: October 6, 2008 9:39 AM, by Mark C. Chu-Carroll

I don't have a lot of time to write; I'm having my fifth (I think) upper endoscopy done tomorrow, which means that the day's going to be a wash; and Yom Kippur is thursday, and I need to cook, so between the personal crap and work, I'm not going to have much time for blogging. So I'm trying to make use of the time I have to write one short but (hopefully) interesting post.

One thing that I've mentioned in passing is the distinction between message confidentiality, and message integrity.

Confidentiality is most of what we've been talking about so far. Confidentially provides a guarantee that when you send an encrypted message, no one but your intended recipient is able to read the plaintext.

Integrity is something very different. Integrity guarantees that if you send an encrypted message, there's no way that the encrypted message could have been tampered with after you encrypted it, without the recipient knowing it.

Suppose you're using CBC mode for a message in DES, and you've got an attacker who wants to screw up your message. What happens if the attacker flips a couple of bits in one block? First, the plaintext for that block will be corrupted. Second, the ciphertext for that block is used for decrypting the next block - so the plaintext for the next block will also be corrupted.

Can you tell that there was a corruption in the message? If the plaintext was something like human language, you can see that it makes no sense. But that's relying on our meta-knowledge about the structure/nature of the plaintext. In fact, just looking at the encrypted message, there is no possible way for us to detect tampering!

In CBC mode - and in fact in all of the modes we've looked at, an attacker can change bits of the ciphertext, corrupting the message, and we can't tell. That's why we say that these modes of operation can't provide any guarantees of message integrity.

How can we get around that? The easiest thing is to add something called a message authentication code (MAC). A MAC is basically a hash-code: a short string appended to the message which in some way summarizes the message, so that if any part of the message was changed, the MAC will not match the message, and so we'll know that the message was corrupted.

It's easiest to understand the MAC using a bit of notation: Mac = M(T, K), where "M" is the MAC function, T is the plaintext of the message, and K is an encryption key. So you take your plaintext and your key, feed it to M, and get back a short string which is the MAC for your message.

MAC functions need to have some basic properties to be successful in protecting integrity:

  1. Changing a single bit of the input message must produce a significant change in the MAC.
  2. Given a MAC value V, a MAC function M, and a key K, it must be difficult to compute a plaintext T where M(T,K)=V.
  3. Given an oracle O(T) which computes M(T,K), and the ability to compute O(T) for an arbitrary set of chosen plaintexts, it must be difficult to guess the MAC for a particular message without submitting it to O. (That is, even in the scenario of a ideal chosen plaintext attack, where you can choose any set of messages you want, and get the MAC for those messages, that won't help you to guess the MAC of any message outside of that set.)
  4. Given a message T, it must be difficult to find a message U such that M(T,K)=M(U,K).

If you're familiar with digital signatures, some of this should be ringing bells. The idea of a digital signature is very close to this. The main difference is that a digital signature is asymmetric: the sender and the receiver have different keys. The sender has a private key, and signs the message; everyone else has a public key, and can verify that the sender signed the message. With MACs, both the sender and the receiver have the same key. MACs provide no guarantee of who originated the message; they just guarantee that the message wasn't changed between the time it was encrypted by the sender and decrypted by the reciever.

Next post, I'll go through one or two MAC algorithms, and then talk about modes of operation that incorporate MAC-like integrity guarantees.

Share this: Stumbleupon Reddit Email + More

Comments

1

I'm familiar with MACs from various applications, but I'd never really considered their use protecting the authenticity of a symmetrically-encrypted message. One possible simplification occurs to me, though: Couldn't you simply append a secure hash of the preceding plaintext to each message? Unless the attacker can predict the effect of his changes (which shouldn't be possible), he shouldn't be able to modify the message _and_ the hash in such a way that it remains undetectable, right?

Posted by: Nick Johnson | October 6, 2008 10:21 AM

2

Yes, I believe that an encrypted message's integrity may be protected by a simple hash appended to the plaintext. However, unlike a MAC, transmitting the hash in the clear could expose known plaintexts.

One of the more interesting applications of MACs I've seen was in a instant messaging protocol---use of a symmetric key, instead of asymmetric signatures lets each party know they're getting authentic messages from the other. Unlike signatures, however, neither can prove to a third party that the other said something, since both have all the information they need to spoof messages from each other.

Posted by: MPL | October 6, 2008 12:14 PM

3

Actually, I was meaning to say that you could append the hash to the plaintext _before_ encryption.

Posted by: Nick Johnson | October 6, 2008 12:40 PM

4

Mark, this is incorrect:

Second, the ciphertext for that block is used for decrypting the next block - so the plaintext for the next block will also be corrupted.

In fact, the ciphertext for the next block will only have a single bit flipped in it, the bit that the attacker flipped. This is because block n+1 only depends on the ciphertext of block n, not its plaintext.

Nick, your idea is insecure because of the above. If you do:

CBC-Encrypt(msg || HASH(msg))

Then the last block can easily be updated to match the hash of any changes the attacker makes to the encrypted message blocks. The attacker can merely set block n - 1 to:

Ciphertext[n - 1] xor HASH(origMsg) xor HASH(attackerMsg)

That's the whole reason for using a MAC.

Posted by: Nate | October 6, 2008 9:35 PM

5

One more comment to Nick. You may be tempted to say that an attacker can't both change the msg and the hash at the same time. If you're using counter mode or RC4 (stream cipher constructions), every bit is independent. For schemes like CBC where there is some redundancy, there are a number of cases where the propagation behavior of CBC does not prevent an attack.

If the data to be changed is in the same block as the hash, it can easily be changed at the same time. Depending on the protocol, the corrupted block can be surrounded by type fields that cause the receiver to ignore the result of the hash (say, if the receiver crashes once it hits the corrupted block). The receiver could allow retransmissions, allowing the attacker to keep trying until the modified message is accepted.

In summary, you don't want to depend on accidental behavior in your system to keep you secure. Without a standard public key signature or MAC, your ciphertext integrity is not protected.

Posted by: Nate | October 6, 2008 9:51 PM

6

Nate:

For the purposes of a quick and simple explanation, I think that "the following block will be corrupt" is correct. Yes, you can be more precise about exactly *what* the corruption is; but the fact is that you know that changing one encrypted block will result in two plaintext blocks being different than what the sender intended. Corruption is *any change* in the plaintext from what the sender intended.

Posted by: Mark C. Chu-Carroll | October 6, 2008 10:09 PM

7

Fifth upper endoscopy? That's a little hard to swallow.

Seriously - I hope you're well and things turn out okay.

Posted by: Ian | October 7, 2008 8:04 AM

8

Ian:

Yeah, it was my fifth. I had such severe reflux that I had to have surgery. I had three endos before the surgery - the GE I was seeing at the time just couldn't believe that someone my age (I was 28 when I started seeing him), and kept looking for some other cause. Then after the surgery, I had trouble, and then did an endo to look for clues as to the cause. Now, 12 years later, the doctor I saw for my sinus infection thought it might be reflux related.

The good news is, the endo showed *nothing* wrong. No reflux, the surgical fix from 12 years ago is intact, everything is good.

Posted by: Mark C. Chu-Carroll | October 7, 2008 8:12 AM

9

Off topic, and I know this is fish in a barrel, but you might have some fun picking at Conservapedia's assertion that "The odds of Obama being truthful in his claim that he converted to Christianity are less than 100 to 1 against it, as fewer than 1% of Muslims convert to Christianity." Bayes is spinning in his grave...

Posted by: Alejandro | October 7, 2008 12:25 PM

10

MACs use shared keys because their focus is -messages- -- that is, bits in motion. If you use something like SSL the protocol allows each side to negotiate a shared secret session key, and that session key is used with a symmetrical encryption like AES (or DES) for the rest of the session. That setup can take a significant amount of time, something that has is a real consideration for anyone doing low-level work with SSL connections. Also, either party can demand renegotiation of the session key at any time, and "good practice" requires it be done periodically. (Once every N minutes or M megabytes, etc.)

Anyway, with something like SSL the message is broken into smaller packets and each packet is MAC'd and encrypted with the session key. The receiver verifies the MAC and sends a "packet garbled, retransmit" message if it doesn't match.

If this sounds a lot like TCP/IP protocol, it should. They're independent implementations, though, and you can get into nasty error cases if there's a noisy connection since the TCP/IP and SSL packet sizes might not align. It gets even worse if you have deeply nested layers, e.g., you're talking to a HTTPS site over an encrypted VPN connection.

Posted by: Chris | October 7, 2008 7:07 PM

11

Mark, I would expect people to jump to conclusions based on "corrupt", interpreting that as "unpredictable, unknown changes to the entire block." This is especially the case since it is what happens to block N and you apply "corrupt" to block N+1 as well. I think that's misleading.

I would say it as:

Modifying the ciphertext of block N, even by flipping a single bit, causes block N to decrypt to an unpredictable plaintext block. "Unpredictable" means every bit is flipped with probability 50%. Block N+1 decrypts as normal, except each bit the attacker flips in block N results in a corresponding bit flip in plaintext block N+1. This gives an attacker complete control over the plaintext of block N+1, which emphasizes even more that CBC mode does not provide integrity protection.

Posted by: Nate | October 9, 2008 12:01 PM

12

@4

The attacker can merely set block n - 1 to:

Ciphertext[n - 1] xor HASH(origMsg) xor HASH(attackerMsg)

I'm thinking about it a few days already but still... How do you know HASH(origMsg)? It is, you know, encrypted, as is the whole plaintext.
I do remember reading articles from clever people that even if using the Authenticate-then-Encrypt scheme, the authentication needs to be "good", not just some hash — but I don't understand why.
Can you or MarkCC shed some light onto this? Perhaps we assume different knowledge for the attacker? He only knows the algorithm (cipher+hash used) and can change the message.

Posted by: milan_va | October 10, 2008 8:12 AM

13

By the way, Mark, I think you forgot to select a category for this post -- it's not listed under your "Encryption" topics.

Posted by: B-Con Author Profile Page | October 12, 2008 6:44 PM

14

I'll explain in a moment, but first want to say that if you have questions like this and are in charge of building security into a system, please seek assistance. My day job (link) is reviewing and designing security, especially involving the use of crypto. A security review by a third party should be part of every design project. When I design security for a customer, I usually contact someone else to get a formal review of it afterward.

Your proposal is totally insecure under a known plaintext attack. If the attacker knows the plaintext, they can easily calculate HASH(origMsg), the same as you. Then, they can update the value as I originally described, bypassing the integrity protection.

Even if the plaintext is not known, your proposal may be susceptible to this attack. Consider if the message is "transfer $1M to account #1234" and the attacker's goal is get the money but he doesn't know the original account number.

He first registers his own account, say #5678. Then, he intercepts the encrypted/hashed transfer message (above). Now, he goes through guesses for the original account number, generating HASH(origMsg) and the corresponding XOR value I described originally. He submits each modified transfer message to the bank and one is eventually accepted as valid.

A construction like HMAC intentionally includes a large secret value (not just a hash of potentially known data) to avoid this kind of attack. There are many others as well, depending on the application.

Posted by: Nate | October 15, 2008 3:59 PM

Post a Comment

(Email is required for authentication purposes only. On some blogs, comments are moderated for spam, so your comment may not appear immediately.)






Stats

ScienceBlogs

Search ScienceBlogs:

Go to:

Advertisement
Follow ScienceBlogs on Twitter
Visit the Collective Imagination blog
Advertisement

© 2006-2009 Seed Media Group LLC. ScienceBlogs is a registered trademark of Seed Media Group. All rights reserved.

Sites by Seed Media Group: Seed Media Group | ScienceBlogs | SEEDMAGAZINE.COM