Hash Week! (Part 2)

By mspringer on October 9, 2012.

Yesterday we looked at hash functions. As you recall, they're functions which take an input and generate a random-seeming output. As a quick example, here's the output of the SHA-256 hash function for the name of the Scottish physicist James Maxwell and a misspelling thereof:

SHA256("James Clerk Maxwell") = 2667629603913530690117759428994407894024237387971995154086108064226397\
5353322

SHA256("James Clark Maxwell") = 9129664885155451589341762461551711693832872424126676652783015499131718\
4589063

A tiny change in the input generates a wildly different output, so it looks like SHA256 is a pretty good hash function. For every input, it dumps out some 256-bit number that looks entirely random. For cryptographic purposes, though, it's not enough that the digits look random; a hash function needs to satisfy three specific properties, which we'll go through one at a time.
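If you want to try this yourself, here is a minimal sketch using Python's standard hashlib module. It hashes the two names above and counts how many of the 256 output bits differ. The UTF-8 encoding and the big-endian conversion to an integer are assumptions on my part, since nothing above says how the decimal values were produced, so the printed numbers may not match them digit for digit.

```python
import hashlib

def sha256_int(text: str) -> int:
    """Return the SHA-256 digest of a UTF-8 string as a non-negative integer."""
    digest = hashlib.sha256(text.encode("utf-8")).digest()
    # Assumption: read the 32 digest bytes as one big-endian number.
    return int.from_bytes(digest, "big")

def differing_bits(a: str, b: str) -> int:
    """Count the bit positions where the two digests disagree."""
    return bin(sha256_int(a) ^ sha256_int(b)).count("1")

if __name__ == "__main__":
    names = ("James Clerk Maxwell", "James Clark Maxwell")
    for name in names:
        print(name, "->", sha256_int(name))
    print("bits that differ:", differing_bits(*names))
```

For a hash whose output looks random, you'd expect roughly half of the 256 bits to flip when a single letter changes.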

1. Preimage resistance.

If I give you a hash value, you should not be able to find a message whose hash is that value. In other words, if I say SHA(x) = 1402163220222678497648226475128810495847235325536749812516677580084870\
9608774, you should not be able to come up with some x that works. Of course you could always just start hashing random strings and odds are after about 2^256 of them you'd hit a string that hashes to that value just by chance. But 2^256 is a gigantic number and in practice you'll never be able to do it.
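To get a feel for the brute-force search, here is a toy sketch that attacks a deliberately crippled hash: only the top 16 bits of SHA-256. The truncation, the helper names, and the choice to try the strings "0", "1", "2", ... instead of truly random ones are all illustrative assumptions. With 16 bits the search should take on the order of 2^16 hashes, which a laptop does in well under a second. Every extra bit doubles the work.

```python
import hashlib
from itertools import count

def truncated_hash(message: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256(message), as an integer (a toy weak hash)."""
    full = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return full >> (256 - bits)

def find_preimage(target: int, bits: int) -> tuple[bytes, int]:
    """Hash b"0", b"1", b"2", ... until one candidate hashes to `target`.

    Returns the matching message and the number of hashes computed.
    """
    for tries in count(1):
        candidate = str(tries - 1).encode("ascii")
        if truncated_hash(candidate, bits) == target:
            return candidate, tries

if __name__ == "__main__":
    bits = 16
    # Pretend we were handed only this value, not the message behind it.
    target = truncated_hash(b"some secret message", bits)
    message, tries = find_preimage(target, bits)
    print(f"found {message!r} after {tries} hashes (2^{bits} = {2 ** bits})")
```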

Why care about preimage resistance? With digital signature algorithms, it is possible to make mathematical versions of statements like "The owner of this cryptographic key asserts that the message with hash [some value] did in fact originate with them." If you can generate a distinct message with the same hash, this authentication can be compromised.

2. Second-preimage resistance.

If I give you a message, and you compute its hash, you should not be able to generate a different message with the same hash. This is a slightly more difficult test for a hash algorithm to pass. Here the attacker effectively has two pieces of information - the original message, and its hash. If a hash algorithm is weak, the attacker might be able to tweak the original message in such a way that it still has the same hash value. You'd hate for an attacker to be able to take "Operation Overlord to commence at midnight" and generate a message like "Operation Overlord to be cancelled" and cause the replacement to have the same hash by judicious arrangements of wording or typos.

This might seem somewhat academic. If the attacker has access to the original message, aren't you already in deep trouble? Not always. Sometimes the message is meant to be public, and the sender is using a digital signature algorithm to sign the hash and thus verify the authenticity of the message. Your online banking website, for instance, has a public cryptographic key whose validity is checked by some certificate authority, and the certificate authority uses the hash of that public key to validate its authenticity to your browser. If it were possible to generate a fake key that hashed to the right value, this authentication would be compromised.
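Here is a toy sketch of what such a forgery looks like against a hash that is far too short: the top 16 bits of SHA-256. The truncation, the function names, and the trick of appending a "(ref N)" suffix to the forged text are all made up for illustration. Against the full 256-bit hash this same loop would need about 2^256 tries, no better than the preimage search, which is exactly the point: only a weak hash lets an attacker do better.

```python
import hashlib
from itertools import count

def truncated_hash(message: str, bits: int) -> int:
    """Top `bits` bits of SHA-256 of the UTF-8 message (a toy weak hash)."""
    full = int.from_bytes(hashlib.sha256(message.encode("utf-8")).digest(), "big")
    return full >> (256 - bits)

def forge(original: str, replacement: str, bits: int) -> str:
    """Vary a suffix on `replacement` until its hash matches that of `original`."""
    target = truncated_hash(original, bits)
    for n in count():
        candidate = f"{replacement} (ref {n})"
        if candidate != original and truncated_hash(candidate, bits) == target:
            return candidate

if __name__ == "__main__":
    bits = 16
    original = "Operation Overlord to commence at midnight"
    forged = forge(original, "Operation Overlord to be cancelled", bits)
    print(forged)
    print(truncated_hash(original, bits) == truncated_hash(forged, bits))
```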

3. Collision resistance.

This is the hard one. You should not be able to find any two messages with the same hash. As the attacker, you don't care what the messages are, and you don't care what the hashes are; you just care that you can find two messages that hash to the same value.

Unfortunately, finding a collision is much easier in general. If I want to find someone who shares my birthday, April 8, I should expect to ask about 365 people before I find a match. But if I just start asking people their birthdays and I'm willing to settle for any match between any two people, I only have to ask 23 people before I have an even chance of finding a match. In general the number of samples I need to find a birthday match - or a hash collision - is proportional to the square root of the number of different possible birthdays - or hashes. So if I have a 128-bit hash, there are some 10^38 possible hashes, but I only have to hash some 2^64, or about 10^19, strings before I'm likely to find a collision. And while that's a big number, it's not inconceivable that a collision could be generated by brute force checking of 10^19 hashes. But it would be tough.
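The birthday arithmetic is easy to check. The sketch below computes the exact probability that at least two of n people share a birthday, assuming 365 equally likely days (no leap years, no seasonal bumps), and compares the break-even point with the usual square-root estimate sqrt(2 ln(2) N). The function names are my own.

```python
import math

def collision_probability(samples: int, outcomes: int = 365) -> float:
    """Chance that at least two of `samples` uniform draws from `outcomes` values match."""
    p_all_distinct = 1.0
    for i in range(samples):
        # The (i+1)-th draw must avoid the i values already seen.
        p_all_distinct *= (outcomes - i) / outcomes
    return 1.0 - p_all_distinct

def samples_for_even_odds(outcomes: int = 365) -> int:
    """Smallest number of samples whose collision probability is at least 1/2."""
    n = 0
    while collision_probability(n, outcomes) < 0.5:
        n += 1
    return n

def sqrt_estimate(outcomes: float) -> float:
    """Square-root rule of thumb for the even-odds point: sqrt(2 ln 2 * N)."""
    return math.sqrt(2.0 * math.log(2.0) * outcomes)

if __name__ == "__main__":
    print(samples_for_even_odds())   # 23
    print(sqrt_estimate(365))        # about 22.5
    print(sqrt_estimate(2.0 ** 128)) # about 2.2e19 hashes for a 128-bit hash
```

The same square-root estimate applied to a 128-bit hash lands at a couple of times 10^19, in line with the 2^64 figure.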

If I need more security than that, I can just use a bigger hash. If I use a 256-bit hash then I'd have to check some 2^128, or about 10^38, hashes before I'm likely to find a collision. And that's a ridiculously huge number even for computers. Or I could use a 512-bit hash and I'd have an inconceivably intractable 10^77 hashes to calculate before I'm likely to find a collision.
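You can watch the square-root scaling happen on hashes small enough to break. This sketch, again using truncated SHA-256 as a stand-in for a short hash (the truncation and the function names are illustrative assumptions), hashes the strings "0", "1", "2", ... and remembers every result until two messages land on the same value. Each extra 8 bits of hash should multiply the work by roughly 2^4 = 16, not 2^8 = 256.

```python
import hashlib

def truncated_hash(message: bytes, bits: int) -> int:
    """Top `bits` bits of SHA-256(message), as an integer (a toy short hash)."""
    full = int.from_bytes(hashlib.sha256(message).digest(), "big")
    return full >> (256 - bits)

def find_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Return two distinct messages with equal truncated hashes, plus the hash count."""
    seen: dict[int, bytes] = {}  # hash value -> first message that produced it
    n = 0
    while True:
        message = str(n).encode("ascii")
        h = truncated_hash(message, bits)
        if h in seen:
            return seen[h], message, n + 1
        seen[h] = message
        n += 1

if __name__ == "__main__":
    for bits in (16, 24, 32):
        a, b, tries = find_collision(bits)
        print(f"{bits}-bit hash: {a!r} and {b!r} collide after {tries} hashes "
              f"(2^{bits // 2} = {2 ** (bits // 2)})")
```

Note the price of this approach: the table of hashes seen so far grows as fast as the number of tries, so memory becomes a problem long before you reach anything like 2^64.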

That is, if my hash function is collision-resistant. One of the most common cryptographic hash functions is called MD5, and cryptographers have figured out a way to easily generate collisions with that function. Even though it's a 128-bit hash, it only takes a home computer a few seconds to generate a collision with the right algorithms.

While the ability to generate a collision with possibly random-looking messages might not seem so bad, in practice a sufficiently clever attacker can use collisions to compromise security. For instance, the authors of the Flame malware that attacked Iran's computer systems were able to use an MD5 collision to generate a fake "This software came from Microsoft and is trustworthy" certificate.

With the widely used MD5 hash comprehensively broken, its replacement SHA-1 showing serious theoretical weaknesses, and the newer SHA-2 possibly vulnerable to similar attacks, NIST decided to put out a call for proposals for a new hash function whose design takes into account the great advances in cryptography over the last decade or two. Tomorrow we'll talk about the just-announced winner of NIST's competition.

[The SHA-2 hash comes in several variants with different bit sizes. The SHA-256 hash at the beginning of this post is in the SHA-2 family of hashes. This awkward naming convention has been the subject of a considerable dust-up in the SHA-3 mailing list, as interested parties debate various alternatives to long designations like SHA-3-256.]
