Is there a fixed point in the MD5 transformation, i.e. does there exist x such that md5(x) == x?
729k165 gold badges1.4k silver badges1.4k bronze badges
asked Oct 25, 2008 at 2:21
4
Since an MD5 sum is 128 bits long, any fixed point would necessarily also have to be 128 bits long. Assuming that the MD5 sum of any string is uniformly distributed over all possible sums, then the probability that any given 128-bit string is a fixed point is 1/2128.
Thus, the probability that no 128-bit string is a fixed point is (1 − 1/2128)2128, so the probability that there is a fixed point is 1 − (1 − 1/2128)2128.
Since the limit as n goes to infinity of (1 − 1/n)n is 1/e, and 2128 is most certainly a very large number, this probability is almost exactly 1 − 1/e ≈ 63.21%.
Of course, there is no randomness actually involved – either there is a fixed point or there isn't. But, we can be 63.21% confident that there is a fixed point. (Also, notice that this number does not depend on the size of the keyspace – if MD5 sums were 32 bits or 1024 bits, the answer would be the same, so long as it's larger than about 4 or 5 bits).
193k47 gold badges301 silver badges297 bronze badges
5 Comments
Can you actually make the assumption that the MD5 sum of any string is uniformly distributed over all possible sums?
Yes. Large numbers and modulous form a roughly random distribution. If they didn't, you would have constant collisions. The nature of md5 forces its output to be distributed randomly.
My brute force attempt found a 12 prefix and 12 suffix match.
prefix 12: 54db1011d76dc70a0a9df3ff3e0b390f -> 54db1011d76d137956603122ad86d762
suffix 12: df12c1434cec7850a7900ce027af4b78 -> b2f6053087022898fe920ce027af4b78
Blog post: https://plus.google.com/103541237243849171137/posts/SRxXrTMdrFN
5 Comments
Are you sure about this : prefix 12: 54db1011d76dc70a0a9df3ff3e0b390f -> 54db1011d76d137956603122ad86d762 I used md5sum linux command, I got different result
Not sure you using the md5sum correct then. You can also confirm it online here: onlinemd5.com
Since the hash is irreversible, this would be very hard to figure out. The only way to solve this, would be to calculate the hash on every possible output of the hash, and see if you came up with a match.
To elaborate, there are 16 bytes in an MD5 hash. That means there are 2^(16*8) = 3.4 * 10 ^ 38 combinations. If it took 1 millisecond to compute a hash on a 16 byte value, it would take 10790283070806014188970529154.99 years to calculate all those hashes.
5 Comments
True, if you had to try every one. But you would only have to try every possible input to verify that there was no fixed point. If there is a fixed point (and Adam Rosenfield's answer suggests that there might be) then one lucky guess is all that's needed.
The function is irreversible in the sense that it has no mathematical inverse, however this only means that for a given output there may be more than one input. In general, the space of inputs for a given output would be infinite, but if you know it started as a 128-bit value you can narrow down the possibilities. There is a chance for "working backwards" if you don't treat the function as a black box, but instead read the spec and apply some mathematical thinking.
@Naaff: "only have to try every possible input" - and this is easier than to try every hash, how? Quite the opposite, since several possible inputs may hash into the same output.
@Piskvor: You misunderstood what Naaff meant (it took me a minute, too). A clearer way to say it would be "Only if there is no fixed point will you have try try every possible input (from the space 2^128)". In other words, you only have to try every possibility if none before that works. So 1.08e28 years, or one lucky guess!
While I don't have a yes/no answer, my guess is "yes" and furthermore that there are maybe 2^32 such fixed points (for the bit-string interpretation, not the character-string intepretation). I'm actively working on this because it seems like an awesome, concise puzzle that will require a lot of creativity (if you don't settle for brute force search right away).
My approach is the following: treat it as a math problem. We have 128 boolean variables, and 128 equations describing the outputs in terms of the inputs (which are supposed to match). By plugging in all of the constants from the tables in the algorithm and the padding bits, my hope is that the equations can be greatly simplified to yield an algorithm optimized to the 128-bit input case. These simplified equations can then be programmed in some nice language for efficient search, or treated abstractly again, assigning single bits at a time, watching out for contraditions. You only need to see a few bits of the output to know that it is not matching the input!
Probably, but finding it would take longer than we have or would involve compromising MD5.
3 Comments
It hasn't been broken. All they've been able to do is, in a reasonable amount of time produce 2 strings that equate the the same hash. It's still very hard to produce a string which will equate to a specific hash.
not sure how finding one would compromise md5, anymore than it would compromise the algorithm if I told you MD5("The quick brown fox jumps over the lazy dog") = 9e107d9d372bb6826bd81d3542a419d6
There are two interpretations, and if one is allowed to pick either, the probability of finding a fixed point increases to 81.5%.
- Interpretation 1: does the MD5 of a MD5 output in binary match its input?
- Interpretation 2: does the MD5 of a MD5 output in hex match its input?
34.3k11 gold badges80 silver badges131 bronze badges
5 Comments
There's nothing about the MD5 algorithm that implies hex - it operates on bytes, and produces bytes - so I think the latter interpretation is invalid.
Whether or not there is a fixed point under interpretation 1, there could still be (or not be) one under interpretation 2. However, if you are interested in exploring the problem, interpretation 1 seems like a much better place to start because you won't have to make all sorts of arbitrary decisions about casing and character encoding. On top of that, the binary case has fewer bits!
You are misinterpreting what the hex really is. You can represent binary in hex, just as you can represent it in decimal or octal or base 3. It's a number, and has different representations. So, interpretation 1 and 2 are the same thing. What you're thinking of is the character string representation, which is not the same hex at all but is an entirely different binary value. In fact you could have many different hex strings in different character sets. The 128-bit hash value can be represented as a "hex" string, but it is not equal to the string. The string is not the same binary data.
There is a huge problem with that idea though, in that it is directly dependent on your character encoding. Different encoding schema are going to result in entirely different result sets. There's even an entire project and an article debunking it based on this misunderstanding of how MD5 operates acodingfool.typepad.com/blog/2009/05/the-kembler-identity.html
Strictly speaking, since the input of MD5 is 512 bits long and the output is 128 bits, I would say that's impossible by definition.
6 Comments
The input can be any size. If the input is less than 512 bytes then it is padded, but small inputs are still acceptable. From Wikipedia: "MD5 processes a variable-length message into a fixed-length output of 128 bits. The input message is broken up into chunks of 512-bit blocks (sixteen 32-bit little endian integers); the message is padded so that its length is divisible by 512."
So you're assuming that, say, 0000000001 = 1 ? I would argue then that the question is poorly specified, at best.
The input to MD5 can be 128 bits. If MD5 wants to pad that input then, well, frankly, that's MD5's business. The input is still well defined. Likewise, the output is a well-defined 128 bits. If the (well-defined) input and the (well-defined) output are both the same then MD5(x)=x.











