How Not to Encrypt a File – Courtesy of Microsoft

88 points by rakel_rakel 9 years ago · 62 comments

The author could spend less time bashing the original article and a little bit more explaining how to do things right.

This:

> Suggestion to use the encryption key as the IV

is a second sub-heading while the words "initialization vector" don't appear until much later. Initialization vector is pretty obvious, "IV" isn't.

Also the author spends time complaining that the original article misunderstands the use of initialization vector while providing no explanation of how it should be used.

After reading the post I haven't learned anything useful other than that the original article was bad.

Bartweiss 9 years ago

I... sort of have mixed feelings on this.
I agree that the article could do far more to explain what's good, both in content (talking about why these things are bad) and in style (defining all terms immediately).
But holy shit, the MSDN article is bad. It's so hideously bad that I think there's nontrivial social value in bashing it extensively to discourage people from writing docs like this without getting them sanity-checked.
In short, I think this article is largely useless to people reading guides and trying to avoid the pitfalls of the original source, but is aimed at people writing crypto guides who have no business doing so.
jlebar 9 years ago

Explaining to you how to do it the right way is not an obligation of anyone that says X article is wrong?
"This article on global warming could spend less time bashing governments for inaction and more time talking about how I can reduce my emissions."
"This bad restaurant review could spend less time bashing the chef's food and more time telling me where the good restaurants are."
Similarly maybe the author didn't explain what "IV" means because their audience understands that term.
"This article in CACM uses 'NVRAM' in the heading, while the words "non-volatile" don't appear until much later. Non-volatile is pretty obvious, 'NVRAM' isn't."
- Lagged2Death 9 years ago
  
  > Explaining to you how to do it the right way is not an obligation of anyone that says X article is wrong?
  I wouldn't expect an explanation, I wouldn't say the author is obliged to that kind of effort.
  I did sort of expect a link to an explanation, though.
  At this writing, other comments in this very HN thread claim there are many intro-level explanations of IVs out there to choose from. They don't link to them either.
  Hypertext is what makes the web special, you know? The article would be more useful with a link. Think of this: even this discussion, here on HN, would have been more fruitful if the author had included a link to some explanation.
  > Similarly maybe the author didn't explain what "IV" means because their audience understands that term.
  I have actually shipped a couple of products that made use of encryption packages, and I've never heard of an IV. Maybe the encryption advice I followed was terrible; maybe the instructions were terrible; maybe the packages were terrible. Maybe I'm an idiot suited only to the digging of ditches.
  - jlebar 9 years ago
    
    The blog post also would have been more useful if it had been a full crypto textbook rather than a single post.
    One can always say, "X would have been more useful if it had included Y (or at least a link to Y)". This sort of criticism is not useful, particularly of informal writing that someone posted to a personal blog.
    While we're talking about awesome things about the web, use Google or Wikipedia. You don't need to be spoonfed a link, so why are you asking to be?
- recursive 9 years ago
  
  I was in the audience, and if I ever knew how one should use an IV, I forgot. The article would have been more valuable to me if it gave a summary of what IVs are instead of what they aren't.
  - jlebar 9 years ago
    
    What makes you think you were the author's intended audience, exactly? It sounds like the author intended their article for people who know what an IV is. You don't, and, while there's no shame in that, it does seem to indicate that you're not in the audience.
    This is like me complaining that I can't understand Terry Tao's blog when it's posted to HN. It's not written for me.
  - snakeanus 9 years ago
    
    You were probably not the intended audience then. It seems to me that this is intended for people who have some form of knowledge about cryptography already.
    In any case, there are many articles about IVs and how to use them already so I do not see the point of explaining what an IV is yet again in that article.

Sophira 9 years ago

While I'm sure the article is correct, it doesn't even attempt to link to resources to say how these things are misunderstandings. For example, I myself don't really understand IVs, and from my perspective I'm left with no clearer an idea about why IVs shouldn't be considered secret, or why the IV isn't required to be able to decrypt the file again.

Regardless, it's obvious that bad encryption advice in an MSDN article is horrifying.

fpgaminer 9 years ago
> I myself don't really understand IVs
Time to drop some knowledge!
IVs are used in a number of places in cryptography, so I'll just pick one (easy) example.
Consider the stream cipher ChaCha20. You can think of ChaCha20 as a black box. You input a key and an IV and out you get a really, really long stream of uniformly random bytes. (This is a simplification but sufficient here). ChaCha20 works in such a way that having any or all of the output stream doesn't help you figure out what the inputs were. It's irreversible. ChaCha20 is also deterministic; the same input will give the same output.
You can then use the output of random bytes to encrypt a message by XORing with your plaintext. To later decrypt, you feed the same key and IV, get the same stream, XOR the ciphertext with it, and by the property of XOR you'll get the plaintext.
Now why is there an IV? Let's consider a ChaCha without an IV. The system works like so:
```
    R = ChaCha(Key)
    Ciphertext = Message ^ R
```
So let's encrypt two different messages:
```
    R = ChaCha(Key)
    Ciphertext1 = Message1 ^ R
    R = ChaCha(Key)
    Ciphertext2 = Message2 ^ R
```
Notice how R is the same for both messages? Again, ChaCha is deterministic; the output is the same for the same inputs. Since the key is the same, R is the same. Now an attacker, knowing this, can do this
```
    Q = Ciphertext1 ^ Ciphertext2
```
What does Q end up being? Let's look:
```
    Ciphertext1 ^ Ciphertext2
    = Message1 ^ R ^ Message2 ^ R
    = Message1 ^ Message2
```
So Q ends up being equal to the XOR of the two messages. That's really bad. The XOR of two messages might be enough to tell the attacker what the messages are, especially if the messages are predictable (like English text). But maybe that's not scary enough. Well, there's another attack. What if you're encrypting a data format with a header? Headers often have the same data in the same places. So the attacker knows part of the message. Uh oh...
```
    R = Ciphertext1 ^ Message1
```
If the attacker knows the message (or any parts of it) they can recover the R of those parts. And now, since your key is always the same and your R is always the same, all the other messages you encrypt will have those bytes exposed.
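Both attacks can be reproduced in a few lines of Python. This is only a sketch: `keystream` is a hypothetical stand-in that uses SHAKE-256 from the standard library as the deterministic byte generator rather than real ChaCha20, and the key and messages are made up for illustration.

```python
import hashlib


def keystream(key: bytes, length: int) -> bytes:
    # Stand-in for R = ChaCha(Key): a deterministic byte stream derived
    # from the key alone. SHAKE-256 is used for illustration, not ChaCha20.
    return hashlib.shake_256(key).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    # XOR two byte strings, truncating to the shorter one.
    return bytes(x ^ y for x, y in zip(a, b))


key = b"\x01" * 32  # made-up key, reused for both messages
message1 = b"HEADER:v1;attack at dawn"
message2 = b"secret plan: retreat now"

ciphertext1 = xor(message1, keystream(key, len(message1)))
ciphertext2 = xor(message2, keystream(key, len(message2)))

# Attack 1: R cancels out, leaving Message1 ^ Message2.
q = xor(ciphertext1, ciphertext2)

# Attack 2: a known header in message 1 exposes R at those positions,
# which then decrypts the same positions of any other message.
known_header = b"HEADER:v1;"
recovered_r = xor(ciphertext1, known_header)
leaked = xor(ciphertext2, recovered_r)

print(q == xor(message1, message2))  # True
print(leaked)                        # b'secret pla'
```

The attacker never touches the key in either case; the reused R is enough.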
This is where IVs come in:
```
    R = ChaCha(Key, IV)
```
IV should be unique per message. That means that every R is different! None of the above attacks work anymore. XORing two ciphertexts together returns gibberish:
```
    R1 = ChaCha(Key, IV1)
    Ciphertext1 = Message1 ^ R1
    R2 = ChaCha(Key, IV2)
    Ciphertext2 = Message2 ^ R2

    Ciphertext1 ^ Ciphertext2
    = Message1 ^ R1 ^ Message2 ^ R2
```
And if the attacker knows the message, all they can recover is R1 or R2 (or any R). But that's useless, because since all your IVs are unique that R will never be seen ever again.
That's the point of IVs.
> why the IV isn't required to be able to decrypt the file again.
It is required. Obviously you need all the inputs to ChaCha to get the byte stream again, to decrypt the message.
Now sometimes the IV is known from the protocol. So say you're using ChaCha to encrypt network traffic. You might set the IV equal to the packet number. So both sides already know the packet number.
But you always need the IV to decrypt.
> and from my perspective I'm left with no clearer an idea about why IVs shouldn't be considered secret,
Consider again ChaCha20 as a blackbox. Key+IV goes in, stream of bytes comes out. There's no way to reverse that without the key (and IV). Since the attacker doesn't know the Key, they can't reverse it. Knowing the IV doesn't help.
Another way to think about it is that, instead of accepting a 256-bit key and a 64-bit IV, it's really just a 320-bit key. Knowing 64-bits of a 320-bit key doesn't help break a cipher. The cipher is still 256-bits strong. So you can share the IV without affecting security.
BIG NOTE: It's important that an IV is always unique. If an IV is ever re-used, the above attacks become available again because R will be the same for two messages.
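A minimal sketch of the per-message IV idea, again using SHAKE-256 over key + IV as a hypothetical stand-in for `ChaCha(Key, IV)`. The 16-byte random IV and the layout that prepends it to the ciphertext are assumptions made for the example; the point is only that the IV travels in the clear and must never repeat under the same key.

```python
import hashlib
import os

IV_LEN = 16  # assumed IV size for this sketch


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Stand-in for R = ChaCha(Key, IV); SHAKE-256 is NOT ChaCha20.
    return hashlib.shake_256(key + iv).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def encrypt(key: bytes, message: bytes, iv: bytes = None) -> bytes:
    # A fresh random IV per message, stored unencrypted in front of the ciphertext.
    iv = os.urandom(IV_LEN) if iv is None else iv
    return iv + xor(message, keystream(key, iv, len(message)))


def decrypt(key: bytes, blob: bytes) -> bytes:
    # The IV is required to decrypt, but it is read from the blob, not kept secret.
    iv, ciphertext = blob[:IV_LEN], blob[IV_LEN:]
    return xor(ciphertext, keystream(key, iv, len(ciphertext)))


key = os.urandom(32)
m1 = b"attack at dawn"
m2 = b"retreat at six"

blob1 = encrypt(key, m1)
blob2 = encrypt(key, m2)

# Unique IVs: XORing the ciphertexts no longer yields m1 ^ m2.
fresh_leaks = xor(blob1[IV_LEN:], blob2[IV_LEN:]) == xor(m1, m2)

# Reused IV (the mistake from the BIG NOTE): the leak is back.
blob3 = encrypt(key, m2, iv=blob1[:IV_LEN])
reused_leaks = xor(blob1[IV_LEN:], blob3[IV_LEN:]) == xor(m1, m2)

print(decrypt(key, blob1) == m1, fresh_leaks, reused_leaks)
```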
Hope that helps. This is only one way that IVs are used. In ChaCha20 it's called a nonce, because ChaCha20 is geared towards usage on network protocols where the above trick of using packet number is applied. For block ciphers there are various cipher modes that get used, and most of them need an IV. The purpose is always the same; to make this "session" of encryption unique.
There's another way to use IVs, and I think they re-affirm the concept of what an IV actually is. Let's say you have a cipher that only accepts a key! No IV (like AES). You still want to make your encryption sessions unique. A way to do that is this:
```
    TempKey = HMAC (IV, Key)
```
And then use TempKey. HMAC is a form of hash. In this case it lets us combine a Key and IV in an irreversible way, yielding a new key. TempKey will be the right size key for the cipher (say, 256-bits). What this is doing is giving us a unique key for every encryption session. And that's the heart of IVs. And in many ways, ChaCha20 is doing exactly that. It's hashing together Key and IV and using the output hash to generate a long stream of random data that can't be reversed back to the key+IV.
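In Python the derivation could look like the following sketch. It assumes HMAC-SHA256, with the long-term key as the HMAC key and the IV as the message; the comment above does not specify the hash or the argument order, so both are assumptions, and `derive_temp_key` is a name invented for the example.

```python
import hashlib
import hmac
import os


def derive_temp_key(key: bytes, iv: bytes) -> bytes:
    # HMAC-SHA256 keyed with the long-term key, with the IV as the message.
    # The 32-byte digest is the right size for a 256-bit cipher key.
    return hmac.new(key, iv, hashlib.sha256).digest()


key = os.urandom(32)   # long-term secret key
iv1 = os.urandom(16)   # public, unique per encryption session
iv2 = os.urandom(16)

temp_key1 = derive_temp_key(key, iv1)
temp_key2 = derive_temp_key(key, iv2)

print(len(temp_key1))          # 32
print(temp_key1 != temp_key2)  # a different session key for each IV
```

The receiver, who holds the same key and is sent the IV, recomputes the same TempKey.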
(and in case you're wondering, yes, you can use a cryptographically secure hash function alone to build a stream cipher like ChaCha. It'll just be _really_ _really_ slow, because hash functions are really, really slow compared to ChaCha.)
- teh_klev 9 years ago
  
  Thanks for spending the time explaining this.
  - fpgaminer 9 years ago
    
    Happy to.
    Is this something people find interesting? I was thinking of doing a small guide/tutorial/course where I teach these basics of cryptography, while building up to a working file encryption tool written completely from scratch. Probably such a thing exists already, but /shrugs these kinds of questions always seem to come up.
    
    UncleMeat 9 years ago
    
    This is risky. Unless you are a pro, it is generally not a great idea to publish a "how-to" for crypto because of the risk you might get it wrong in subtle ways that now propagate through the ecosystem.
    
    fpgaminer 9 years ago
    
    I know all the subtle things.
    That said, I wouldn't write the course for people intending to become cryptographers or cryptographic engineers. That would require a university grade program. It would be geared towards people who have a curiosity of the inner machinations of encryption. Ya know, like the people who come to Hacker News, read articles like this, and ask questions.
- UncleMeat 9 years ago
  
  Note that AES uses IVs in CBC mode. It is incorrect to say that AES does not use IVs.
  - fpgaminer 9 years ago
    
    CBC mode is not specific to AES.
    EDIT: To elaborate. AES does _not_ use IVs. It has no support for them whatsoever. It takes as input only a key and plaintext/ciphertext. This is in contrast to other block ciphers like Threefish which do.
    AES, in most applications, has to be used in constructions that require IVs. But that's distinctly different. IVs are bolted onto AES.
    "It is incorrect to say that AES does not use IVs." is patently false. It'd be like saying "It is incorrect to say that AES does not use tweaks."
jdcarter 9 years ago

In addition to fpgaminer's excellent explanation, I highly recommend the book "Cryptography Engineering: Design Principles and Practical Applications" by Niels Ferguson, Bruce Schneier, and Tadayoshi Kohno. It's an excellent overview of how to use crypto primitives and why to use them that way.
rakoo 9 years ago

I know you're not just looking for answers but a pointer to some better documentation, and I can't provide you with those, but:
> why IVs shouldn't be considered secret
The less is considered secret, the less can be leaked and cause problems.
> why the IV isn't required to be able to decrypt the file again
The IV is required to decrypt the file again. In the linked document's design the IV is actually the encryption key, which means it is known by the receiver, which is why it's not included. But that is just a special case that should never be reproduced.
sixothree 9 years ago

Agreed. Where is the pointer to the correct article to use when encrypting a file in C#?

pacaro 9 years ago

Note: All my information re: Microsoft is from no later than 2013.

This is indicative of a classic challenge in the industry.

To ship code that uses crypto at Microsoft you have to go through an auditing process. To ship code that uses novel crypto, or works directly with crypto primitives, you have to be reviewed by a specialist crypto review board — that contains security and crypto people from across the company, names that you might know (e.g. Niels Ferguson was there last time I needed a review. Hi Niels!)

Samples and documentation aren't held to the same standard.

nailer 9 years ago

Microsoft have already 404d the article: https://support.microsoft.com/en-us/help/307010

casparz 9 years ago

Luckily we have a snapshot: https://web.archive.org/web/20170327154501/https://support.m...
- bartread 9 years ago
  
  Also now dead - I just get a blank page apart from the header and footer.
  - casparz 9 years ago
    
    I got the same at first; it seems some script removes it, though. I could see the content flashing by, so I saved the page using wget and got the original content.
  - kalleboo 9 years ago
    
    If you disable JavaScript you can see the content, it seems like there's some script that replaces it on page load.
    
    nthcolumn 9 years ago
    
    wat? why? they have some sort of 'message will self-destruct javascript' in their page which is carried into the wayback machine?
  - jaclaz 9 years ago
    
    I still can access it just fine: https://web.archive.org/web/20170327154501/https://support.m...
    (I have javascript disabled anyway)
    However the article is:
    >Article ID: 307010 - Last Review: Nov 15, 2012 - Revision: 1
    >Applies to Microsoft Visual C# 2005, Microsoft Visual C# .NET 2003 Standard Edition, Microsoft Visual C# .NET 2002 Standard Edition
    So it seems like a bit out of date anyway, having been originally devised for 2002-2005 products.
nailer 9 years ago

Here's the original Microsoft article:
https://gist.github.com/mikemaccana/badf6c16f203e05c02b42f93...
(disabled JS in DevTools, caught it from archive.org before JS to wipe it kicked in)

unscaled 9 years ago

As someone in charge of reviewing all crypto code for a sizable chunk of my company, I've yet to see a single case of naive developers using encryption primitives correctly. To tell the truth, I don't think I've ever seen a single example of IVs used correctly.

At the very best of times I get AES-CBC-HMAC-SHA1 (usually Encrypt-AND-MAC) with binary keys and secret static IV.

I'm still waiting for the developer that will botch AES-GCM with a random nonce so I can have first world problems, but we're not there yet.

I wanted to call Microsoft sneaky for pulling out this article, but considering basically every top-ranked "how do I encrypt with AES" question on StackOverflow is full of bad advice, I'm glad they at least did something.
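For contrast with Encrypt-and-MAC, here is a hedged sketch of the Encrypt-then-MAC ordering, where the tag covers the IV and ciphertext and is checked before anything is decrypted. The cipher is a toy SHAKE-256 keystream standing in for a real one, and `seal`, `open_sealed`, the 16-byte IV, and the blob layout are all assumptions made for illustration, not a recommended construction; an AEAD mode such as AES-GCM does this work for you.

```python
import hashlib
import hmac
import os

IV_LEN, TAG_LEN = 16, 32  # assumed sizes for this sketch


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Toy stand-in for a real stream cipher.
    return hashlib.shake_256(key + iv).digest(length)


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    iv = os.urandom(IV_LEN)  # random and public, never static
    ciphertext = xor(plaintext, keystream(enc_key, iv, len(plaintext)))
    # Encrypt-then-MAC: the tag is computed over IV + ciphertext, not the plaintext.
    tag = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    return iv + ciphertext + tag


def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    iv, ciphertext, tag = blob[:IV_LEN], blob[IV_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, iv + ciphertext, hashlib.sha256).digest()
    # Constant-time comparison; reject before decrypting anything.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return xor(ciphertext, keystream(enc_key, iv, len(ciphertext)))


enc_key, mac_key = os.urandom(32), os.urandom(32)  # separate keys
blob = seal(enc_key, mac_key, b"attack at dawn")
print(open_sealed(enc_key, mac_key, blob))
```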

jwilk 9 years ago

The article says that DES "can be brute forced in a single digit number of days by a modern computer".

  2**56 keys / 9 days ≈ 92.7 Gkeys/s

Can modern computers actually compute DES that fast?

danbruc 9 years ago

This benchmark [1] gives 196.2 GH/s for DES using 8 Nvidia GTX 1080 Ti and Hashcat 3.5. So while your average computer is probably not quite sufficient it is certainly in reach.
[1] https://gist.github.com/epixoip/ace60d09981be09544fdd3500505...
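The arithmetic behind both figures is easy to check. The sketch below assumes a full sweep of the 2**56 DES keyspace and that the benchmark's hash rate translates directly into keys tested per second; on average a search would finish in about half the time.

```python
KEYSPACE = 2 ** 56          # number of DES keys
SECONDS_PER_DAY = 86_400


def rate_needed(days: float) -> float:
    # Keys per second required to sweep the whole keyspace in `days`.
    return KEYSPACE / (days * SECONDS_PER_DAY)


def days_to_exhaust(keys_per_second: float) -> float:
    # Days needed to sweep the whole keyspace at the given rate.
    return KEYSPACE / keys_per_second / SECONDS_PER_DAY


print(f"{rate_needed(9) / 1e9:.1f} Gkeys/s")    # 92.7 Gkeys/s
print(f"{days_to_exhaust(196.2e9):.2f} days")   # 4.25 days
```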
mikeash 9 years ago

Here's a project that did 1.4G/s on a single GPU five years ago:
https://www.reddit.com/r/crypto/comments/162ufx/research_pro...
Stick multiple modern GPUs in a machine and single digit days seems feasible.
CiPHPerCoder 9 years ago

Yes: http://www.h-online.com/security/features/A-death-blow-for-P...

natch 9 years ago

Another version of essentially the same article is still live here:

https://support.microsoft.com/en-us/help/301070/how-to-encry...

d--b 9 years ago

Yep, all over the place:

https://searchcode.com/?q=ASCIIEncoding.ASCII.GetBytes%28sKe...

EDIT: ok maybe not "all over the place", but it's been done.

Strategizer 9 years ago

The article author is complaining about an MSDN article not being updated. The content even says it applies to VS 2005 at its highest. That's a hint of how old it is. Is he going to get the print version and complain about that next? If programmers are using this without thought, that is on them, not the example code.

cesarb 9 years ago

Raymond Chen wrote some time ago about the variable quality of MS Knowledge Base articles: https://blogs.msdn.microsoft.com/oldnewthing/20060424-21/?p=...

BusinessInsider 9 years ago

That's pretty disturbing. Though to be fair, the article in question was written a while ago (since it targets .NET 2005), and to be less fair, MS doesn't really review their documentation very well, at all.

duke360 9 years ago

You are probably too young. In the past, when the internet wasn't so ubiquitous, having the MSDN documentation on CD was a life saver. The docs that have serious content today descend directly from those days; the rest, as others have already said, are just autogenerated boilerplate that nobody maintains anymore, simply because the technology moves too fast. So this doc page about using DES probably dates from 1990 or so, and in those days it was probably good enough.

TheSpecialist 9 years ago

It does seem useless to make the IV the same as the key. But is there a reason making the IV the same as the key is worse than using 0 as an IV?

Just asking.

norcimo5 9 years ago

To encrypt: tar cz foo | openssl aes-256-cbc -salt -out foo.enc

To decrypt: openssl aes-256-cbc -d -in foo.enc | tar xz

(foo can be a file or directory)

snakeanus 9 years ago

This does not contain a MAC though, does it? Also why CBC? Why not CTR/GCM instead? And why AES256 instead of Chacha20-Poly1305 or some other modern AEAD?
- norcimo5 9 years ago
  
  What are the advantages of GCM over CBC? And whats wrong with AES256?
  - snakeanus 9 years ago
    
    - GCM, unlike CBC, is an AEAD mode (has a MAC built in)
    - CBC needs padding, which when misused can lead to padding oracle attacks
    - GCM allows for parallel encryption
    > And whats wrong with AES256?
    There are more modern, faster and better ciphers that are designed to not be vulnerable against many side-channel attacks that AES is difficult to protect against.
  - kaoD 9 years ago
    
    GCM has an integrity check built in, which is very useful in crypto.
    https://crypto.stackexchange.com/questions/2310/what-is-the-...
    https://crypto.stackexchange.com/questions/14747/gcm-vs-ctrh...
    https://security.stackexchange.com/questions/33569/why-do-yo...

snakeanus 9 years ago

I feel disgusted after reading this. I wonder how many people applied the advice given by the original article because they made the bad decision to trust the official documentation by MS.

bartread 9 years ago

Oh, come on: whatever Microsoft's faults might be they have a very long track record, stretching back decades, of providing overall high quality documentation for developers.
Yes, there are errors. Yes, sometimes there is deeply misguided advice. But, on the whole, MSDN and its ilk has helped me far more often than it's hurt me.
Key point: compared with much other vendor and OSS documentation, Microsoft are absolutely streets ahead.
- setq 9 years ago
  
  Most of the documentation is boilerplate. There's very little real content now and most of it is filler.
  - wfunction 9 years ago
    
    Here's an example of a new API they added recently (the first one I thought of): https://msdn.microsoft.com/en-us/library/windows/desktop/mt5...
    It says things like this:
    > Indicates that the data for the file should be obtained from a WIM file. On access, data is transparently extracted from the WIM file and provided to applications. If the file contents are modified, data is transparently decompressed and the file is restored to the same physical form it had if this API were not used.
    This is "boilerplate" and "very little content" to you? What are you thinking of?
    
    setq 9 years ago
    
    To be fair Win32 isn't terrible. Have you looked at the .Net docs?
    http://imgur.com/a/iK4uG
    
    wfunction 9 years ago
    
    I haven't to be honest, but at the same time I'm not sure what you expect to see as the summary here. Is there anything profound to say about two overloads that differ by an extra "millisecond" parameter?
    
    krallja 9 years ago
    
    You need to click on the overload of this method that you are interested in. This is just the top level "where do you want to go?" document.
    
    setq 9 years ago
    
    It's not. There's content about 11 screens down, not that you'd find it easily.
    
    darklajid 9 years ago
    
    Again, GP just stated that these are local links, links to anchors on that very same page, explaining the overloads.
    Are there too many? For this class, maybe. It's at least potentially worth discussing it. But the way the navigation works isn't "just scroll until something seems to fit".
- yebyen 9 years ago
  
  The last time I really had to deal with a bad MSDN article was probably 2003 working on an ASP/VBScript application that used MSXML. I was in my last year of High School, working for a local bank doing things that were certainly above my pay grade, with extremely minimal support, in my glory days.
  I remember getting 90% of the way through writing my application in VBScript and finding a piece of documentation about some XSD thing that I really needed to do to complete the tool, and that lots of people were reporting my similar issue, the support reply basically said, "get f'ed," this function works in JScript implementation of MSXML but not in VBScript.
  Sorry! Hope you have hundreds of spare hours to learn a new language and port over your entire codebase, because we're not fixing it.
  Every time since that I can remember I have ever referred to MSDN, I have found one post with my question, asked in clear terms that I could reach from a google search... posted four years ago, with one or more replies that are almost always very obviously wrong, from MS Certified Partner(TM).
  Maybe some of their documentation is great! I have not had the fortune to encounter it.
  While many open source projects have great documentation, and many others do not, the difference tends to be that if your Open Source project has bad documentation, or features that just plain don't work, you are free to read the source code and fix it yourself!
- alistproducer2 9 years ago
  
  If I could down vote this twice I would. I can't tell you how much I've been on the phone with MS support to try and do basic things with their software but can't because a.) the only existing documentation is wrong or b.) there's no documentation. My company pays MS a lot of money to not be able to do basic things with its software.
- nthcolumn 9 years ago
  
  MSDN really? I never use it. I always end up someplace else. The information is there somewhere, maybe. I don't use other vendors much to know but MSDN sucks for me majorly. Not a microsoft fan generally though so maybe too much pain this past 30 years for an objective view.

wintorez 9 years ago

I always look at Microsoft in order to learn how not to do anything /s

giancarlostoro 9 years ago

>It’s a good thing the caesar shift isn’t available in their library or it would probably have ended up in this tutorial.

https://docs.python.org/2/library/codecs.html#python-specifi...

Python does rot13 :)

proaralyst 9 years ago

But that's in the codecs library, not a cryptography library.
Sean1708 9 years ago

To be fair that's not a tutorial on how to encrypt and decrypt a file, it's a reference on the possible encodings you can use for a string.
