How Not to Encrypt a File – Courtesy of Microsoft (medium.com)
The author could spend less time bashing the original article and a little more time explaining how to do things right.
This:
> Suggestion to use the encryption key as the IV
is a second sub-heading while the words "initialization vector" don't appear until much later. Initialization vector is pretty obvious, "IV" isn't.
Also the author spends time complaining that the original article misunderstands the use of initialization vector while providing no explanation of how it should be used.
After reading the post I haven't learned anything useful other than that the original article was bad.
I... sort of have mixed feelings on this.
I agree that the article could do far more to explain what's good, both in content (talking about why these things are bad) and in style (defining all terms immediately).
But holy shit, the MSDN article is bad. It's so hideously bad that I think there's nontrivial social value in bashing it extensively to discourage people from writing docs like this without getting them sanity-checked.
In short, I think this article is largely useless to people reading guides and trying to avoid the pitfalls of the original source, but is aimed at people writing crypto guides who have no business doing so.
Explaining to you how to do it the right way is not an obligation of anyone that says X article is wrong?
"This article on global warming could spend less time bashing governments for inaction and more time talking about how I can reduce my emissions."
"This bad restaurant review could spend less time bashing the chef's food and more time telling me where the good restaurants are."
Similarly maybe the author didn't explain what "IV" means because their audience understands that term.
"This article in CACM uses 'NVRAM' in the heading, while the words 'non-volatile' don't appear until much later. Non-volatile is pretty obvious, 'NVRAM' isn't."
> Explaining to you how to do it the right way is not an obligation of anyone that says X article is wrong?
I wouldn't expect an explanation; I wouldn't say the author is obliged to make that kind of effort.
I did sort of expect a link to an explanation, though.
As of this writing, other comments in this very HN thread claim there are many intro-level explanations of IVs out there to choose from. They don't link to them either.
Hypertext is what makes the web special, you know? The article would be more useful with a link. Think of this: even this discussion, here on HN, would have been more fruitful if the author had included a link to some explanation.
> Similarly maybe the author didn't explain what "IV" means because their audience understands that term.
I have actually shipped a couple of products that made use of encryption packages, and I've never heard of an IV. Maybe the encryption advice I followed was terrible; maybe the instructions were terrible; maybe the packages were terrible. Maybe I'm an idiot suited only to the digging of ditches.
The blog post also would have been more useful if it had been a full crypto textbook rather than a single post.
One can always say, "X would have been more useful if it had included Y (or at least a link to Y)". This sort of criticism is not useful, particularly of informal writing that someone posted to a personal blog.
While we're talking about awesome things about the web, use Google or Wikipedia. You don't need to be spoonfed a link, so why are you asking to be?
I was in the audience, and if I ever knew how one should use an IV, I forgot. The article would have been more valuable to me if it gave a summary of what IVs are instead of what they aren't.
What makes you think you were the author's intended audience, exactly? It sounds like the author intended their article for people who know what an IV is. You don't, and, while there's no shame in that, it does seem to indicate that you're not in the audience.
This is like me complaining that I can't understand Terry Tao's blog when it's posted to HN. It's not written for me.
You were probably not the intended audience then. It seems to me that this is intended for people who have some form of knowledge about cryptography already.
In any case, there are many articles about IVs and how to use them already so I do not see the point of explaining what an IV is yet again in that article.
While I'm sure the article is correct, it doesn't even attempt to link to resources explaining why these things are misunderstandings. For example, I myself don't really understand IVs, and from my perspective I'm left with no clearer idea about why IVs shouldn't be considered secret, or why the IV isn't required to be able to decrypt the file again.
Regardless, it's obvious that bad encryption advice in an MSDN article is horrifying.
> I myself don't really understand IVs
Time to drop some knowledge!
IVs are used in a number of places in cryptography, so I'll just pick one (easy) example.
Consider the stream cipher ChaCha20. You can think of ChaCha20 as a black box. You input a key and an IV and out you get a really, really long stream of uniformly random bytes. (This is a simplification but sufficient here). ChaCha20 works in such a way that having any or all of the output stream doesn't help you figure out what the inputs were. It's irreversible. ChaCha20 is also deterministic; the same input will give the same output.
You can then use the output of random bytes to encrypt a message by XORing with your plaintext. To later decrypt, you feed the same key and IV, get the same stream, XOR the ciphertext with it, and by the property of XOR you'll get the plaintext.
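A minimal sketch of that round trip in Python. The standard library has no ChaCha20, so `keystream` below is a toy stand-in (SHA-256 over key, IV and a block counter). That is an assumption of this sketch, not ChaCha20 and not something to use on real data; it only mimics the black-box shape described above: key + IV in, deterministic pseudorandom bytes out.

```python
import hashlib
import os


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    """Toy stand-in for ChaCha20: SHA-256(key || iv || counter) blocks.

    NOT ChaCha20 and NOT for real use; it only mimics the interface.
    """
    out = bytearray()
    counter = 0
    while len(out) < length:
        block = hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        out.extend(block)
        counter += 1
    return bytes(out[:length])


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = os.urandom(32)  # 256-bit key
iv = os.urandom(8)    # 64-bit IV, unique per message
message = b"attack at dawn"

# Encrypt: XOR the plaintext with the stream.
ciphertext = xor(message, keystream(key, iv, len(message)))
# Decrypt: regenerate the same stream from the same key and IV, XOR again.
recovered = xor(ciphertext, keystream(key, iv, len(ciphertext)))
print(recovered)  # b'attack at dawn'
```

Because XOR is its own inverse, the same function serves for both directions; all that matters is regenerating the identical stream.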
Now why is there an IV? Let's consider a ChaCha without an IV. The system works like so:
  R = ChaCha(Key)
  Ciphertext = Message ^ R
So let's encrypt two different messages:
  R = ChaCha(Key)
  Ciphertext1 = Message1 ^ R
  R = ChaCha(Key)
  Ciphertext2 = Message2 ^ R
Notice how R is the same for both messages? Again, ChaCha is deterministic; the output is the same for the same inputs. Since the key is the same, R is the same. Now an attacker, knowing this, can do this:
  Q = Ciphertext1 ^ Ciphertext2
What does Q end up being? Let's look:
  Ciphertext1 ^ Ciphertext2 = Message1 ^ R ^ Message2 ^ R = Message1 ^ Message2
So Q ends up being equal to the XOR of the two messages. That's really bad. The XOR of two messages might be enough to tell the attacker what the messages are, especially if the messages are predictable (like English text). But maybe that's not scary enough. Well, there's another attack. What if you're encrypting a data format with a header? Headers often have the same data in the same places, so the attacker knows part of the message. Uh oh...
  R = Ciphertext1 ^ Message1
If the attacker knows the message (or any parts of it), they can recover the R of those parts. And now, since your key is always the same and your R is always the same, all the other messages you encrypt will have those bytes exposed.
This is where IVs come in:
  R = ChaCha(Key, IV)
The IV should be unique per message. That means that every R is different! None of the above attacks work anymore. XORing two ciphertexts together returns gibberish:
  R1 = ChaCha(Key, IV1)
  Ciphertext1 = Message1 ^ R1
  R2 = ChaCha(Key, IV2)
  Ciphertext2 = Message2 ^ R2
  Ciphertext1 ^ Ciphertext2 = Message1 ^ R1 ^ Message2 ^ R2
Since R1 and R2 differ, they no longer cancel out. And if the attacker knows the message, all they can recover is R1 or R2 (or any R). But that's useless, because since all your IVs are unique, that R will never be seen again.
That's the point of IVs.
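The same walkthrough as runnable Python. It again uses a toy SHA-256 counter-mode keystream in place of ChaCha20 (an assumption of this sketch, not ChaCha20, not for real use), models "ChaCha without an IV" with an empty IV, and uses two made-up messages that share a predictable header.

```python
import hashlib
import os


def keystream(key: bytes, iv: bytes, length: int) -> bytes:
    # Toy SHA-256 counter-mode stream (NOT ChaCha20); same black-box shape.
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + iv + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


key = os.urandom(32)
m1 = b"HEADER:v1;pay Alice $100"
m2 = b"HEADER:v1;pay Bob   $999"  # same length as m1

# No IV: both messages get the same R.
no_iv = b""
c1 = xor(m1, keystream(key, no_iv, len(m1)))
c2 = xor(m2, keystream(key, no_iv, len(m2)))
print(xor(c1, c2) == xor(m1, m2))  # True: Q = Message1 ^ Message2

# Known plaintext m1 leaks R, which then decrypts c2 outright.
r = xor(c1, m1)
print(xor(c2, r))  # b'HEADER:v1;pay Bob   $999'

# Unique IV per message: R1 != R2, so the XOR trick gives gibberish.
iv1, iv2 = os.urandom(8), os.urandom(8)
d1 = xor(m1, keystream(key, iv1, len(m1)))
d2 = xor(m2, keystream(key, iv2, len(m2)))
print(xor(d1, d2) == xor(m1, m2))  # False (with overwhelming probability)
```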
> why the IV isn't required to be able to decrypt the file again.
It is required. Obviously you need all the inputs to ChaCha to get the byte stream again, to decrypt the message.
Now sometimes the IV is known from the protocol. So say you're using ChaCha to encrypt network traffic. You might set the IV equal to the packet number. So both sides already know the packet number.
But you always need the IV to decrypt.
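A small sketch of the packet-number idea, assuming a 64-bit big-endian encoding of the packet counter (the framing and the `nonce_for_packet` name are hypothetical, chosen only for illustration). Both ends can compute the nonce from a number they already track, so it never has to be sent, but it is still an input to decryption.

```python
def nonce_for_packet(packet_number: int) -> bytes:
    """Map a packet number to a 64-bit nonce (hypothetical framing).

    Both endpoints derive this from the packet number they already track.
    It stays unique only as long as packet numbers never repeat under the
    same key, so refuse to wrap around; rekeying would be needed instead.
    """
    if not 0 <= packet_number < 2 ** 64:
        raise ValueError("packet number out of 64-bit range; rekey instead")
    return packet_number.to_bytes(8, "big")


print(nonce_for_packet(1).hex())  # 0000000000000001
```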
> and from my perspective I'm left with no clearer idea about why IVs shouldn't be considered secret
Consider again ChaCha20 as a black box. Key+IV goes in, stream of bytes comes out. There's no way to reverse that without the key (and IV). Since the attacker doesn't know the Key, they can't reverse it. Knowing the IV doesn't help.
Another way to think about it is that, instead of accepting a 256-bit key and a 64-bit IV, it's really just a 320-bit key. Knowing 64 bits of a 320-bit key doesn't help break a cipher. The cipher still has 256-bit strength. So you can share the IV without affecting security.
BIG NOTE: It's important that an IV is always unique. If an IV is ever re-used, the above attacks become available again because R will be the same for two messages.
Hope that helps. This is only one way that IVs are used. In ChaCha20 it's called a nonce, because ChaCha20 is geared towards use in network protocols, where the above trick of using the packet number is applied. For block ciphers there are various cipher modes that get used, and most of them need an IV. The purpose is always the same: to make this "session" of encryption unique.
There's another way to use IVs, and I think it reaffirms the concept of what an IV actually is. Let's say you have a cipher that only accepts a key! No IV (like AES). You still want to make your encryption sessions unique. A way to do that is this:
  TempKey = HMAC(IV, Key)
And then use TempKey. HMAC is a keyed hash. In this case it lets us combine a Key and IV in an irreversible way, yielding a new key. TempKey will be the right size key for the cipher (say, 256 bits). What this is doing is giving us a unique key for every encryption session. And that's the heart of IVs. In many ways, ChaCha20 is doing exactly that: it's hashing together Key and IV and using the output to generate a long stream of random data that can't be reversed back to the key+IV.
(And in case you're wondering, yes, you can use a cryptographically secure hash function alone to build a stream cipher like ChaCha. It'll just be _really_ _really_ slow, because hash functions are really, really slow compared to ChaCha.)
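Sketched with Python's standard `hmac` module, assuming HMAC-SHA256 and 16-byte random IVs (both choices of this sketch). The `HMAC(IV, Key)` notation above doesn't pin down argument order; here the secret key is the HMAC key and the IV is the message, which is the usual way to use HMAC as a keyed function. `temp_key` is a hypothetical helper name, and since the standard library has no AES, the derived key is only inspected, not fed to a cipher.

```python
import hashlib
import hmac
import os

key = os.urandom(32)  # long-term secret key


def temp_key(key: bytes, iv: bytes) -> bytes:
    """Derive a per-session key by mixing the long-term key with an IV.

    HMAC-SHA256 keyed with the secret key, IV as the message.
    """
    return hmac.new(key, iv, hashlib.sha256).digest()  # 32 bytes = 256 bits


iv_a, iv_b = os.urandom(16), os.urandom(16)
k_a = temp_key(key, iv_a)
k_b = temp_key(key, iv_b)
print(len(k_a))                    # 32
print(k_a == k_b)                  # False: different IVs, unrelated keys
print(k_a == temp_key(key, iv_a))  # True: same inputs, same key
```

The IV can still travel in the clear: without `key`, knowing it doesn't let anyone compute the session key.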
Thanks for spending the time explaining this.
Happy to.
Is this something people find interesting? I was thinking of doing a small guide/tutorial/course where I teach these basics of cryptography, while building up to a working file encryption tool written completely from scratch. Probably such a thing exists already, but /shrugs these kinds of questions always seem to come up.
This is risky. Unless you are a pro, it is generally not a great idea to publish a "how-to" for crypto because of the risk you might get it wrong in subtle ways that now propagate through the ecosystem.
I know all the subtle things.
That said, I wouldn't write the course for people intending to become cryptographers or cryptographic engineers. That would require a university-grade program. It would be geared towards people who are curious about the inner machinations of encryption. Ya know, like the people who come to Hacker News, read articles like this, and ask questions.
Note that AES uses IVs in CBC mode. It is incorrect to say that AES does not use IVs.
CBC mode is not specific to AES.
EDIT: To elaborate. AES does _not_ use IVs. It has no support for them whatsoever. It takes as input only a key and plaintext/ciphertext. This is in contrast to other block ciphers like Threefish which do.
AES, in most applications, has to be used in constructions that require IVs. But that's distinctly different. IVs are bolted onto AES.
"It is incorrect to say that AES does not use IVs." is patently false. It'd be like saying "It is incorrect to say that AES does not use tweaks."
In addition to fpgaminer's excellent explanation, I highly recommend the book "Cryptography Engineering: Design Principles and Practical Applications" by Niels Ferguson, Bruce Schneier, and Tadayoshi Kohno. It's an excellent overview of how to use crypto primitives and why to use them that way.
I know you're not just looking for answers but a pointer to some better documentation, and I can't provide you with those, but:
> why IVs shouldn't be considered secret
The less that is considered secret, the less there is that can leak and cause problems.
> why the IV isn't required to be able to decrypt the file again
The IV is required to decrypt the file again. In the linked document's design the IV is actually the encryption key, which means it is known by the receiver, which is why it's not included. But that is just a special case that should never be reproduced.
Agreed. Where is the pointer to the correct article to use when encrypting a file in C#?
Note: All my information re: Microsoft is from no later than 2013.
This is indicative of a classic challenge in the industry.
To ship code that uses crypto at Microsoft you have to go through an auditing process. To ship code that uses novel crypto, or works directly with crypto primitives, you have to be reviewed by a specialist crypto review board, which includes security and crypto people from across the company, names that you might know (e.g. Niels Ferguson was there last time I needed a review. Hi Niels!)
Samples and documentation aren't held to the same standard.
Microsoft have already 404d the article: https://support.microsoft.com/en-us/help/307010
Luckily we have a snapshot: https://web.archive.org/web/20170327154501/https://support.m...
Also now dead - I just get a blank page apart from the header and footer.
I got the same at first; it seems some script removes it. I see the content flash by. I saved the page using wget and got the original content.
If you disable JavaScript you can see the content, it seems like there's some script that replaces it on page load.
wat? why? they have some sort of 'message will self-destruct javascript' in their page which is carried into the wayback machine?
I still can access it just fine: https://web.archive.org/web/20170327154501/https://support.m...
(I have javascript disabled anyway)
However the article is:
>Article ID: 307010 - Last Review: Nov 15, 2012 - Revision: 1
>Applies to Microsoft Visual C# 2005, Microsoft Visual C# .NET 2003 Standard Edition, Microsoft Visual C# .NET 2002 Standard Edition
So it seems a bit out of date anyway, having been originally written for 2002-2005 products.
Here's the original Microsoft article:
https://gist.github.com/mikemaccana/badf6c16f203e05c02b42f93...
(disabled JS in DevTools, caught it from archive.org before JS to wipe it kicked in)
As someone in charge of reviewing all crypto code for a sizable chunk of my company, I've yet to see a single case of naive developers using encryption primitives correctly. To tell the truth, I don't think I've ever seen a single example of IVs used correctly.
At the very best of times I get AES-CBC-HMAC-SHA1 (usually Encrypt-AND-MAC) with binary keys and a secret static IV.
I'm still waiting for the developer that will botch AES-GCM with a random nonce so I can have first world problems, but we're not there yet.
I wanted to call Microsoft sneaky for pulling out this article, but considering basically every top-ranked "how do I encrypt with AES" question on StackOverflow is full of bad advice, I'm glad they at least did something.
The article says that DES "can be brute forced in a single digit number of days by a modern computer".
2**56 keys / 9 days ≈ 92.7 Gkeys/s
Can modern computers actually compute DES that fast?
This benchmark [1] gives 196.2 GH/s for DES using 8 Nvidia GTX 1080 Ti and Hashcat 3.5. So while your average computer probably isn't quite sufficient, it is certainly within reach.
[1] https://gist.github.com/epixoip/ace60d09981be09544fdd3500505...
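The arithmetic above, worked through in Python. It treats the quoted 196.2 GH/s benchmark figure as DES key trials per second, which is an assumption of this sketch. On average a brute-force search finds the key after trying about half the keyspace, so the average case is half the worst case.

```python
KEYSPACE = 2 ** 56        # DES effective key size
SECONDS_PER_DAY = 86_400

# Rate needed to exhaust the whole keyspace in 9 days.
needed_rate = KEYSPACE / (9 * SECONDS_PER_DAY)
print(f"{needed_rate / 1e9:.1f} Gkeys/s")  # 92.7 Gkeys/s

# Time to exhaust the keyspace at the benchmark's 196.2 GH/s (8x GTX 1080 Ti).
benchmark_rate = 196.2e9
worst_case_days = KEYSPACE / benchmark_rate / SECONDS_PER_DAY
print(f"worst case {worst_case_days:.2f} days, "
      f"average {worst_case_days / 2:.2f} days")  # worst case 4.25 days, average 2.13 days
```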
Here's a project that did 1.4G/s on a single GPU five years ago:
https://www.reddit.com/r/crypto/comments/162ufx/research_pro...
Stick multiple modern GPUs in a machine and single digit days seems feasible.
Another version of essentially the same article is still live here:
https://support.microsoft.com/en-us/help/301070/how-to-encry...
Yep, all over the place:
https://searchcode.com/?q=ASCIIEncoding.ASCII.GetBytes%28sKe...
EDIT: ok maybe not "all over the place", but it's been done.
The article author is complaining about an MSDN article not being updated. The content even says it applies to VS 2005 at the latest. That's a hint of how old it is. Is he going to get the print version and complain about that next? If programmers are using this without thought, that is on them, not the example code.
Raymond Chen wrote some time ago about the variable quality of MS Knowledge Base articles: https://blogs.msdn.microsoft.com/oldnewthing/20060424-21/?p=...
That's pretty disturbing. Though to be fair, the article in question was written a while ago (since it targets .NET 2005), and to be less fair, MS doesn't really review their documentation very well, at all.
You are probably too young. In the past, when the internet wasn't so ubiquitous, having the MSDN documentation on CD was a lifesaver. The docs that have serious content today descend directly from those days; the rest, as others have already said, are just boilerplate autogenerated docs, which nobody maintains anymore simply because the technology moves too fast. So this doc page about using DES probably dates from 1990 or so... and in those days it was probably good enough.
It does seem useless to make the IV the same as the key. But is there a reason making the IV the same as the key is worse than using 0 as an IV?
Just asking.
To encrypt: tar cz foo | openssl aes-256-cbc -salt -out foo.enc
To decrypt: openssl aes-256-cbc -d -in foo.enc | tar xz
(foo can be a file or directory)
This does not contain a MAC though, does it? Also why CBC? Why not CTR/GCM instead? And why AES256 instead of Chacha20-Poly1305 or some other modern AEAD?
What are the advantages of GCM over CBC? And what's wrong with AES256?
- GCM, unlike CBC, is an AEAD mode (it has a MAC built in)
- CBC needs padding, which, when mishandled, can lead to padding oracle attacks
- GCM allows for parallel encryption
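Python's standard library has no AES, so this is not GCM. It is a sketch, under that constraint, of the encrypt-then-MAC idea that AEAD modes package into a single primitive: a toy SHA-256 counter-mode stream (not a real cipher) encrypts, HMAC-SHA256 authenticates nonce and ciphertext with a separate key, and the receiver rejects anything whose tag doesn't verify before decrypting. `seal` and `open_sealed` are hypothetical names; for real code, use a vetted AEAD such as AES-GCM or ChaCha20-Poly1305 from a maintained library.

```python
import hashlib
import hmac
import os

NONCE_LEN = 16
TAG_LEN = 32  # HMAC-SHA256 output size


def keystream(key: bytes, nonce: bytes, length: int) -> bytes:
    # Toy SHA-256 counter-mode stream standing in for a real cipher (NOT GCM).
    out = b""
    counter = 0
    while len(out) < length:
        out += hashlib.sha256(key + nonce + counter.to_bytes(8, "big")).digest()
        counter += 1
    return out[:length]


def xor(a: bytes, b: bytes) -> bytes:
    return bytes(x ^ y for x, y in zip(a, b))


def seal(enc_key: bytes, mac_key: bytes, plaintext: bytes) -> bytes:
    nonce = os.urandom(NONCE_LEN)
    ct = xor(plaintext, keystream(enc_key, nonce, len(plaintext)))
    # Encrypt-then-MAC: the tag covers both the nonce and the ciphertext.
    tag = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    return nonce + ct + tag


def open_sealed(enc_key: bytes, mac_key: bytes, blob: bytes) -> bytes:
    nonce, ct, tag = blob[:NONCE_LEN], blob[NONCE_LEN:-TAG_LEN], blob[-TAG_LEN:]
    expected = hmac.new(mac_key, nonce + ct, hashlib.sha256).digest()
    # Constant-time comparison; reject before decrypting anything.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("authentication failed")
    return xor(ct, keystream(enc_key, nonce, len(ct)))


enc_key, mac_key = os.urandom(32), os.urandom(32)  # separate keys per purpose
blob = seal(enc_key, mac_key, b"transfer $100")
print(open_sealed(enc_key, mac_key, blob))  # b'transfer $100'

tampered = bytearray(blob)
tampered[NONCE_LEN] ^= 0x01  # flip one bit of the first ciphertext byte
try:
    open_sealed(enc_key, mac_key, bytes(tampered))
except ValueError as err:
    print(err)  # authentication failed
```

Plain CBC (or the `openssl aes-256-cbc` one-liner above) would decrypt the tampered blob without complaint; the integrity check is what catches it.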
> And what's wrong with AES256?
There are more modern, faster and better ciphers that are designed not to be vulnerable to many of the side-channel attacks that AES is difficult to protect against.
GCM has an integrity check built in, which is very useful in crypto.
https://crypto.stackexchange.com/questions/2310/what-is-the-...
https://crypto.stackexchange.com/questions/14747/gcm-vs-ctrh...
https://security.stackexchange.com/questions/33569/why-do-yo...
I feel disgusted after reading this. I wonder how many people applied the advice given by the original article because they made the bad decision to trust the official documentation by MS.
Oh, come on: whatever Microsoft's faults might be they have a very long track record, stretching back decades, of providing overall high quality documentation for developers.
Yes, there are errors. Yes, sometimes there is deeply misguided advice. But, on the whole, MSDN and its ilk has helped me far more often than it's hurt me.
Key point: compared with much other vendor and OSS documentation, Microsoft are absolutely streets ahead.
Most of the documentation is boilerplate. There's very little real content now and most of it is filler.
Here's an example of a new API they added recently (the first one I thought of): https://msdn.microsoft.com/en-us/library/windows/desktop/mt5...
It says things like this:
> Indicates that the data for the file should be obtained from a WIM file. On access, data is transparently extracted from the WIM file and provided to applications. If the file contents are modified, data is transparently decompressed and the file is restored to the same physical form it had if this API were not used.
This is "boilerplate" and "very little content" to you? What are you thinking of?
To be fair Win32 isn't terrible. Have you looked at the .Net docs?
I haven't to be honest, but at the same time I'm not sure what you expect to see as the summary here. Is there anything profound to say about two overloads that differ by an extra "millisecond" parameter?
You need to click on the overload of this method that you are interested in. This is just the top level "where do you want to go?" document.
It's not. There's content about 11 screens down, not that you'd find it easily.
Again, GP just stated that these are local links, links to anchors on that very same page, explaining the overloads.
Are there too many? For this class, maybe. It's at least potentially worth discussing it. But the way the navigation works isn't "just scroll until something seems to fit".
The last time I really had to deal with a bad MSDN article was probably 2003 working on an ASP/VBScript application that used MSXML. I was in my last year of High School, working for a local bank doing things that were certainly above my pay grade, with extremely minimal support, in my glory days.
I remember getting 90% of the way through writing my application in VBScript and finding a piece of documentation about some XSD thing that I really needed to do to complete the tool. Lots of people were reporting the same issue, and the support reply basically said, "get f'ed": this function works in the JScript implementation of MSXML but not in VBScript.
Sorry! Hope you have hundreds of spare hours to learn a new language and port over your entire codebase, because we're not fixing it.
Every time since then that I can remember referring to MSDN, I have found one post with my question, asked in clear terms that I could reach from a Google search... posted four years ago, with one or more replies that are almost always very obviously wrong, from an MS Certified Partner(TM).
Maybe some of their documentation is great! I have not had the fortune to encounter it.
While many open source projects have great documentation, and many others do not, the difference tends to be that if your Open Source project has bad documentation, or features that just plain don't work, you are free to read the source code and fix it yourself!
If I could downvote this twice I would. I can't tell you how much time I've spent on the phone with MS support trying to do basic things with their software that I couldn't do because a.) the only existing documentation is wrong or b.) there's no documentation. My company pays MS a lot of money to not be able to do basic things with its software.
MSDN, really? I never use it. I always end up someplace else. The information is there somewhere, maybe. I don't use other vendors much, so I can't compare, but MSDN sucks for me majorly. Not a Microsoft fan generally, though, so maybe too much pain these past 30 years for an objective view.
I always look at Microsoft in order to learn how not to do anything /s
>It’s a good thing the caesar shift isn’t available in their library or it would probably have ended up in this tutorial.
https://docs.python.org/2/library/codecs.html#python-specifi...
Python does rot13 :)
But that's in the codecs library, not a cryptography library.
To be fair that's not a tutorial on how to encrypt and decrypt a file, it's a reference on the possible encodings you can use for a string.