The Let's Encrypt project has made real strides in helping to ensure that every web site can use the encrypted HTTPS protocol; it provides TLS certificates at no charge that are accepted by most or all web browsers. Free certificates accepted by the browsers were difficult to find prior to the advent of the project in 2014; as of the end of February, the project has issued over a billion certificates. But a bug that was recently found in the project's handling of Certification Authority Authorization (CAA) put roughly 2.6% of the active certificates (about three million) at risk of immediate revocation. As might be expected, that caused a bit of panic in some quarters, but it turned out that the worst outcome was largely averted.
Let's Encrypt allows web-site operators to sign up for its service to sign their TLS certificates, so that browsers will recognize the certificate as valid. Let's Encrypt acts as a Certificate Authority (CA) and its keys are signed by a CA (IdenTrust) that is carried in the root certificate store for the browsers. That means a browser can follow the signature chain from a root certificate it trusts all the way to the certificate of the site, thus establishing the validity of the keys contained in the certificate.
In order for a site to get a certificate from Let's Encrypt, its administrator needs to show that they control the domain in question. That is typically done by placing a challenge value provided by Let's Encrypt either in the DNS information for the domain or at a URL that can be retrieved from the domain's web server. By doing so, the administrator proves that they have the needed access, thus showing that the domain is under their control.
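As a rough illustration of that exchange, here is a toy model in Python. Everything in it is an assumption made for the sketch: the `ToyWebServer` class, the `/challenge/` path, and the function names are hypothetical. The real protocol used by Let's Encrypt also involves account keys and network fetches, which are left out here.

```python
import secrets


class ToyWebServer:
    """Stands in for the applicant's web server: maps URL paths to bodies."""

    def __init__(self):
        self.pages = {}

    def publish(self, path, body):
        self.pages[path] = body

    def get(self, path):
        # Returns None when nothing has been published at that path.
        return self.pages.get(path)


def issue_challenge():
    """CA side: pick an unpredictable path and the value expected there."""
    path = "/challenge/" + secrets.token_urlsafe(16)  # hypothetical path layout
    expected = secrets.token_urlsafe(32)
    return path, expected


def verify_challenge(server, path, expected):
    """CA side: fetch the path from the applicant's server and compare."""
    body = server.get(path)
    return body is not None and secrets.compare_digest(body, expected)
```

Only someone who can write to the server (or, in the DNS variant, to the zone) can make `verify_challenge()` succeed, which is the whole point of the check.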
Administrators who wish to restrict the kinds of certificates that can be issued for their domains can add CAA records to their DNS configuration. Those can be used to disallow certain providers, such as Let's Encrypt, from issuing certificates for a domain or portion of one. For example, the web site administrator at "subdomain.example.com" could not receive a certificate from Let's Encrypt or some other CA simply by adding a web page to the server they control if the administrator of the top-level "example.com" domain disallowed that with CAA records. Some sites may also want to restrict the CAs that can be used; some CAs offer services beyond just signing, which may be required for security or regulatory compliance.
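A simplified sketch of how a CA might evaluate those records follows. It assumes the tree-climbing lookup described in the CAA specification, where the first non-empty record set found while walking from the requested name toward the root is the one that applies. The `caa_zone` dictionary stands in for real DNS queries and the function names are hypothetical; wildcard handling (`issuewild`), record parameters, and aliases are all ignored.

```python
def relevant_caa(domain, caa_zone):
    """Climb from `domain` toward the root; return the first non-empty CAA set.

    `caa_zone` maps a DNS name to a list of (tag, value) tuples and is a
    stand-in for real DNS lookups.
    """
    labels = domain.rstrip(".").lower().split(".")
    for i in range(len(labels)):
        name = ".".join(labels[i:])
        records = caa_zone.get(name)
        if records:
            return records
    return []


def may_issue(domain, ca_id, caa_zone):
    """Return True if the CA identified by `ca_id` may issue for `domain`."""
    records = relevant_caa(domain, caa_zone)
    # Keep only the issuer name from each "issue" value, dropping any
    # parameters that follow a semicolon. A bare ";" yields an empty name,
    # which matches no CA and so forbids all issuance.
    issuers = [value.split(";")[0].strip()
               for tag, value in records if tag == "issue"]
    if not issuers:
        return True  # no "issue" property found: issuance is unrestricted
    return ca_id in issuers
```

With `caa_zone = {"example.com": [("issue", "otherca.example")]}`, a call to `may_issue("subdomain.example.com", "letsencrypt.org", caa_zone)` returns `False`: the subdomain has no records of its own, so the parent's restriction applies.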
So when Let's Encrypt is checking a site's validity, it needs to consult the CAA records as well, which turns out to be where the bug was. Let's Encrypt allows users to wait up to 30 days after proving they control the domain before requesting a certificate. But the CAA information needs to be checked within eight hours of issuance, so a recheck is done if needed. As reported by Josh Aas, the executive director of the Internet Security Research Group (the entity behind Let's Encrypt), the Boulder CA server had a problem in the recheck code:
The bug: when a certificate request contained N domain names that needed CAA rechecking, Boulder would pick one domain name and check it N times. What this means in practice is that if a subscriber validated a domain name at time X, and the CAA records for that domain at time X allowed Let’s Encrypt issuance, that subscriber would be able to issue a certificate containing that domain name until X+30 days, even if someone later installed CAA records on that domain name that prohibit issuance by Let’s Encrypt.
Before the bug was fixed, certificates that Let's Encrypt issued in response to a request containing multiple domain names may not have had the CAA records for all of those domains checked properly. The affected certificates were thus not in compliance with the rules that CAs are required to follow.
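The report does not show the offending code, so the following is only an analogy: a Python sketch of one well-known way to end up checking a single name N times, namely closures that all share the same loop variable. The function names and the `check` callback are hypothetical, and nothing here is taken from Boulder itself.

```python
def make_rechecks_buggy(names, check):
    """Build one deferred CAA recheck per name (buggy version).

    Each lambda looks up `name` only when it is finally called. By then the
    loop has finished, so every recheck sees the last name in the list.
    """
    return [lambda: check(name) for name in names]


def make_rechecks_fixed(names, check):
    """Build one deferred CAA recheck per name (fixed version).

    Binding `name` as a default argument captures its value at the moment
    each lambda is created, so every name is checked exactly once.
    """
    return [lambda name=name: check(name) for name in names]


def all_pass(rechecks):
    """Run every recheck and report whether all of them allowed issuance."""
    results = [recheck() for recheck in rechecks]
    return all(results)
```

If `check` reports that the first of two names now forbids issuance while the second allows it, the buggy version checks the second name twice and passes; the fixed version checks both and correctly refuses.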
That led to a message on March 3 from a Let's Encrypt staff member saying that any of the affected certificates that had not been renewed by March 5 would be revoked. That would mean browsers would stop accepting the certificates from those three million sites. But by March 4, Aas said that 1.7 million certificates had been renewed, which meant the existing, possibly invalid, certificates for those sites could be revoked without causing any problems. Of the remaining certificates, only 445 were for sites where the CAA record would disallow certificates being issued by Let's Encrypt; those were forcibly revoked, but the rest would not be revoked, at least not immediately.
Let's Encrypt certificates are only issued for 90 days and must be renewed before the end of that time period. In the worst case, it means that around 1.3 million sites would have invalid certificates, at least in a technical sense, for up to three months. The CA/Browser Forum (CA/B), which sets the standards that CAs need to comply with, does not consider certificates to be valid if the CAA records were not checked within eight hours before issuance. So even though none of those sites currently have a CAA record prohibiting the issuance of Let's Encrypt certificates, the existing set are not valid under the rules. The timeline set by CA/B for revocations is what drove the original March 5 deadline.
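The two time limits mentioned in this article (30 days to reuse a domain validation, eight hours for the freshness of a CAA check) can be expressed as a pair of small predicates. This is a minimal sketch with hypothetical names; it only models the rule as described here and says nothing about how any real CA implements it.

```python
from datetime import datetime, timedelta

# Limits as described in the article; the constant names are made up.
VALIDATION_REUSE = timedelta(days=30)
CAA_FRESHNESS = timedelta(hours=8)


def validation_usable(validated_at, now):
    """A domain validation can still be used if it is at most 30 days old."""
    return now - validated_at <= VALIDATION_REUSE


def needs_caa_recheck(caa_checked_at, now):
    """The CAA check must be repeated if it is more than eight hours old."""
    return now - caa_checked_at > CAA_FRESHNESS


def ready_to_issue(validated_at, caa_checked_at, now):
    """Issuance is allowed only with a usable validation and a fresh CAA check."""
    return (validation_usable(validated_at, now)
            and not needs_caa_recheck(caa_checked_at, now))
```

A subscriber who validated a domain ten days ago therefore still has a usable validation, but the CAA check done at that time is long stale and must be redone, for every name in the request, before the certificate is issued.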
A Mozilla bug report was filed by Aas to request an exemption from the requirement to revoke all of the affected certificates. Wayne Thayer pointed Aas at the Mozilla guidelines on revocation, which note that the company does not grant exceptions but recognizes that there may be times when "revoking misissued certificates within the prescribed deadline may cause significant harm". He also said that Mozilla requests some more information if a CA decides not to revoke the certificates.
Jacob Hoffman-Andrews replied with additional details to explain why Let's Encrypt felt that it would be detrimental to do the bulk revocation. He said that users who encountered an error when browsing to an affected site would likely "look up instructions on how to bypass revocation checks"; having done so, they might well forget to re-enable those checks, so they would miss other revocations. It could also trigger "warning blindness", where users see so many warnings that they stop paying attention to them. But he noted a larger problem, as well:
By reviewing previous incident reports and analyzing our current situation, a common root cause of failure to timely revoke is that Subscribers are not able to replace certificates on the BR- [baseline requirements] mandated timelines (24 hours and 5 days, depending on the issue).
Most Subscribers are not able to field round-the-clock incident response, so improving the speed of manual replacement processes cannot be the answer. Increasing public acceptance of revoked certificate errors also cannot be the answer, because that would undermine public faith in the web PKI. Reducing the incidence and scope of CA errors is an important part of the solution, and we have laid out some plans to that effect at https://bugzilla.mozilla.org/show_bug.cgi?id=1619047. However, responsible systems design requires layered responses, and it is possible that we, or another CA, will have a similar-sized incident in the future despite our best practices and best efforts.
He said that Let's Encrypt plans to work on an open protocol to notify users of automated CAs of an imminent revocation in such a way that those certificates can be automatically renewed. In a world where even the smallest web sites have TLS certificates so that they can offer encrypted communications to their users, it is certainly important for them to be able to maintain their certificates—even without staff dedicated to handling such things. Those who are wondering can consult a site where users of Let's Encrypt certificates can check whether they need an update.
The browser makers have the final authority on what root certificates they will accept, but they need to be cognizant of the impact removing one would have. If one or more of the big players decides that the steps taken by Let's Encrypt were not sufficient, they could remove the IdenTrust root certificate from their root store, though that would affect far more than just Let's Encrypt certificates. In that unlikely scenario, IdenTrust might decide (or be pressured) to revoke the Let's Encrypt certificates instead. No actions of that sort have been mooted—at least publicly. The havoc caused by such a move would be monumental.
One possible downside of the widespread availability of gratis certificates from Let's Encrypt is the creation of a monoculture. Concentrating TLS certificate issuance in a single organization might be worrisome, whether it is Let's Encrypt or one of the commercial providers. We are far from that situation now, but this incident does show that a problem found in a large number of issued certificates may leave any CA in an unenviable position—certificates that do not expire for a year or more would only add to the mess.
Overall, Let's Encrypt did an excellent job in a rather compressed time frame to identify, fix, and partly mitigate what was, in truth, just a technical violation of the specifications for CAs. It seems rather unlikely that many—perhaps any—of the remaining unrevoked certificates were actually issued for domains that they should not have been. That is not to say that technicalities should be ignored, but it is clear that sometimes there are overarching considerations as well. The bug and the problems it caused are unfortunate, for sure, but things seem to be moving in the right direction at this point.