CircleCI incident report for January 4, 2023 security incident

circleci.com

94 points by dserodio 3 years ago · 18 comments

justinludwig 3 years ago

Nice writeup. Allowing customer data and secrets to be exfiltrated is a pretty big fail, and will probably make a number of customers re-think their patronage at a time when supply-chain security is top-of-mind to many.

But three things mentioned in their report do give me some confidence about the way CircleCI has engineered their internal systems:

1. They use SSO with 2FA ("an unauthorized third party leveraged malware deployed to a CircleCI engineer's laptop in order to steal a valid, 2FA-backed SSO session")

2. They maintain reasonably good audit logging (they could identify that "the third party extracted encryption keys from a running process, enabling them to potentially access the encrypted data" which had been exfiltrated)

3. They can rebuild everything from scratch ("we rotated all potentially exposed production hosts to ensure clean production machines")

A lot of companies pay lip service to best practices like these, but don't actually implement them thoroughly (or at all). The fact that CircleCI could rely on them under attack makes me think they're doing a better job than 90% of the SaaS companies out there.

mtlynch 3 years ago

This is a good writeup, and I appreciate the transparency. I especially like this bit:

>While one employee’s laptop was exploited through this sophisticated attack, a security incident is a systems failure. Our responsibility as an organization is to build layers of safeguards that protect against all attack vectors.

I was surprised by this part:

>To date, we have learned that an unauthorized third party leveraged malware deployed to a CircleCI engineer’s laptop in order to steal a valid, 2FA-backed SSO session... the malware was able to execute session cookie theft, enabling them to impersonate the targeted employee in a remote location and then escalate access to a subset of our production systems.

I'm surprised the SSO session token isn't bound to an IP address. I'd also expect access to prod overall to be whitelisted to CircleCI-owned IP ranges.

Now some gripes:

* I never received an advisory email about this incident. I only received this follow-up to one of my GitHub machine accounts, not my primary billing account.

* Their secret-finding script is pretty bad. It just dumps out a bunch of metadata without helping to make it actionable. Environment variables still don't have a created_at field, so you can't verify which ones you might have missed in a broad key rotation.

  • mac-chaffee 3 years ago

    Is tying session tokens to IPs actually common? I can't imagine it is given the unreliability of IP addresses causing issues.

    I used to live somewhere where outbound traffic went through one of three CGNAT IPs at random, and I only had auth issues with one really old site that predates the NAT hell that is the modern internet.

    • mtlynch 3 years ago

      Yeah, good point. I guess it'd be a pain to have to keep reauth'ing if your IP changed for legitimate reasons.

      It would be possible to do some kind of check for "this session token was used in the US and Russia twenty minutes apart... something's fishy," but that adds in more complexity.
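The "US and Russia twenty minutes apart" check described above is often called an impossible-travel check. A minimal sketch, assuming each token use has already been geolocated to a latitude and longitude (for example from an IP geolocation lookup, which is not shown and is itself imprecise). The names `TokenUse`, `is_impossible_travel`, and the 1000 km/h threshold are hypothetical choices for illustration, not anything CircleCI or an SSO vendor is known to use:

```python
import math
from dataclasses import dataclass

EARTH_RADIUS_KM = 6371.0
# Assumption: anything faster than roughly airliner speed is treated as suspicious.
MAX_PLAUSIBLE_KMH = 1000.0


@dataclass(frozen=True)
class TokenUse:
    """One observed use of a session token (hypothetical record shape)."""
    lat: float            # degrees
    lon: float            # degrees
    epoch_seconds: float  # when the token was presented


def haversine_km(a: TokenUse, b: TokenUse) -> float:
    """Great-circle distance between two token uses, in kilometres."""
    lat1, lat2 = math.radians(a.lat), math.radians(b.lat)
    dlat = lat2 - lat1
    dlon = math.radians(b.lon - a.lon)
    h = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(h))


def is_impossible_travel(prev: TokenUse, curr: TokenUse,
                         max_kmh: float = MAX_PLAUSIBLE_KMH) -> bool:
    """Return True if moving between the two uses would need more than max_kmh."""
    distance = haversine_km(prev, curr)
    hours = abs(curr.epoch_seconds - prev.epoch_seconds) / 3600.0
    if hours == 0:
        # Simultaneous use: suspicious only if the locations differ by more than 1 km.
        return distance > 1.0
    return distance / hours > max_kmh


new_york = TokenUse(40.71, -74.01, 0)
moscow_20_min_later = TokenUse(55.76, 37.62, 20 * 60)
print(is_impossible_travel(new_york, moscow_20_min_later))
```

This only catches an attacker who is far away; someone replaying the cookie from the same city or through a nearby VPN exit would pass, which is part of the added complexity mentioned above.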

    • numbsafari 3 years ago

      session vs refresh… you kill the session token and require a refresh. This can sometimes be transparent, but may cause a re-authentication using the second factor, with an indicator to the user that their previous session was killed due to use by a different IP.

      If you are concerned about stable IPs, use a proper VPN or bastion setup.
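A rough sketch of the "kill the session on an IP change" idea, assuming an in-memory store and a session bound to the IP it was issued to. `SessionStore`, `issue`, and `check` are hypothetical names for illustration; a real SSO provider would also handle the refresh token and second-factor prompt, which are only represented here by the `"reauthenticate"` result:

```python
import secrets
from dataclasses import dataclass


@dataclass
class Session:
    user: str
    bound_ip: str
    revoked: bool = False
    revoked_reason: str = ""


class SessionStore:
    """Hypothetical in-memory store that binds each session token to one IP."""

    def __init__(self) -> None:
        self._sessions: dict[str, Session] = {}

    def issue(self, user: str, ip: str) -> str:
        """Create a session after a successful login (including the second factor)."""
        token = secrets.token_urlsafe(32)
        self._sessions[token] = Session(user=user, bound_ip=ip)
        return token

    def check(self, token: str, ip: str) -> str:
        """Return "ok", or "reauthenticate" if the session is unknown or killed."""
        session = self._sessions.get(token)
        if session is None or session.revoked:
            return "reauthenticate"
        if ip != session.bound_ip:
            # Kill the session for everyone, including the original holder,
            # and record why so the user can be told on their next login.
            session.revoked = True
            session.revoked_reason = f"token presented from a different IP ({ip})"
            return "reauthenticate"
        return "ok"

    def revoked_reason(self, token: str) -> str:
        session = self._sessions.get(token)
        return session.revoked_reason if session else ""


store = SessionStore()
token = store.issue("engineer", "203.0.113.10")
print(store.check(token, "203.0.113.10"))   # same IP
print(store.check(token, "198.51.100.7"))   # stolen cookie replayed elsewhere
print(store.check(token, "203.0.113.10"))   # original holder must now re-authenticate
```

The design choice is that a mismatch revokes the session outright rather than just rejecting the one request, so a stolen cookie stops working and the legitimate user sees a prompt explaining why.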

  • randerson 3 years ago

    CircleCI is 100% remote. I can't imagine having to keep up with everyone's constantly changing home IPs and making exceptions while travelling.

SCdF 3 years ago

> On January 4, 2023, at 6:30 PM PST / January 5, 2023, at 02:30 UTC, we sent disclosure emails, posted a...

Did they? I got an email this morning that pointed to _this_ blog post, but I never received any initial "rotate yo keys" communication from them, on any email address.

If I hadn't read HN, and none of my company had, and our use of CI was running smoothly (they eventually put up a banner in the UI), I would literally have never known until this email.

bamboozled 3 years ago

> On December 29, 2022, we were alerted to suspicious GitHub OAuth activity by one of our customers. This notification kicked off a deeper review by CircleCI’s security team with GitHub.

What kind of freaks me out about this is that a customer notified Circle? If that customer hadn't mentioned anything, where would we be now?

I have to say, it's a pretty impressive hack. I wonder who or what was behind it?

Also wondering why / how the attacker didn't get access to the runners?

jscheel 3 years ago

Buried in the middle of the post is this most-important tidbit:

> Though all the data exfiltrated was encrypted at rest, the third party extracted encryption keys from a running process, enabling them to potentially access the encrypted data.

  • mdeeks 3 years ago

    Unfortunately this was already a given since there were reports of users' secrets (canary tokens) being used. They got the secrets for sure. It just wasn't clear how many they got. It doesn't matter though, you have to assume if you use CircleCI then your secrets were stolen. If you haven't rotated them then likely the only reason you haven't been compromised yet is luck.

    • jscheel 3 years ago

      Oh yeah, 100%. Even if they said the keys were for sure not leaked, I still would have rotated. Second I saw the disclosure on Jan 4, we went into emergency mode. Definitely not leaving that to chance.

thih9 3 years ago

I wonder if this is related to the layoffs that CircleCI announced on Dec 7th [1].

E.g.: did a new employee get access to production systems? Were there not enough people to monitor the systems and detect the breach sooner? Etc.

[1]: https://news.ycombinator.com/item?id=33900488

Dunedan 3 years ago

> On January 4, 2023, at 18:30 UTC, we shut down production access to nearly all employees, limiting access to an extremely small group for operational issues.

Shouldn't that have been the case from the beginning? Why did more than a small group of employees have production access at all?
