New Recovery Tool to help with CrowdStrike issue impacting Windows endpoints
techcommunity.microsoft.com

Really impressive that they got through an entire develop, build, approval, and documentation process in just about two days. Not that any of those steps are extremely hard for this fix, but I'm always impressed when big corporations can move so fast.
I sympathize with the engineers, QA, and everyone involved in getting this out.
I have to imagine it was a lot of long hours, and the testing was insane. The last thing I want to do is put this tool out and it somehow messes things up more.
But glad it’s out. Hopefully it helps with the remaining machines and with any that are being problematic.
They probably got an exemption to fast-track the release because this is a critical issue. I wouldn't expect testing to be very thorough for a release turned around in two days; the exemption is more likely.
I wrote and maintained a tool that was sent to users when something went wrong.
Man was I always afraid and stressed that the tool meant to help users when they were already having a failure was also having a failure.
Surveillance software is a top priority for BigCos nowadays. If they prove useful to governments, they'll get softer antitrust treatment.
To be fair, there isn't a whole lot of code there. I wouldn't be surprised if Microsoft had the WinPE generator written already for some other project.
Yeah, WinPE media tools have been around for years. Here is an article from 2021 (although it has been a thing long before then):
https://learn.microsoft.com/en-us/windows-hardware/manufactu...
Still, customizing the toolchain to fit this particular scenario and making sure it works, in two days, is commendable effort.
Actually, they may not have had a choice. Since they've forced people to set up their local Windows install with a Microsoft login, and tied BitLocker to that login, there are probably many situations out there that require a Microsoft-login-aware WinPE just to fix this.
When their bottom line and reputation are at stake, what were they supposed to do?
They could say "third party kernel modules are installed at your own risk" and provide the usual level of business hours support. CrowdStrike fucked up and Microsoft is helping its customers recover from CrowdStrike's fuckup.
They made a special memory allocator for Windows 95 to avoid a crash caused by a bug in SimCity https://www.joelonsoftware.com/2000/05/24/strategy-letter-ii...
They are not only backward compatible or bug compatible. They are other-people's-bugs compatible. It's the only way to keep users from thinking about switching to another OS.
Reminds me of this famous post from Linus about being "bug-compatible".
One thing I’ve never understood about “kernel never breaks user space”: doesn’t that completely atrophy the kernel, preventing it from ever having big rewrites or architectural changes? What if an initial implementation was terrible, and there are 100x performance improvements to be had by making a breaking change?
Implement a new API for the better route, isolate the terrible code as much as possible, notify the users, deprecate it, and remove it (or move it to a userspace shim) after enough years have passed and almost everyone is off it.
That must be a pretty well-worn path by now.
If anything, events like this make decision makers rethink whether they really should run Windows everywhere. Why does a flight schedule display have to run Windows, for example? It might not be Microsoft's fuckup, but they will lose users too, for sure.
Same thing already happened on Linux, but it failed to make a big enough splash to make any headlines. Putting Windows at fault here is unfair.
https://www.newsbytesapp.com/news/science/before-affecting-w...
They recommend CrowdStrike to customers. Now they are trying to at least salvage some goodwill. Also, that a kernel module can ruin the OS is partially their fault.
Microsoft competes directly with CrowdStrike via Defender across multiple areas - I'm not sure they'd recommend CrowdStrike to customers over their own products at the cost of losing sales.
I don't think Microsoft is realistically in a position to forbid other companies from writing kernel-level modules; from an antitrust standpoint, I would think that would land them under investigation.
I also think Microsoft should be responsible: they provided the keys to sign the kernel driver, so I expect that driver to be subject to regular testing and scrutiny, not just when the initial release was made.
They didn't "give the keys", they have a signing infrastructure that is meant to be used for validating organizational identity and origins of code. They have a quality checking system, but it's only required for certain levels of Microsoft backing. I think it used to be called the Windows Logo Program or something?
Signing is meant only to verify the identity of the organization producing the signed artifact.
It’s not meant to signify that it’s bug-free.
WHQL signifies it is tested and that driver is WHQL certified.
The issue was caused by a data file, Microsoft is not involved in signing or testing individual data files.
The actual issue was in the signed code that reads the data files; the data file update just brought it to the surface.
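That failure mode, signed and trusted code choking on an untrusted data file, can be sketched in a few lines. This is purely illustrative Python with a made-up file format; CrowdStrike's actual channel-file format and parser are not public:

```python
# Hypothetical "channel file" format: a 4-byte record count, then that many
# 4-byte records. The naive parser trusts the header; the defensive one
# validates it against the actual file size before reading.
import struct

def parse_naive(blob: bytes) -> list[int]:
    (count,) = struct.unpack_from("<I", blob, 0)
    return [struct.unpack_from("<I", blob, 4 + i * 4)[0] for i in range(count)]

def parse_defensive(blob: bytes) -> list[int]:
    if len(blob) < 4:
        raise ValueError("truncated header")
    (count,) = struct.unpack_from("<I", blob, 0)
    if len(blob) < 4 + count * 4:
        raise ValueError("record count exceeds file size")
    return [struct.unpack_from("<I", blob, 4 + i * 4)[0] for i in range(count)]

good = struct.pack("<III", 2, 10, 20)          # valid: count=2, two records
bad = struct.pack("<I", 0xFFFF) + b"\x00\x00"  # claims 65535 records, has none

print(parse_naive(good))      # [10, 20]
try:
    parse_naive(bad)          # out-of-bounds read; in kernel mode, a bugcheck
except struct.error:
    print("naive parser crashed")
try:
    parse_defensive(bad)      # same input rejected cleanly instead
except ValueError as e:
    print("defensive parser rejected the file:", e)
```

In userspace the naive version is an exception; in a boot-start kernel driver, the equivalent out-of-bounds read is a blue screen on every boot.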
But I don't think Microsoft verifies customer code, they might not even have access to it.
You are right, Microsoft is not checking the third-party code itself; they only run a lot of tests against the compiled code.
There is a recent video from a former Microsoft employee where he explains that drivers that get WHQL certification are run on test machines under stress conditions for some time, or at least that is how it worked when he was there.
Since that process is probably too slow to push an update within a couple of hours, CrowdStrike just bypassed the QA testing by injecting their own data files into the already-certified driver.
I guess Microsoft's testing lacks fuzzing, then - as does CrowdStrike's.
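For what it's worth, even a crude mutation fuzzer catches this class of bug quickly. A minimal sketch, fuzzing a hypothetical stand-in parser (not CrowdStrike's actual format), seeded deterministically so the run is reproducible:

```python
# Minimal mutation fuzzing: take a known-good input, flip a few random bytes,
# and see whether the parser survives. Any uncaught exception counts as a
# "crash" - the userspace analogue of a kernel bugcheck.
import random
import struct

def parse(blob: bytes) -> int:
    # Hypothetical format: 4-byte record count, then 4-byte records.
    (count,) = struct.unpack_from("<I", blob, 0)
    total = 0
    for i in range(count):
        total += struct.unpack_from("<I", blob, 4 + i * 4)[0]
    return total

def fuzz(seed_input: bytes, rounds: int = 1000) -> int:
    rng = random.Random(0)  # fixed seed for reproducibility
    crashes = 0
    for _ in range(rounds):
        mutated = bytearray(seed_input)
        for _ in range(rng.randint(1, 4)):           # flip 1-4 random bytes
            mutated[rng.randrange(len(mutated))] = rng.randrange(256)
        try:
            parse(bytes(mutated))
        except Exception:                            # struct.error, etc.
            crashes += 1
    return crashes

seed = struct.pack("<III", 2, 10, 20)  # a valid file: count=2, two records
print(f"{fuzz(seed)} of 1000 mutated inputs crashed the parser")
```

Real fuzzers (AFL, libFuzzer) are coverage-guided rather than purely random, but even this blind version trips the header-trusting bug within the first handful of iterations.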
Microsoft does not “recommend CrowdStrike”. Microsoft actually sells its own competitor to CrowdStrike (Defender XDR).
That the OS needs a product like CrowdStrike in order to be used safely is also their fault.
> an entire develop, build, approval, and documentation process in just about 2 days
...on a weekend.
They are not claiming they built it themselves. This kind of tool could easily be an offshore job.
Probably not doing that with this incident. But FBI/NSA is probably involved.
> develop, build, approval, and documentation process
Under immense pressure, I'm sure one or two of the usual steps were skipped or reduced (perhaps this is what you were insinuating?).
Given the harm that Crowdstrike caused Microsoft here, it does seem like they missed an opportunity in not calling this tool Blue Falcon.
We released ours the same day as the mass crashes :)
Congrats? Microsoft has higher quality assurance concerns, since anything with their name on it means customers will come beating down their door for support if ANYTHING goes wrong, even if it's not their fault.
> Microsoft has higher quality assurance concerns...
No, they don't. This is the same company that has turned the Windows OS into an advertisement platform within the OS [0]. A company that puts buggy telemetry collection ahead of its end users [1]. And a platform that is known to spy on its end users [2]. So, no - Microsoft really doesn't have "higher quality assurance concerns" for its end users. They care about turning a profit.
[0] https://www.theverge.com/2024/4/12/24128640/microsoft-window... [1] https://www.maketecheasier.com/fix-microsoft-compatibility-t... [2] https://www.techradar.com/news/is-windows-11-spying-on-you-n...
Which is completely irrelevant and does not negate parent comment's point.
A real argument would have been very informative, but yours just ruined that; it's not much different from trolling.
Did you go read the comments in the link above for the Microsoft tool? Because your comment indicates you didn't. I stand by what I said and it does showcase the level of quality Microsoft puts into their products today.
I can assure you the Windows advertisement platform goes through QA. You might need to think more about separating "what they do" from "how they do it".
While true you are talking about something entirely different.
Yours also says nothing about BitLocker...
It's interesting Microsoft is dealing with this. I wonder how they feel about CS? Can't imagine they are happy with them. So I would guess it's less of "let's work with our friends at CS" and more like "Those $#%!, they made a mess and we're left to clean it up".
I've already heard multiple non-technical people present this as a "Microsoft problem": "Omg, did you hear what Microsoft just did to their customers?" I don't know if CS is subtly pulling strings to look less guilty, but it probably just happens by simple association: "blue screen of death = Windows problem". Can't imagine Microsoft is too happy to take this kind of reputational hit.
> but probably just happens by simple association "blue screen of death = Windows problem"
This certainly happens. Before driver signing, an extremely common cause of BSODs was a page fault in the kernel caused by a driver bug that failed to lock down a page during I/O. Only if you had the hex codes of the various exceptions memorized would you be in a position to tell a driver-caused BSOD from some other cause. So.... "it must be Windows again". This was a powerful motivation for MSFT to start a driver validation lab that they forced vendors through.
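For reference, a few of the documented Windows bugcheck (stop) codes in question. The DRIVER_* names make the culprit obvious today, but back then you would only see the raw hex on the blue screen. Illustrated as a small Python lookup:

```python
# A handful of documented Windows bugcheck codes (see Microsoft's Bug Check
# Code Reference). Before symbolic names were common knowledge, telling a
# driver-induced crash from a core kernel fault meant memorizing hex like this.
BUGCHECK_CODES = {
    0x0000000A: "IRQL_NOT_LESS_OR_EQUAL",
    0x00000050: "PAGE_FAULT_IN_NONPAGED_AREA",
    0x0000007E: "SYSTEM_THREAD_EXCEPTION_NOT_HANDLED",
    0x000000C4: "DRIVER_VERIFIER_DETECTED_VIOLATION",
    0x000000D1: "DRIVER_IRQL_NOT_LESS_OR_EQUAL",
}

def describe(code: int) -> str:
    return BUGCHECK_CODES.get(code, f"unknown bugcheck 0x{code:08X}")

print(describe(0xD1))  # almost always a driver at fault
print(describe(0x50))  # often a driver touching freed or unlocked memory
```

Note that even 0x50 and 0xA, which don't name drivers, were very frequently caused by them, which is exactly why "it must be Windows again" stuck.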
And then... you have OS/2, where they actually used more than two security rings: kernel in ring 0, user space in ring 3, and drivers in ring 1. Now the kernel can properly blame the driver. But of course, that can't be ported to CPUs with only two security levels.
Well, there is at least one way they should be dealing with it, which is to immediately revoke the current CrowdStrike kernel driver. Surely that thing can't be kept loaded, ready to explode at the next malformed "channel update". God knows the vendor can't be trusted to ensure that.
Yep; I was at a restaurant yesterday, and it sounds like they got hit with the CrowdStrike Linux outage a month or so ago.
They had no idea the two were probably the same vendor.
Sorry for the ignorance, but what is this Crowdstrike Linux outage you mention? Couldn't find any easily accessible news on it
This tool requires you to physically plug in a USB device and then touch the keyboard, one machine at a time. I can imagine it has to be this way, but ouch, that is a lot of manual work. At least it's simple enough to train someone to do it.
Now you've got me wondering about the pile of regulatory fail that leads a company to install CrowdStrike for endpoint security, but also to ship kiosks with physically accessible, bootable USB ports.
They should add CS Falcon to their malware definitions in Windows Defender. Crowdstrike has proved that its software is indistinguishable from malware.
Also, while they're at it, add Trellix.
If you're running CrowdStrike I would think Windows Defender is probably disabled, no?
They could push a windows update that nukes CrowdStrike and re-enables Windows Defender. I'm pretty sure they've done that sort of thing in the past.
That would require the system to actually boot. Which the CrowdStrike bug prevents.
Did anyone write a script to remove the file directly from VM disks, rather than booting the OS? Or does crowdstrike somehow prevent that solution?
I imagine having an unencrypted disk in 2024 can be most charitably called 'an oversight', so there's little point in attempting to deal with them. (Remember we're talking about boxes with crowdstrike installed...)
Ahh, right. You'd need bitlocker keys. Although I wonder if the central key server could be queried to obtain each host's key?
Also makes me wonder about a software configuration management system that operated on disks while the virtual hosts were powered down. With windows it feels like that'd be at least very difficult, but Linux could definitely be managed that way. Like an immutable operating system where changes can only come from the central controller, and the OS itself is written with that in mind. Dunno what benefit that might bring, but it's a fun mental excursion.
And what OS with what security product do you think the central key server runs?
Well, sure, the central key server will have been affected by this, but that's one VM to remediate/restore and would hopefully be done first. Or at least once people realize the key server is also down.
Are there VM platforms that can encrypt disks without giving the host access to the disk? Sure, they could use TPM or something, but that doesn't solve the problem.
Worst case, I imagine you could boot to the bootloader menu, then scrape the unwrapped bitlocker key from RAM.
(I agree that the org that mandated CrowdStrike would collectively lay an egg if they realized this was possible.)
We were doing something similar with our SCCM boot drives. Boot off the stick, press F8 for a cmd prompt, use manage-bde to unlock BitLocker, and delete the files from the cmd prompt.
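Those steps can be scripted. A sketch that only builds the unlock and delete commands: the `C-00000291*.sys` glob is the faulty channel-file pattern CrowdStrike identified, but the recovery key shown is hypothetical, and actually executing these requires a WinPE/Windows environment, so `remediate` is never invoked here:

```python
# Sketch of the manual remediation above: unlock the BitLocker volume with
# manage-bde, then delete the faulty CrowdStrike channel files. Windows-only
# when actually run; here we only construct the command lines.
import subprocess

CHANNEL_GLOB = r"C:\Windows\System32\drivers\CrowdStrike\C-00000291*.sys"

def build_commands(recovery_key: str) -> list[list[str]]:
    return [
        # Unlock the OS volume using the 48-digit BitLocker recovery password.
        ["manage-bde", "-unlock", "C:", "-RecoveryPassword", recovery_key],
        # Force-delete the problematic channel files.
        ["cmd", "/c", "del", "/f", CHANNEL_GLOB],
    ]

def remediate(recovery_key: str) -> None:
    # Would only be called from a WinPE command prompt, per the steps above.
    for cmd in build_commands(recovery_key):
        subprocess.run(cmd, check=True)

cmds = build_commands("000000-000000-000000-000000-000000-000000-000000-000000")
print(cmds[0][0], "->", cmds[1][-1])
```

Fetching each host's recovery key from the org's key escrow (AD/Entra), as discussed upthread, is the part that doesn't script so easily.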
Very carefully worded blog post title.
People have been talking on Reddit and elsewhere about how this is a CrowdStrike issue. But in my opinion, it's appalling that Windows can allow this to happen.
CrowdStrike installs as an operating system driver. It becomes, essentially, part of the operating system and can do literally anything it wants, and Microsoft cannot do much of anything about it.
Going forward, I could foresee Microsoft requiring endpoint protection providers to certify their QA processes in order to get signing. But staged rollouts and canary builds were industry-standard practice long before CrowdStrike. There was no way Microsoft could have known it was dealing with a company incompetent enough to let this happen.