New Recovery Tool to help with CrowdStrike issue impacting Windows endpoints

techcommunity.microsoft.com

80 points by thejournalizer 2 years ago · 65 comments

etskinner 2 years ago

Really impressive that they got through an entire develop, build, approval, and documentation process in just about 2 days. Not that any of those steps is extremely hard for this fix, but I'm always impressed when big corporations can move so fast.

  • nerdjon 2 years ago

    I sympathize with the engineers, QA, and everyone involved in getting this out.

    I have to imagine it took a lot of long hours, and the testing was insane. The last thing you'd want is to put this tool out and have it somehow make things worse.

    But glad it’s out. Hopefully it helps with the remaining machines and with any that are being problematic.

    • xinayder 2 years ago

      They probably got an exemption to fast-track the release because this is a critical issue. I wouldn't expect testing to be that thorough for a release in 2 days; the exemption is more likely.

    • selykg 2 years ago

      I wrote and maintained a tool that was sent to users when something went wrong.

      Man, I was always afraid and stressed that the tool meant to help users who were already having a failure would fail too.

    • ffhhj 2 years ago

      Surveillance software is a top priority for BigCos nowadays. If it proves to be useful to governments, they'll get softer antimonopoly measures.

  • Kwpolska 2 years ago

    To be fair, there isn't a whole lot of code there. I wouldn't be surprised if Microsoft had the WinPE generator written already for some other project.

  • beefnugs 2 years ago

    Actually, they may not have had a choice. Since they have forced people to set up local Windows installs with a Microsoft login, and tied BitLocker to that login, there are probably many situations out there that require a WinPE with Microsoft login support just to fix this.

  • ssahoo 2 years ago

    When their bottom line and head is at stake, what were they supposed to do?

    • Bognar 2 years ago

      They could say "third party kernel modules are installed at your own risk" and provide the usual level of business hours support. CrowdStrike fucked up and Microsoft is helping its customers recover from CrowdStrike's fuckup.

      • gus_massa 2 years ago

        They made a special memory allocator for Windows 95 to avoid a crash caused by a bug in SimCity https://www.joelonsoftware.com/2000/05/24/strategy-letter-ii...

        They are not only backward compatible or bug compatible. They are other-people's-bug compatible. It's the only way to keep users from thinking about switching to another OS.

        • HPsquared 2 years ago

          Reminds me of this famous post from Linus about being "bug-compatible".

          https://lkml.org/lkml/2012/12/23/75

          • jorvi 2 years ago

            One thing I’ve never understood about “kernel never breaks user space”: doesn’t that completely atrophy the kernel, preventing it from ever having big rewrites or architectural changes? What if an initial implementation was terrible, and there are 100x performance improvements to be had by making a breaking change?

            • capitainenemo 2 years ago

              Implement a new API for the better route, isolate the terrible code as much as possible, notify the users, deprecate it, and remove it or move it to a userspace shim after enough years have passed and almost everyone is off it?
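The migration path described above (new API, old entry point kept as a thin shim that warns, removal later) can be sketched in a few lines of Python. This is only a generic illustration of the deprecation-shim pattern, not how the Linux kernel handles its ABI; the function names and the "legacy API always used 4-byte records" behavior are hypothetical.

```python
import warnings


def read_records_v2(data: bytes, record_size: int) -> list:
    """New, preferred API (hypothetical): split a buffer into fixed-size records."""
    if record_size <= 0:
        raise ValueError("record_size must be positive")
    return [data[i:i + record_size] for i in range(0, len(data), record_size)]


def read_records(data: bytes) -> list:
    """Old API (hypothetical), kept only as a compatibility shim over the new one.

    Existing callers keep working unchanged, but they get a warning nudging
    them to migrate before the shim is eventually removed.
    """
    warnings.warn(
        "read_records() is deprecated; use read_records_v2()",
        DeprecationWarning,
        stacklevel=2,
    )
    # Assumed legacy behavior: the old API always used 4-byte records,
    # so the shim pins that parameter to preserve it exactly.
    return read_records_v2(data, record_size=4)
```

The point of the design is that the old code path no longer has its own implementation: all the real work lives in the new API, and the old name is just a warning plus a call.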

      • Arwill 2 years ago

        If anything, events like this make decision makers rethink whether they really should run Windows everywhere. Why does a flight schedule display have to run Windows, for example? It might not be their fuckup, but they will lose users too, for sure.

      • ssahoo 2 years ago

        They recommend CrowdStrike to customers. Now they are trying to at least win back some goodwill. Also, allowing a bad kernel module to ruin the OS is partially their fault.

        • Plasmoid2000ad 2 years ago

          Microsoft competes directly with CrowdStrike through Defender across multiple areas. I'm not sure they recommend them to customers over their own products at the cost of losing sales.

          I don't think Microsoft is realistically in a position to forbid other companies from writing kernel-level modules; from an antitrust standpoint, I would think that would land them under investigation(s).

        • concerned_user 2 years ago

          I also think Microsoft should be held responsible. They gave out the keys to sign the kernel driver, so I expect that driver to be subject to regular testing and scrutiny, not just when the initial release was made.

          • Bognar 2 years ago

            They didn't "give the keys", they have a signing infrastructure that is meant to be used for validating organizational identity and origins of code. They have a quality checking system, but it's only required for certain levels of Microsoft backing. I think it used to be called the Windows Logo Program or something?

          • LASR 2 years ago

            Signing is meant only to verify the identity of the organization producing the signed artifact.

            It’s not meant to signify that it’s bug-free.

          • Kwpolska 2 years ago

            The issue was caused by a data file, Microsoft is not involved in signing or testing individual data files.

            • _flux 2 years ago

              The actual issue was a bug in the signed code that reads the data files; the data file update just brought it to the surface.

              But I don't think Microsoft verifies customer code, they might not even have access to it.

              • concerned_user 2 years ago

                You are right: Microsoft isn't checking the third-party code itself, only running a lot of tests on the compiled code.

                There is a recent video from a former Microsoft employee where he explains that drivers that get WHQL certification are run on test machines under stress conditions for some time, or at least that is how it used to be when he worked there.

                Since that process is probably too slow to allow pushing updates within a couple of hours, CrowdStrike just bypassed the QA testing by injecting their own data files into the driver.

                • _flux 2 years ago

                  I guess Microsoft's testing lacks fuzzing, then, as does CrowdStrike's.
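To make the fuzzing point concrete: a fuzzer feeds a parser large numbers of mutated inputs and flags any that make it crash instead of being rejected cleanly. Below is a minimal blind-mutation sketch in Python. The record format (a 1-byte field count, then each field as a 1-byte length plus data) and the names `parse_channel_file` and `naive_parse` are invented for illustration; they are not CrowdStrike's actual file format or code. Real-world fuzzers such as libFuzzer or AFL are coverage-guided and far more effective than this loop.

```python
import random


class ParseError(Exception):
    """Raised for malformed input; this is the *expected* way to fail."""


def parse_channel_file(data: bytes) -> list:
    """Hypothetical format: 1-byte field count, then (1-byte length, bytes) per field.
    Every read is bounds-checked, so bad input only ever raises ParseError."""
    if not data:
        raise ParseError("empty input")
    fields, pos = [], 1
    for _ in range(data[0]):
        if pos >= len(data):
            raise ParseError("truncated: missing length byte")
        length = data[pos]
        pos += 1
        if pos + length > len(data):
            raise ParseError("truncated: field runs past end of input")
        fields.append(data[pos:pos + length])
        pos += length
    return fields


def naive_parse(data: bytes) -> list:
    """Same format, but trusts the header: indexes past the end on bad input."""
    fields, pos = [], 1
    for _ in range(data[0]):
        length = data[pos]  # IndexError if the file has fewer fields than claimed
        fields.append(data[pos + 1:pos + 1 + length])
        pos += 1 + length
    return fields


def mutate(seed: bytes, rng: random.Random) -> bytes:
    """Apply 1-4 random byte flips, deletions, or insertions to a valid input."""
    buf = bytearray(seed)
    for _ in range(rng.randint(1, 4)):
        op = rng.choice(("flip", "drop", "insert"))
        if op == "flip" and buf:
            buf[rng.randrange(len(buf))] = rng.randrange(256)
        elif op == "drop" and buf:
            del buf[rng.randrange(len(buf))]
        else:  # "insert", or flip/drop on an empty buffer
            buf.insert(rng.randrange(len(buf) + 1), rng.randrange(256))
    return bytes(buf)


def fuzz(parser, seed: bytes, iterations: int = 10_000, rng_seed: int = 0) -> list:
    """Return mutated inputs that made `parser` fail with anything other
    than a clean ParseError, i.e. the crashes a fuzzer is hunting for."""
    rng = random.Random(rng_seed)
    crashes = []
    for _ in range(iterations):
        sample = mutate(seed, rng)
        try:
            parser(sample)
        except ParseError:
            pass  # malformed input rejected cleanly: fine
        except Exception:
            crashes.append(sample)
    return crashes


# A valid input with two fields: b"abc" and b"hi".
SEED = bytes([2, 3]) + b"abc" + bytes([2]) + b"hi"
```

Running `fuzz(naive_parse, SEED)` quickly turns up inputs (for example, a header claiming more fields than the file contains) that raise `IndexError`, while `fuzz(parse_channel_file, SEED)` should come back empty.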

        • tatersolid 2 years ago

          Microsoft does not “recommend CrowdStrike”. Microsoft actually sells its own competitor to CrowdStrike (Defender XDR).

        • SoftTalker 2 years ago

          That the OS needs a product like Crowdstrike in order to be safely used is also their fault.

  • phoe-krk 2 years ago

    > an entire develop, build, approval, and documentation process in just about 2 days

    ...on a weekend.

  • mkl95 2 years ago

    They are not claiming they built it themselves. This kind of tool could easily be an offshore job.

    • aaomidi 2 years ago

      Probably not doing that with this incident. But FBI/NSA is probably involved.

  • switch007 2 years ago

    > develop, build, approval, and documentation process

    Under the immense pressures, I'm sure one or two of the usual steps were missed or reduced (perhaps this is what you were insinuating?)

gnfargbl 2 years ago

Given the harm that Crowdstrike caused Microsoft here, it does seem like they missed an opportunity in not calling this tool Blue Falcon.

ComputerGuru 2 years ago

We released ours the same day as the mass crashes :)

https://x.com/mqudsi/status/1814367837940515098

  • stackskipton 2 years ago

    Congrats? Microsoft has higher quality-assurance concerns, since anything with their name on it means customers will come beating down their door for support if ANYTHING goes wrong, even if it's not their fault.

  • dataflow 2 years ago

    Yours also says nothing about BitLocker...

rdtsc 2 years ago

It's interesting Microsoft is dealing with this. I wonder how they feel about CS? Can't imagine they are happy with them. So I would guess it's less of "let's work with our friends at CS" and more like "Those $#%!, they made a mess and we're left to clean it up".

I've already heard multiple non-technical people presenting this as a "Microsoft problem": "Omg, did you hear what Microsoft just did to their customers?". I don't know if CS is subtly pulling strings to look less guilty, but it probably just happens by simple association: "blue screen of death = Windows problem". Can't imagine Microsoft is too happy to take this kind of reputational hit.

  • dbcurtis 2 years ago

    > but probably just happens by simple association "blue screen of death = Windows problem"

    This certainly happens. Before driver signing, an extremely common cause of BSODs was a page fault in the kernel caused by a driver bug that failed to lock down a page during I/O. Only if you had the hex codes of the various exceptions memorized would you be in a position to tell a driver-caused BSOD from some other cause. So.... "it must be Windows again". This was a powerful motivation for MSFT to start a driver validation lab that they forced vendors through.

    And then... you have OS/2, where they actually used more than two security rings: kernel in ring 0, user space in ring 3, and drivers in ring 1. Now the kernel can properly blame the driver. But of course, that can't be ported to CPUs with only 2 security levels.

  • stefan_ 2 years ago

    Well, there is at least one way they should be dealing with it: immediately revoke the current CrowdStroke kernel driver. Surely that thing can't be kept loaded, ready to explode at the next malformed "channel update". God knows the vendor can't be trusted to ensure that.

  • hedora 2 years ago

    Yep; I was at a restaurant yesterday, and it sounds like they got hit by the CrowdStrike Linux outage a month or so ago.

    They had no idea the two were probably the same vendor.

NelsonMinar 2 years ago

This tool requires you to physically plug in a USB device and then touch the keyboard, one machine at a time. I can see why it has to be this way, but ouch, that is a lot of manual work. At least it's simple enough to train someone to do it.

  • hedora 2 years ago

    Now you've got me wondering about the pile of regulatory failures that leads a company to install CrowdStrike for endpoint security, but also to ship kiosks with physically accessible, bootable USB ports.

ok123456 2 years ago

They should add CS Falcon to their malware definitions in Windows Defender. Crowdstrike has proved that its software is indistinguishable from malware.

Also, while they're at it, add Trellix.

  • qingcharles 2 years ago

    If you're running CrowdStrike I would think Windows Defender is probably disabled, no?

    • hedora 2 years ago

      They could push a Windows update that nukes CrowdStrike and re-enables Windows Defender. I'm pretty sure they've done that sort of thing in the past.

bloopernova 2 years ago

Did anyone write a script to remove the file directly from VM disks, rather than booting the OS? Or does crowdstrike somehow prevent that solution?

  • baq 2 years ago

    I imagine having an unencrypted disk in 2024 can be most charitably called 'an oversight', so there's little point in attempting to deal with them. (Remember we're talking about boxes with crowdstrike installed...)

    • bloopernova 2 years ago

      Ahh, right. You'd need BitLocker keys. Although I wonder if the central key server could be queried to obtain each host's key?

      It also makes me wonder about a software configuration management system that operates on disks while the virtual hosts are powered down. With Windows it feels like that would be very difficult at best, but Linux could definitely be managed that way. Like an immutable operating system where changes can only come from the central controller, and the OS itself is written with that in mind. Dunno what benefit that might bring, but it's a fun mental excursion.

      • TiredOfLife 2 years ago

        And what OS with what security product do you think the central key server runs?

        • bloopernova 2 years ago

          Well, sure, the central key server will have been affected by this, but that's one VM to remediate/restore and would hopefully be done first. Or at least once people realize the key server is also down.

    • hedora 2 years ago

      Are there VM platforms that can encrypt disks without giving the host access to the disk? Sure, they could use TPM or something, but that doesn't solve the problem.

      Worst case, I imagine you could boot to the bootloader menu, then scrape the unwrapped BitLocker key from RAM.

      (I agree that the org that mandated CrowdStrike would collectively lay an egg if they realized this was possible.)

jaredhallen 2 years ago

We were doing something similar with our SCCM boot drives: boot off the stick, press F8 for a command prompt, use manage-bde to unlock BitLocker, and delete the files from the command prompt.
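For reference, the files deleted in this manual fix are the faulty channel files: CrowdStrike's published workaround was to delete files matching `C-00000291*.sys` under `Windows\System32\drivers\CrowdStrike`. Below is a minimal Python sketch of just that deletion step, assuming the volume has already been unlocked (for example with `manage-bde -unlock`) and is reachable at some root path, such as a Windows disk attached to a rescue VM. The function name `remove_bad_channel_files` and the `dry_run` flag are hypothetical. WinPE boot media normally does not ship Python, so treat this as an illustration of the logic, not something to run from the stick.

```python
import fnmatch
from pathlib import Path

# Directory and filename pattern from CrowdStrike's published workaround.
# Directory names are assumed to have their usual Windows casing.
CHANNEL_DIR = ("Windows", "System32", "drivers", "CrowdStrike")
BAD_PATTERN = "c-00000291*.sys"  # compared against lowercased filenames


def remove_bad_channel_files(volume_root: str, dry_run: bool = False) -> list:
    """Delete C-00000291*.sys files under an already-unlocked Windows volume.

    Returns the matching paths (deleted unless dry_run is True). Filename
    matching is case-insensitive, because an NTFS volume mounted on another
    OS may not fold case the way Windows does.
    """
    channel_dir = Path(volume_root).joinpath(*CHANNEL_DIR)
    if not channel_dir.is_dir():
        return []  # no CrowdStrike directory: nothing to do
    matches = sorted(
        p for p in channel_dir.iterdir()
        if p.is_file() and fnmatch.fnmatchcase(p.name.lower(), BAD_PATTERN)
    )
    if not dry_run:
        for p in matches:
            p.unlink()
    return matches
```

A `dry_run=True` pass first is a cheap way to confirm the script is pointed at the right volume before anything is deleted.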

mikemitchelldev 2 years ago

Very carefully worded blog post title.

andrewmcwatters 2 years ago

People have been talking on Reddit, etc. about how this is a CrowdStrike issue. But in my opinion, it's appalling that Windows allows this to happen.

  • vesinisa 2 years ago

    CrowdStrike installs as an operating system driver. It essentially becomes part of the operating system and can do literally anything it wants, and Microsoft cannot do much of anything about it.

    Going forward, I could foresee Microsoft requiring endpoint protection providers to certify their QA processes to get signing. But staged rollouts and canary builds were an industry-standard practice long before CrowdStrike. There was no way Microsoft could have known they were dealing with a company as incompetent as CrowdStrike, one capable of causing this.
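A staged (ring-based) rollout of the kind mentioned above can be sketched roughly as follows: each host is deterministically assigned to a bucket by hashing its ID, the update is released ring by ring, and promotion to the next ring halts if the crash rate observed so far exceeds a threshold. The ring sizes, the 1% threshold, and all function names are assumptions for illustration, not CrowdStrike's or Microsoft's actual process.

```python
import hashlib
from typing import Optional

# Hypothetical rollout plan: cumulative share of the fleet (percent) per ring.
RINGS = [1, 10, 50, 100]
MAX_CRASH_RATE = 0.01  # assumed halt threshold: 1% of updated hosts crashing


def bucket(host_id: str) -> int:
    """Map a host to a stable bucket in [0, 100), so it always lands in the same ring."""
    digest = hashlib.sha256(host_id.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % 100


def eligible(host_id: str, ring_index: int) -> bool:
    """A host receives the update once the rollout has reached its ring."""
    return bucket(host_id) < RINGS[ring_index]


def next_ring(ring_index: int, updated: int, crashed: int) -> Optional[int]:
    """Decide how the rollout proceeds after observing the current ring.

    Returns None to halt the rollout, the same index if no hosts have
    reported yet (or the last ring is already reached), else the next index.
    """
    if updated == 0:
        return ring_index  # no data yet; keep waiting
    if crashed / updated > MAX_CRASH_RATE:
        return None  # halt: the update is breaking hosts
    return min(ring_index + 1, len(RINGS) - 1)
```

Because buckets come from a hash of the host ID rather than random selection, the canary ring is stable across updates, and a bad update stops at roughly 1% of the fleet instead of reaching every machine at once.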