Windows 11 Update KB5063878 Causing SSD Failures
old.reddit.com
The latest as far as I know is that Phison couldn't replicate the issue. [1]
[1] https://wccftech.com/phison-dismisses-reports-of-windows-11-...
But what's actually happening? There seems to be a lack of technical information.
And why does the SSD allow this to happen? A SSD has its own onboard computer, it's not just allowing the OS to do whatever it wants. Obviously the OS can write way too much and reach the endurance limit but that should have been figured out almost instantly, with OS write stats and SMART stats.
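For anyone who wants to check the "wrote too much" theory on their own drive, here is a rough sketch of the arithmetic. It assumes an NVMe drive, whose SMART log reports "Data Units Written" in units of 1000 x 512 bytes; the counter value and the 600 TBW rating below are made-up illustration numbers, not figures from any affected drive.

```python
# NVMe SMART "Data Units Written" is reported in units of 1000 * 512 bytes.
NVME_DATA_UNIT_BYTES = 512 * 1000


def bytes_written(data_units_written):
    """Convert the SMART 'Data Units Written' counter to bytes."""
    return data_units_written * NVME_DATA_UNIT_BYTES


def endurance_used_fraction(data_units_written, rated_tbw_terabytes):
    """Fraction of the vendor's rated TBW (decimal terabytes) already written."""
    return bytes_written(data_units_written) / (rated_tbw_terabytes * 10**12)


# Hypothetical values, for illustration only.
example_units = 58_593_750
example_rating_tbw = 600
total = bytes_written(example_units)
used = endurance_used_fraction(example_units, example_rating_tbw)
```

If that fraction is small on a drive that died, endurance exhaustion from host writes is an unlikely explanation, although this counter says nothing about write amplification inside the drive.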
> And why does the SSD allow this to happen? A SSD has its own onboard computer, it's not just allowing the OS to do whatever it wants.
If the device is DRAM-less, much of its central information (large parts of the FTL, in particular) resides in the host's RAM, where the OS could presumably touch it. If that area of RAM is _somehow_ being overwritten or out-of-sync or otherwise unreliable, you can get pretty bad corruption.
no, the FTL is still in the SSD unless it's a host-managed SSD which is also operating in host-managed mode, which none of the articles have mentioned to be related to the issue
No, some SSDs use host memory buffer (HMB) to cache FTL tables. If the FTL cache gets corrupted, and that causes critical data to be overwritten, that could brick the SSD. For instance, if the FTL table was corrupted in such a way where a page for a random file is mapped to the page for the SSD's FTL (or other critical data), and the OS/user tries to write to that random file.
Isn't that a huge flaw?
Yes, which is why they're cheap(er). It's better than the alternative of using flash instead of going out to system RAM, but DRAM-less SSDs are still the cheap option; HMB is a mitigation, and not a complete fix.
The FTL executes on the SSD controller, which (on a DRAM-less controller) has limited on-chip SRAM and no DRAM. In contrast, a controller for more expensive SSDs will require an external on-SSD DRAM chip of 1+GB.
The FTL algorithm still needs one or more large tables. The driver allocates host-side memory for these tables, and the CPU on the SSD that runs the FTL has to reach out over the PCIe bus (e.g. using DMA operations) to write or read these tables.
It's an abomination that wouldn't exist in an ideal world, but in that same ideal world people wouldn't buy a crappy product because it's $5 cheaper.
One of the Japanese sites has a list of SSDs that people have observed the problem on - most of them seem to be dramless, especially if "Phison PS5012-E12" is an error. (PS5012-E12S is the dramless version)
Then again, I think dramless SSDs represent a large fraction of the consumer SSD market, so they'd probably be well-represented no matter what causes the issue.
Finally, I'll point out that there's a lot of nonsense about DRAMless SSDs on the internet - e.g. Google shows this snippet from r/hardware: "Top answer: DRAM on the drive benefits writes, not reads. Gaming is extremely read-heavy, and reads are..."
FTL stands for flash TRANSLATION layer - it needs to translate from a logical disk address to a real location on the flash chip, and every time you write a logical block that real location changes, because you can't overwrite data in flash. (you have to wait and then erase a huge group of blocks - i.e. garbage collection)
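To make the translation idea concrete, here is a toy sketch (not how any real controller firmware works; the class and field names are invented for illustration). It keeps a page-level logical-to-physical map, never overwrites a physical page in place, and marks the old location stale on every rewrite. Garbage collection and erase blocks are left out.

```python
class ToyFTL:
    """Toy flash translation layer: page-level mapping, append-only writes."""

    def __init__(self, num_physical_pages):
        self.flash = [None] * num_physical_pages  # contents of each physical page
        self.l2p = {}                             # logical page -> physical page
        self.stale = set()                        # physical pages holding outdated data
        self.next_free = 0                        # next never-written physical page

    def write(self, logical_page, data):
        if self.next_free >= len(self.flash):
            raise RuntimeError("no free pages; a real FTL would garbage-collect here")
        old = self.l2p.get(logical_page)
        if old is not None:
            self.stale.add(old)  # flash can't be overwritten in place
        self.flash[self.next_free] = data
        self.l2p[logical_page] = self.next_free  # the real location changes on every write
        self.next_free += 1

    def read(self, logical_page):
        return self.flash[self.l2p[logical_page]]


ftl = ToyFTL(num_physical_pages=8)
ftl.write(5, "v1")
first_location = ftl.l2p[5]
ftl.write(5, "v2")
second_location = ftl.l2p[5]
```

Writing logical page 5 twice leaves it pointing at a different physical page each time, and the first page is only reclaimable after an erase. Everything depends on `l2p` being right, which is why a corrupted map is so dangerous.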
If you put the translation table in on-SSD DRAM, it's real fast, but gets huge for a modern SSD (1+GB per TB of SSD). If you put all of it on flash - well, that's one reason thumb drives are so slow. I believe most DRAM-full consumer SSDs nowadays keep their translation tables in flash, but use a bunch of DRAM to cache as much as they can, and use the rest of their DRAM for write buffering.
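The "1+GB per TB" figure falls out of simple arithmetic if you assume a flat table with one 4-byte entry per 4 KiB logical page. Both numbers are assumptions for this sketch; real firmware may map at a different granularity or compress the table.

```python
def ftl_table_bytes(capacity_bytes, page_bytes=4096, entry_bytes=4):
    """Size of a flat logical-to-physical table with one entry per mapped page."""
    return (capacity_bytes // page_bytes) * entry_bytes


TIB = 1024 ** 4
GIB = 1024 ** 3

# 2**40 / 2**12 = 2**28 entries, times 4 bytes each = 2**30 bytes.
table_for_1tib = ftl_table_bytes(1 * TIB)
```

Under those assumptions a 1 TiB drive needs exactly 1 GiB of mapping table, so the ratio is roughly 1:1024 whatever the drive size.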
DRAMless controllers put those tables in host memory, although I'd bet they still treat it as a cache and put the full table in flash. I can't imagine them using it as a write buffer; instead I'm guessing when they DMA a block from the host, they buffer 512B or so on-chip to compute ECC, then send those chunks directly to the flash chips.
There's a lot of guesswork here - I don't have engineering-level access to SSD vendors, and it's been a decade since I've put a logic analyzer on an SSD and done any reverse-engineering; SSDs are far more complicated today. If anyone has some hard facts they can share, I'd appreciate it.
I don't buy this. There are plenty of DRAM-less SATA SSDs, which should be impossible if your description were correct, not to mention DRAM-less drives working just fine inside USB-NVMe enclosures.
>but gets huge for a modern SSD (1+GB per TB of SSD)
except most drives allocate 64MB thru HMB. Do you know of any NVME drives that steal Gigabytes of ram? Afaik Windows limits HMB to ~200MB?
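As a back-of-the-envelope check on what 64MB of HMB could hold, assuming the same flat layout of one 4-byte entry per 4 KiB page (an assumption, not a documented firmware format):

```python
def cached_span_bytes(cache_bytes, page_bytes=4096, entry_bytes=4):
    """Logical address range covered by a mapping cache of the given size."""
    return (cache_bytes // entry_bytes) * page_bytes


MIB = 1024 ** 2
GIB = 1024 ** 3

# 64 MiB / 4 bytes = 16 Mi entries, each covering 4 KiB of logical space.
span_for_64mib = cached_span_bytes(64 * MIB)
```

That works out to mappings for about 64 GiB of logical space, so a buffer this size would have to act as a cache over a full table kept in flash rather than hold the whole table for a 1 TB drive.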
>Finally, I'll point out that there's a lot of nonsense about DRAMless SSDs on the internet
FTL doesn't need all that RAM. RAM on drives _is_ used for caching writes, or more specifically for reordering and grouping small writes to efficiently fill whole NAND pages, preventing the fragmentation that destroys endurance and write speed.
But isn’t it the case that SATA devices must receive ATA commands through the disk controller, while NVMe is mapped directly to the CPU?
Surely that distinction would make one more vulnerable to corruption than the other?
Are you talking about the fact that NVMe works by MMIO and DMA? So does pretty much any SATA controller, so there's no inherent difference there (it has been _many_ years since the dominant way of talking to devices was through programmed I/O ports). Unless you have an NVMe device with host-backed memory (as discussed elsewhere in the thread), it's not like the CPU can just go and poke freely at the flash, just as it cannot overwrite a SATA disk's internal RAM or forcefully rotate its platters. It can talk to the controller by placing commands and data in a special shared memory area, but the controller is fundamentally its own device with separate resources.
It is not published yet on the Microsoft update page (https://support.microsoft.com/KB/5063878). And it only applies to Windows 11 24 H2.
https://learn.microsoft.com/en-us/answers/questions/5536733/...
>But what's actually happening? There seems to be a lack of technical information.
That's also what I want to know. All the information on this topic seems to be just circular anecdotes like a snake eating its own tail: a bunch of anecdotal reddit posts, quoting a Tom's hardware article, that's quoting more anecdotal reddit posts, that's quoting one Japanese tweet of someone's speculation.
Like how many of these SSD deaths can actually be pinned on this update, and how much of this is just "Havana syndrome" of people's SSDs dying for whatever other reason, then they hear about this hubbub in the news and then they go on reddit and say "OMG mine too", then clickbait journalists pick up on it, and round and round we go, further reinforcing the FUD, but without any actual technical analysis to verify.
Agree; any truth to the fact that this is push back for Windows 10 EOL?
Right. It could just be the usual suspects of misinformation (Reddit, click-hungry "journalists", certain YouTube/Tiktok creators) amplifying each other in a circle. Just like that "16 billion passwords data leak" earlier this year.
There is probably something going on. It could very well just be a bad batch of SSD controllers from one manufacturer failing.
Or some weird conspiracy
> But what's actually happening?
Publications need clicks, videos need watches, people need upvotes
"I installed a Windows update and my SSD died afterwards" doesn't seem like news, given that almost all Windows users periodically install Windows updates and SSDs sometimes fail.
Runaway processes are big problems for SSD life. A runaway file indexer, or a tool which re-writes large chunks of data can consume the TBW limit of an SSD pretty fast if it's left unchecked for long.
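To put a number on "pretty fast", here is a rough sketch of how long a sustained write rate takes to reach a drive's rated TBW. The 600 TBW rating and 100 MB/s rate are hypothetical examples, and the calculation counts host writes only, ignoring write amplification inside the drive.

```python
SECONDS_PER_DAY = 86_400


def days_to_exhaust_tbw(rated_tbw_terabytes, write_rate_mb_per_s):
    """Days of continuous host writes needed to reach the rated TBW."""
    total_bytes = rated_tbw_terabytes * 10**12
    bytes_per_day = write_rate_mb_per_s * 10**6 * SECONDS_PER_DAY
    return total_bytes / bytes_per_day


# Hypothetical drive rating and runaway-process write rate.
days = days_to_exhaust_tbw(rated_tbw_terabytes=600, write_rate_mb_per_s=100)
```

At those example numbers the rating is reached in a bit over two months of nonstop writing, which is fast for a drive but much slower than the sudden failures being reported.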
I seem to remember Spotify causing big problems because of this
This is doing the rounds on YouTube, too. But with pretty much the same information as everywhere else that tracks back to the same original sources.
* https://youtube.com/watch?v=mlY2QjP_-9s (JayzTwoCents)
* https://youtube.com/watch?v=sU_WepeHUd8 (ThioJoe)
* https://youtube.com/watch?v=7xS-CE-hy6Q (Dave's Attic)
* https://youtube.com/watch?v=zoHGSz-f6os (Pureinfotech)
Is it actually killing the SSD (SSD can no longer be used) or just corrupting the data on the SSD? It's hard to make out from all the comments and news articles.
I've seen lots of SSDs die suddenly (no longer visible on the bus), so I would assume that is what is happening based on the words people are using. I've yet to see an SSD fail to read only mode like they're supposed to... and there's rarely any warning, just working or dead (although I did have a couple that went from working to terribly slow while doing a large reallocation, and we replaced those rather than find out what would happen over a longer term)
That said, people use words with a different meaning all the time, and data corruption could fit as a failure.
Failing to read-only is only an Intel thing, I've not seen any other SSD do that...
I've not seen an Intel SSD do it either, although I've seen many of them escape their earthly existence :P
There was a firmware bug, but updating the firmware was inconvenient, and the specific interaction that caused the failure wasn't stated, so I couldn't avoid whatever it was; seemed connected to being pretty idle... we had a second data center as an untested "warm" failover target, and disks would tend to die over there where nothing significant was happening.
I had a Crucial drive fail to read-only.
I got the data off, but most of the data wasn't really that important so there might have been dead regions.
I feel that many consumers won't really know if it's still readable, I'd suggest that 90% of people just have a single drive, and windows doesn't cope with a non-writable root drive particularly well.
"Just" corrupting your filesystem...
In terms of relative seriousness, drive damage and filesystem damage are both bad, but to slightly different degrees.
There is more chance of being able to fix data corruption, than being able to fix a bricked drive or one with unbearable blocks.
Self-reply as I'm too late to edit out a slide-keyboard error: unbearable -> unreadable
some data might be worth way more than any SSD.
If it is then storing it without backups sounds like a bad idea
I wonder what the commercial effect is of such a thing on MS. Because assuming that the SSDs are unrecoverable it might lead to sales of new machines or new Windows licenses. There is a fair chance that bugs like these end up making good money, the numbers are large enough that even a small fraction of the users being affected can translate into a serious windfall.
They should be held liable if their software bricks hardware.
Can you get Linus Bucks by replacing your efivars with Doom?
You seem to be implying that Linus Thorvalds should also be liable for damage caused by Linux kernel.
I don't think the analogy is good. You might be better off replacing Linus with Apple and Linux with macOS. In that case, I would definitely think Apple should be held liable if an update to macOS bricks some hardware in a Mac.
But with Linux, it is different: You do not have a business relationship with Linus.
Sure, if you bought your Linux distribution from, say Red Hat, and it bricks your server, I think you might have a good case against Red Hat(IBM).
You took my reply a little too seriously :-)
Torvalds* but I'm sure he'd not mind the extra H :)
We knew more technical information about the CrowdStrike incident than we know about this. It's ridiculous.
The biggest problem with this is near zero communications from Microsoft. But what do I expect these days? Shovel AI in everything at any cost.
I’ve had repeatable data loss recently from windows 11 under a specific condition copying directories in explorer. The case works on windows 10 LTSC fine. I have absolutely no idea where to even raise this as an issue now. I’m not sure I even give a fuck.
Wasn't this mostly a WD HMB issue? [1]
[1] https://www.neowin.net/news/report-microsofts-latest-windows...
Some more information here https://www.windowslatest.com/2025/08/20/microsoft-is-invest...
That's why I don't install updates, unless and until they've been proven not to break things. I miss the old days when software was expected to work out of the box and updates, on the rare occasions when they appeared, were actually useful.
I hope you are speaking with tongue in cheek. Security is the main reason to keep current with updates. They address various “CVE” reports and go beyond to patch things not reported by CVEs.
I think users wouldn't be so resistant to security updates if they were just that and not bundled with feature removal, unwanted new features, and other things.
Or if they were properly done. Example: Intel and the Plundervolt vulnerability. To fix that they removed the ability to undervolt in my laptop. If I don't use SGX there's no reason for the block. They could've restricted undervolting only when SGX is enabled but no, they had to "fix" it in the worst way possible.
CVE inflation is real. Most CVEs are of very low quality.
Anyway, security updates should be decoupled from feature updates, so that people aren't hesitant to update. Otherwise, you get people who hold out because they're worried the new release is going to break all their settings and "opt-in" into all kinds of new telemetry.
> Security is the main reason to keep current with updates.
It shouldn't be that way though. Especially the billion dollar corporations should not be excused for shipping insecure software - the sad reality though is that Microsoft seems to have lost most of its QA team and what remains of its dev team gets shifted to developing adware for that sweet sweet "recurring revenue" nectar. Apple doesn't have that problem at least, but their management also has massive problems, prioritizing shiny new gadgets over fixing the tons of bugs people have.
> Security is the main reason to keep current with updates.
For plenty of users, their only exposed attack surface is the web browser and AV codecs. Updates outside of that make no security difference for them.
> For plenty of users, their only exposed attack surface is the web browser
Until they realize that every Microsoft app sends data to mothership.
> Security is the main reason to keep current with updates. They address various “CVE” reports and go beyond to patch things not reported by CVEs.
This does not seem to be the case. Rounding buttons and changing icon sizes in Teams and Office 365 has nothing to do with security.
> Security is the main reason to keep current with updates
Can you point to some "security" updates ? /s
Not related, but this reminds me of a recent issue with the Samsung 990 Pro SSD that required a firmware update for fix, and some drives had to be returned. I speculate it was exacerbated by increased usage.
https://serverfault.com/questions/1172216/issue-with-samsung...
https://www.tomshardware.com/news/samsung-990-pro-health-dro...
> I speculate it was exacerbated by increased usage.
Then the drive is defective.
[dupe] Earlier: https://news.ycombinator.com/item?id=44931383
So the thing we learn from this is that apparently there's now an SSD version of WinModems[1]. The lesson from the 1990s needs to be re-learned.
I'm wondering if I should defer my full system backup on the 1st of September, as the resulting file is 300+ GB.
I had a BSOD last week, 0x0000012b (FAULTY_HARDWARE_CORRUPTED_PAGE), which I've never had, and was hoping it isn't related to this update.
You might want to run memtest86+ (or the built-in equivalent from some OEMs like Dell), in my experience memory sticks sometimes go bad after being in use for a while.
Maybe just re-tuning the timing, if he's using high performance sticks. Because parts are hard to come by where I live, I usually stick 10+ years with a PC. With usage I found that I have to relax the timings a bit after some years.
Install "Windows 10 IoT Enterprise 2021 LTSC" if you don't mind buying grey market keys. Less crapware, more mature and less enshittified than 11, and security fixes until 2032.
I don't want to endorse Windows at all (use Linux if you can!). But maybe you need it to occasionally test something or whatever.
You don't have to buy grey market keys, use the public ones installed through mass gravel. Open source, hosted on Microsoft's own GitHub - it's practically an endorsement!
> https://github.com/massgravel/Microsoft-Activation-Scripts
Even though I professionally work with Linux I still don't trust it enough for gaming. I know that Steam does great things with Proton, my issue is that I'm not the type of gamer who constantly plays the same game - Play a game for how long the story or my interests lasts, then switch to the next game.
And after a whole day of debugging and hair pulling at work I just don't feel like then also debugging why a game is not running like it should.
But I heard I should give it a try again, last time I gave it a shot was 2-3 years ago. Big plus would be that I'd be completely free of Windows...
Did you try bazzite OS, the only issue I have had was to select the proton build of CS, everything else works out of the box. Except for games that need anti-cheat… So I still ended up with a windows partition.
I guess he wants to use his general computing device as a general computing device and not as a console.
Maybe you know this but Bazzite works perfectly well as a standard Linux desktop operating system. It comes with a non-gaming desktop environment and can be setup to boot directly into that desktop environment. It just defaults to the steam gaming interface.
It's an immutable distribution...
So? Kinoite is my main desktop and I love it. KDE is creating a new Arch-based distro which will also be immutable.
>> Maybe you know this but Bazzite works perfectly well as a standard Linux desktop operating system.
STANDARD - it's not and I hate people that pretend that they are. It's that easy.
FWIW : I own 262 games on my steam library, played most of them at least once. I had no issue with any single game.
I don't play multiplayer games so I'm not concerned by anti cheats though.
huh, good to know. i got a bit less games than you, but 200+ as well, and i also never really play MP, so sounds like i should give it a try :)
I’ve run games on Linux with success but these days do most gaming in a Windows VM with an assigned GPU - and it works very well.
Yeah aggressive anticheat won’t work - but I don’t care much about multiplayer these days, and have consoles to play on if I really want.
> if you don't mind buying grey market keys
Please don't buy "grey market" MS keys (i.e. super cheap keys or keys for products not sold to end users, like LTSC).
Either buy keys from legitimate vendors or use alternative activation methods (emulated KMS, etc.). I believe a lot of these grey market keys come either from MSDN subscriptions or leaked MAK keys, in either case, you aren't really paying for the product, you're just funneling money to sketchy people.
Had to get windows to play anti-cheat games. The EU mandated N versions seem pretty bloat free to me.
Weirdly enough I had one of those 10 IoT Enterprise 2021 LTSC systems kill a SSD in the past month, bad blocks. Intel 520 180GB. Probably coincidence but I figured I'd mention since this was also a system with a large OST file in use.
> Intel 520 180GB
Sorry but this drive is almost 15 years old.
How does one detect a DRAM-less SSD? (From Linux?)
And how does such a thing reserve host RAM?
nvme get-feature /dev/nvme0 -H -f 0x0d
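That command queries feature 0x0d (Host Memory Buffer), i.e. whether an HMB is currently set up. As a complement, the Identify Controller data has two fields, `hmpre` (preferred HMB size) and `hmmin` (minimum size), both in 4 KiB units, and `nvme id-ctrl /dev/nvme0` prints them. A nonzero `hmpre` means the controller asks the host for memory, which is typical of DRAM-less designs but not proof of one. Here is a small sketch that parses that output; the sample text is made up for illustration, not captured from a real drive.

```python
import re

HMB_UNIT_BYTES = 4096  # hmpre and hmmin are expressed in 4 KiB units


def parse_hmb_fields(id_ctrl_text):
    """Extract hmpre/hmmin from `nvme id-ctrl` text output, converted to bytes."""
    fields = {}
    for line in id_ctrl_text.splitlines():
        match = re.match(r"\s*(hmpre|hmmin)\s*:\s*(\S+)", line)
        if match:
            # int(..., 0) accepts both decimal and 0x-prefixed values.
            fields[match.group(1)] = int(match.group(2), 0) * HMB_UNIT_BYTES
    return fields


# Hypothetical excerpt of `nvme id-ctrl /dev/nvme0` output.
SAMPLE = """\
vid       : 0x1234
hmpre     : 16384
hmmin     : 2048
"""

hmb = parse_hmb_fields(SAMPLE)
wants_hmb = hmb.get("hmpre", 0) > 0
```

In this made-up sample the controller prefers 64 MiB and can work with as little as 8 MiB. On a real system you would feed the function the text from running `nvme id-ctrl` as root.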
Didn't this patch already get automatically reverted by Windows Update?
Tomorrow, somebody will still explain to you like you're a child that Linux has hardware incompatibilities (on the computer they bought last week the day it came out), and is just not ready for prime time.
They want to stick with Windows because it's safe and just works.
And I will continue to use non-upgradable Macs because, while I miss tinkering with and upgrading my computers, I simply don’t have time for it anymore.
And sadly they’ll still be right :<
If it breaks a SSD, would microsoft be liable for the damage?
The EULA that nobody reads says no
I think you also didn't read the EULA. The EULA says something along the lines:
> the statements incompatible with local law are to be disregarded as void
This is to protect the beneficiary of the EULA terms (Microsoft) from the possibility that the entire EULA is rendered illegal because one of its statements is illegal.
So EULA doesn't say
> no
What it says instead is
> no, if that's legal where you use this software
Though this condition doesn't sit right next to the statement like this.
Has Microsoft ever been liable for anything?
I doubt it
I have a strong suspicion this was some kind of stock / market attack. Phison dropped 14% (and their main competitor Silicon Motion increased 7% incidentally), while every single "news / slop" points to a single original source, some random Japanese person called "necoru_cat" that posted a supposed list of affected models (full of spelling mistakes).
I'm actually very surprised a single person managed to pull off a scam of this magnitude and am very worried about what effect fabricated news (now helped by AI) will have in the future.
Nope.
https://youtu.be/TbFIUu_7LIc?si=o1p2FrDYFeLEtIoF
A YouTuber got bitten by this randomly while just working, not while looking for this issue.
Any word from Microsoft?
this issue has been going for 2 months
How are you defining "bricked"? The SSD device can no longer be enumerated on the PCIe/SATA bus, or it doesn't respond to ATA/NVMe commands, or it doesn't respond as expected, or it does but the data is always wrong? Does the same SSD work in another machine?
edit: The author of the comment I replied to has changed their comment to remove all details of their testing.
yes
That’s a fragile, sort of roundabout comment. I can think of 90125 reasons closer to the edge that will move us back two squares.