PSA: Validate Your Storage! JMicron JMS583 + KIOXIA BG4 Series SSD Issue?

I recently purchased a refurbished Dell Latitude 5420 and upgraded its internals, which liberated a 256GB KIOXIA BG4-series (KBG40ZNS256G) 2230-size M.2 NVMe SSD that’s somewhat worn but still very serviceable. Rather than let it go to waste, I threw it into a spare Jeyi ThunderRate compact USB3.2 10Gbit/s enclosure to use as an external drive. Before pressing it into service, I wanted to quickly check that it was working correctly.

Trouble in Paradise

Using the rather-useful H2testW, I commenced a full write and verify on my daily driver Lenovo Legion 5 17ITH6 based on an Intel i7 11th-Generation CPU.

Uh-oh. Something is wrong. The drive was nearly fully written, but on reading it back, 24MiB of data had been lost. Scrolling down, the affected data had been zeroed rather than containing the intended pattern. I re-ran verification to check …

… the result was identical. Since the same data fails on every read, the issue seems to have occurred during the write phase and not the read phase. I’ve had issues with USB 3.x cables causing instability with some enclosures, so I went and got some other trusted cables.
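
For anyone wanting to reproduce this kind of check without H2testW, here is a minimal sketch of the same write-then-verify idea. It is not how H2testW works internally; the block size, the `shake_128`-based pattern and the function names are all my own assumptions for illustration. Each block is derived from its index, so verification needs no stored copy, and zeroed blocks are counted separately from other mismatches.

```python
import hashlib
import os

BLOCK = 1024 * 1024  # 1 MiB test blocks (an arbitrary choice)


def pattern_block(index: int, size: int = BLOCK) -> bytes:
    """Deterministic pseudo-random block derived only from its index."""
    seed = index.to_bytes(8, "little")
    return hashlib.shake_128(seed).digest(size)


def write_pattern(path: str, blocks: int, block: int = BLOCK) -> None:
    """Write `blocks` pattern blocks and ask the OS to flush them to the device."""
    with open(path, "wb") as f:
        for i in range(blocks):
            f.write(pattern_block(i, block))
        f.flush()
        os.fsync(f.fileno())


def verify_pattern(path: str, blocks: int, block: int = BLOCK) -> dict:
    """Re-read the file and classify each block as ok, fully zeroed, or other."""
    zero = bytes(block)
    result = {"ok": 0, "zeroed": 0, "other": 0}
    with open(path, "rb") as f:
        for i in range(blocks):
            data = f.read(block)
            if data == pattern_block(i, block):
                result["ok"] += 1
            elif data == zero:
                result["zeroed"] += 1
            else:
                # Partially zeroed, truncated or otherwise altered blocks land here.
                result["other"] += 1
    return result
```

One caveat: a small test file may be read back from the operating system’s cache rather than from the drive, so the test data should comfortably exceed RAM, or the drive should be ejected and reconnected between writing and verifying.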

Since this is a bit of a worn SSD, I decided to do another few writes and verifies to see what happens. The amount of corrupted data changed from run to run, so the issue is not consistent and is somehow “conditional”. Changing the cables didn’t fix the issue at all.

So I decided to change over to a USB 2.0 cable, giving up USB 3.0 speeds. This seems to have worked: data written over USB 2.0 verified successfully, even when read back over USB 3.x, so dropping to USB 2.0 for writes seems to have avoided the problem. I suspect either the slower writes or the loss of UASP is what “fixed” the corruption. Or is this a symptom of inadequate power? The SSD needs just 3.6W, so a USB 3.x port should be plenty.

To see if it was just a bad USB-C port, I tried a USB-A port with a USB-A to USB-C cable instead. Alas, no luck either. In fact, the drive even dropped out during writing with one of the cables, indicating a flaky cable or plug. I even swapped the SSD into a second Jeyi ThunderRate enclosure and the issue persisted.

I suspected something wrong with the JMicron chipset or the SSD itself, so I moved the SSD into a Realtek-based Orico enclosure and H2testW passed just fine. Just to make sure it wasn’t a fluke, I tried it a second time and it passed again. This suggests that the laptop, cable and SSD are not entirely at fault.

An Attempted Modification

Grasping at straws, since the JMicron JMS583 seems to work just fine with some other SSDs, I suspected that perhaps there’s a power issue. I opened up the Jeyi ThunderRate enclosure and decided to go swapping in oversized capacitors.

The capacitors in question are all 22uF on the board, as measured by an LCR meter, so I swapped in 100uF capacitors in a larger 0805 package, which made them hard to fit. Note that C39 was also changed, after the photo was taken.

The hope was that, if the issue was with power, the added capacitance would give the SSD cleaner power and leave the JMicron controller less affected by the SSD’s current peaks.
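
As a rough illustration of that reasoning, an ideal capacitor supplying a current step on its own sags by dV = I × dt / C. The sketch below compares a single 22uF part with a single 100uF part. The 1 A step and 10 µs duration are hypothetical figures I picked for the example, not measurements from this enclosure, and a real rail also has a regulator and several capacitors in parallel.

```python
def droop_volts(step_amps: float, duration_s: float, capacitance_f: float) -> float:
    """Voltage sag of an ideal capacitor supplying a current step alone: dV = I * dt / C."""
    return step_amps * duration_s / capacitance_f


STEP_A = 1.0        # hypothetical current step drawn by the SSD
DURATION_S = 10e-6  # hypothetical time before the regulator responds

for label, farads in (("22uF", 22e-6), ("100uF", 100e-6)):
    millivolts = droop_volts(STEP_A, DURATION_S, farads) * 1000
    print(f"{label}: about {millivolts:.0f} mV of sag")
```

Under those assumed numbers, the larger capacitor cuts the sag by the ratio of the capacitances, roughly 4.5 times, which is why the swap seemed worth trying.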

This didn’t fix it in USB 3.x mode, but at least it didn’t break anything, and USB 2.0 connectivity continues to produce reliable results. Verification went quicker, probably due to better thermal pad positioning.

Regardless, the drive didn’t report any issues internally. This corruption is truly silent: if the drive itself were in trouble, I would expect the error counters at 0x0E and 0x0F to show some non-zero value.
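
The same check can be scripted. The sketch below assumes that 0x0E and 0x0F correspond to the NVMe health-log counters for media and data integrity errors and for error information log entries, which smartmontools reports as `media_errors` and `num_err_log_entries` in its JSON output (`smartctl -j -a <device>`). The function names are my own, and the device path is left to the reader.

```python
import json


def error_counters(smartctl_json: str) -> dict:
    """Extract the two NVMe error counters from `smartctl -j -a` output."""
    report = json.loads(smartctl_json)
    log = report.get("nvme_smart_health_information_log", {})
    return {
        "media_errors": log.get("media_errors"),
        "num_err_log_entries": log.get("num_err_log_entries"),
    }


def drive_reports_trouble(counters: dict) -> bool:
    """True if any counter is present and non-zero."""
    return any(bool(value) for value in counters.values())
```

A drive in the state described here would return zero for both counters, so `drive_reports_trouble` would say `False` even though data was lost. That is exactly what makes this kind of corruption hard to catch without a write-and-verify test.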

How About Firmware?

Knowing that JMicron solutions can sometimes be flaky especially with older firmware, I went looking for firmware from StationDrivers.

I tried the packages I felt were the most recent, but they didn’t seem to work. Then I came across one which did work and updated the firmware version to something more recent.

After resetting the enclosure, at least the update hadn’t killed it. It was running … but did it fix this issue?

Nope. It didn’t fix it at all. So now I wondered whether the SSD firmware was to blame. Searching by part number, the only firmware I found was the one it was already running: 10410106.

Trying Other Things

I’ve had instances in the past where repartitioning a drive helped, so I decided to leave some empty space at the end. No dice.

I swapped the SSD into a generic USB 3.x case that has the same JMS583 chipset and it didn’t work properly either.

I tried different platforms: the Dell Latitude 5420 (Intel 11th-generation i5), an Intel 13th-generation laptop, an N150 mini-PC, a Lenovo Legion Slim 5 (AMD Ryzen 7 7xxx-series), an AMD Ryzen 7 5700x workstation and even a Windows 11 VM running on my Lenovo Legion 5 laptop. In all cases, data corruption of varying amounts occurred.

Only my older AMD Phenom II x6 1090T BE desktop, running Windows 7 on an NEC USB 3.0 5Gbit/s controller, managed to do USB 3 rates and not corrupt the data. This suggests to me that UASP and/or 10Gbit/s USB 3.x is part of the problem.

What About Another OS?

Is this an issue with Microsoft Windows and its UASP driver? Let’s give Ubuntu a try. Installing Wine so I can run H2testW …

… it seems that free-space detection isn’t quite right under Wine, but at least the write was mostly completed.

Verification of the written data was successful – so that was surprising.

The dmesg output suggests that UASP was in use, so perhaps the Linux UASP driver behaves differently and doesn’t provoke this issue. That’s an interesting find.
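
Besides dmesg, the bound driver can be read from `lsusb -t`, where a mass-storage interface shows `Driver=uas` when UASP is in use and `Driver=usb-storage` when it has fallen back to Bulk-Only Transport. The sketch below parses that output. The sample text in the usage note is illustrative only, not output captured from my machine, and the function name is my own.

```python
import re

# Matches lines such as:
#   |__ Port 2: Dev 3, If 0, Class=Mass Storage, Driver=uas, 10000M
STORAGE_LINE = re.compile(r"Class=Mass Storage, Driver=([^,]*), (\S+)")


def storage_links(lsusb_tree: str) -> list:
    """Return (driver, link speed) pairs for every mass-storage interface listed."""
    return [match.groups() for match in STORAGE_LINE.finditer(lsusb_tree)]
```

Feeding it the text of `lsusb -t` gives one pair per storage interface, for example `("uas", "10000M")` for an enclosure running UASP on a 10Gbit/s link. That makes it easy to confirm which transport and speed were actually in effect for a given test run.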

Conclusion

While I had no trouble when reviewing the Jeyi ThunderRate and generic JMS583 external enclosures, placing this SSD into either enclosure reveals a compatibility issue: a random number of sectors are silently zeroed during the write process when connected via USB 3.x UASP on Windows. Connecting via USB 2.0, or to an older Windows 7 machine without UASP support at USB 3.0 rates, worked just fine. Reads seemed unaffected and this behaviour doesn’t seem to affect other tested SSDs, just this KIOXIA BG4-series drive. Linux also seems to escape unscathed.

Is this a JMicron issue, or a KIOXIA issue, or a Microsoft issue, or a mixture of the above? I can’t be sure, but the issue doesn’t affect the Realtek-based Orico enclosure I tested with, nor does it affect the SSD when directly connected to a PCIe port on a laptop or desktop. Is it to do with the behaviour of the SSD itself, or is it something more mundane, like a subtle bug that appears when the drive capacity is a certain value or when a certain number of I/O events are queued?

Let this be a lesson for all: always validate your storage before you trust it with your precious data. While it’s easy to slap an SSD into an enclosure and it would seem to work just fine, wouldn’t it be a shame if there were random “holes” blasted into your files? Imagine taking a back-up of your computer only to find it was damaged when you need to restore from it. If you take the time to validate its integrity, you should be able to catch some issues before they cause damage. With external storage, it might also pay to validate on every platform you intend to use the drive with, due to differences in USB controller behaviour and power supply.
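
For back-ups specifically, one way to catch such holes is to compare checksums of the source files against the copies on the external drive. The sketch below is a minimal version of that idea using SHA-256. The function names are my own, and it assumes both folders are readable as ordinary directory trees.

```python
import hashlib
import os


def sha256_of(path: str, chunk: int = 1 << 20) -> str:
    """Hash a file in chunks so large files don't need to fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for block in iter(lambda: f.read(chunk), b""):
            digest.update(block)
    return digest.hexdigest()


def manifest(root: str) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hash."""
    hashes = {}
    for folder, _dirs, files in os.walk(root):
        for name in files:
            full = os.path.join(folder, name)
            hashes[os.path.relpath(full, root)] = sha256_of(full)
    return hashes


def damaged_files(source_root: str, backup_root: str) -> list:
    """List source files that are missing from, or differ in, the back-up."""
    source, backup = manifest(source_root), manifest(backup_root)
    return sorted(path for path, digest in source.items() if backup.get(path) != digest)
```

To make sure the comparison reads from the drive rather than from cached data, it is safest to eject and reconnect the drive after copying and before running `damaged_files`. An empty list means every source file has a matching copy.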