How fast SSDs slow to a crawl: thermal throttling (2021)
eclecticlight.co
I did not see any temperature measurements of the controller in this article, though, so the test doesn't feel very scientific to me.
It's not just thermal throttling of the controller that causes slowdown, it's also the filling of the DRAM/SLC cache.
Also, we should talk more about some vendors screwing over customers by replacing controllers or NAND chips with slower parts to cut costs while keeping the same SSD SKU, after seeding the original SKUs to reviewers to lock in good benchmark scores in online tests. This is why I recommend only buying SSDs from reputable vendors/OEMs who are more vertically integrated: Samsung, WD, SanDisk, Micron.
The tested SSD indeed has an SLC cache, which likely filled, and the author misinterpreted the cause of the slowdown as thermal.
The only meaningful way to prove thermal throttling (in any part) is to cool it down and see if that corrects the problem. This author did not do that. Correlation is not causation.
And cooling down in this case has to be accomplished by something other than letting the drive sit idle for a long time, because that same idle time can be used by the drive to empty the cache.
Letting it cool down and cooling it down are different things. You have to run it at load, let it get hot, then cool it down without removing the load. Do nothing else but spray some liquid N2 at it and watch to see if it speeds up.
> You have to run it at load, let it get hot, then cool it down without removing the load
Sustained load testing can reveal multiple phase changes in performance unrelated to temperature, which can complicate results from a single run (e.g. what if you run out of spare blocks before the drive has cooled below the hysteresis threshold that disables throttling?). So multiple independent runs, starting from the drive in the same state and varying only the cooling method, are the most controlled and reliable methodology.
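If it helps, here's a minimal sketch of that protocol in Python - purely illustrative, assuming a scratch file on the drive under test and that you restore the drive to the same state (secure erase or blkdiscard plus idle) before each run; the path, chunk size and total are made-up examples:

  import os, time

  CHUNK = 256 * 1024 * 1024   # 256 MiB per write, arbitrary example size
  TOTAL = 64 * 1024**3        # 64 GiB per run, enough to blow past most SLC caches

  def run_trial(label, path="/mnt/target/trial.bin"):
      # Write the same fixed workload and log per-chunk throughput, so runs
      # that differ only in cooling (none / fan / LN2) can be compared.
      buf = os.urandom(CHUNK)
      written = 0
      with open(path, "wb", buffering=0) as f:
          while written < TOTAL:
              t0 = time.time()
              f.write(buf)
              os.fsync(f.fileno())
              written += CHUNK
              print(f"{label},{written >> 20}MiB,{CHUNK / (time.time() - t0) / 1e6:.0f}MB/s")

  # One invocation per cooling condition, drive restored to the same state first:
  # run_trial("baseline"); run_trial("forced_fan"); run_trial("ln2")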
That's it I would say.
Besides what you've mentioned, overprovisioning/garbage collection may also cause slowdowns after sustained writes.
The more I've worked with SSDs to try to get them to perform well, the more I've come to realize that benchmarking SSD performance is virtually impossible to do in any meaningful sense, because the drives themselves are stateful: performance depends heavily not only on what you are doing now, but on what you did a few moments ago. There's also a bunch of protocol-level behavior adding to this, as well as OS-level behavior that may be difficult to isolate, and even if you succeed, yeah, you're benchmarking the device in an unrealistic fashion unlike any actual real-world use case, so congrats I guess.
Typically there are a few primary modes of that state. It's useful to understand the performance under each mode. Even if you can't predict exactly what mixture of modes you'll be in, you can make a good guess about whether you'll hit some pathological behavior based on your workload.
Are there any programs that can track temperature for different vendors? I'd love to run a few SSDs through a benchmark and have hard thermal data.
Temperature data is part of SMART, so there's a plethora of utilities on any OS. For Windows the best is HWiNFO, IMHO.
Ideally, for a proper test, I'd expect thermocouples placed on the controller rather than just relying on the sensor data provided by the SSD, as that could be very misleading depending on where the sensor is and how the raw sensor data is processed.
Arg, you are right, SMART would be the easiest way to go. And the thermocouples are a great idea. I'll see what I can whip up.
sudo nvme smart-log -H /dev/nvme0
Will show all temperature sensors on the device, there are usually several. It also will show how often the composite temperature crossed the warning and critical thresholds in the lifetime of the device, and how long the device spent above those thresholds.
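If you want to capture that alongside a benchmark, a tiny polling loop works; this is just a sketch using nvme-cli's JSON output (needs root; the field name and the Kelvin conversion match the nvme-cli versions I've seen, but treat them as assumptions and check your own output):

  import json, subprocess, time

  DEV = "/dev/nvme0"   # adjust to your device

  while True:
      raw = subprocess.run(["nvme", "smart-log", DEV, "-o", "json"],
                           capture_output=True, text=True, check=True).stdout
      log = json.loads(raw)
      # The composite temperature is reported in Kelvin in the JSON output;
      # per-sensor values appear as temperature_sensor_1, _2, ... when present.
      print(f"{time.time():.0f},{log['temperature'] - 273}")
      time.sleep(1)

Log that to a file while a write test runs and you can line temperature up against throughput over time.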
You can read the temperature from the SMART interface.
Only if the interface exposes it. USB ones in my experience outright don't support it or mess it up somewhere in the chain; the only place I got SMART reliably to work was with a direct SATA attachment to a controller.
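For what it's worth, many (not all) USB-SATA bridges will pass SMART through if you tell smartmontools which translation layer to use, e.g.:

  sudo smartctl -a -d sat /dev/sdX

Plenty of cheap enclosures still botch it, though, so a direct SATA attachment remains the reliable option.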
If you have a desktop PC and want to make the process of connecting external drives to SATA for this purpose a bit easier, you can add an eSATAp bracket to your PC.
You can also slap a thermal sensor onto the SSD. Or use a contactless thermometer.
OpenHardwareMonitor: https://openhardwaremonitor.org/ (I'm on Windows.)
Thermal cameras are surprisingly cheap now
Samsung was one of the vendors doing this... there was some controversy about them changing the specs on the datasheet without changing the SKU. This was for Samsung-branded drives, one of the 9xx models iirc.
Samsung did update their SKUs. Link: https://www.tomshardware.com/news/samsung-is-swapping-ssd-pa...
I'd advise against throwing accusations without proof. Looking for reliable hardware vendors is hard enough, there is no need for people unintentionally muddying the waters.
The marketing name still seems to be the same according to the article, so it would still fail any reasonable consumer confusion test.
Have you checked the performance difference or are you just venting based on click-bait articles?
According to benchmarks, the difference in performance is negligible, in fact the new SKU is sometimes better than the old one in most benchmarks, so there's really nothing to worry about.
You're grasping at straws here.
Doesn't sound negligible to me:
> In longer tests, both drives decrease sharply in performance as cache fills, which is expected. But while the older drive retains nearly two-thirds of its original performance, the newer version craters to less than a third. We can see this effect not only in artificial benchmarks, but also in large file copies [1]
While changing the SKU is one step better than other manufacturers, it's still shady behavior to keep the old marketing name.
[1] https://arstechnica.com/gadgets/2021/08/samsung-seemingly-ca...
This was the case with ADATA.
They had a lot of SKUs (6) for one part, and despite some of the numbers being lower on paper, and the drives performing differently in synthetic benchmarks, some of the "slower" parts actually performed better than the original reviewed/released samples.
This doesn’t excuse what some people consider bait and switch. But it’s more complicated than many outraged gamers made it seem.
Then they shouldn't have anything to fear from being honest about the situation, releasing it as the "second edition" or something, and letting it go through the regular review pipeline like any other new product or revision.
Considering how WD did this with switching CMR drives to SMR I wouldn't trust them with SSDs either.
ADATA caused some of this by having over 6 builds for the same SKU with different performance metrics.
Their latest SSD (I think), S70, has a marketing bullet point that all S70 parts are built with the same components. Should be unnecessary, but here we are.
I’ve basically never bought anything but the Samsung Pro drives and never once been disappointed.
I have observed thermal throttling on a plastic Sandisk SSD poorly located in my case. It has nothing to do with usage and is 100% a thermal issue.
It is strange for it to be attributed to thermal throttling when there are no measurements taken and not even a simple correlative graph to demonstrate the relationship. I'm sure it definitely could be the case, though; just a comment on the strange assumption the article seems to make.
With no research whatsoever, I would have attributed the slowdown to some kind of cache or buffer filling up.
Glad to see attention brought to this issue. I recently started using these:
ASUS Hyper M.2 X16 PCIe 4.0 X4 Expansion Card Supports 4 NVMe M.2 (2242/2260/2280/22110) up to 256Gbps for AMD 3rd Ryzen sTRX40, AM4 Socket and Intel VROC NVMe Raid https://www.amazon.com/dp/B084HMHGSP/ref=cm_sw_r_cp_api_i_ZY...
If you have a spare x16 slot I highly recommend it. I've torture tested the latest and greatest hot screaming NVMe drives with it, and between the massive heatsink and fan I've never seen NVMe temps rise above 40C.
Supporting more than one NVMe is tricky, though. You need to make sure your motherboard supports PCIe bifurcation. Common in server motherboards and some recent high end consumer motherboards but virtually unsupported with everything else. That said if you’re experiencing NVMe throttling due to temperature it’s worthwhile for even one drive.
Having no need for 4 expansion slots, I got this last year:
SilverStone Technology M.2 PCIE Adapter for SATA or PCIE NVMe SSD https://www.amazon.com/gp/product/B075ZNWS9Y/
It's worked like a charm. I've only put a drive into the M.2 slot so far.
Is there any real benefit over some thermal pads and heat sinks? I've passively cooled CPUs to idle at 40 degrees and max out at 60 degrees during benchmarking. I imagine $10 worth of pads and heatsinks would achieve the same outcome as this expansion card.
In my use case I needed to support four x4 NVMe drives (actually eight NVMe across two x16 slots). The low temperatures in a very dense chassis by doing nothing other than mounting the drives is a nice bonus!
That makes a lot more sense. Thank you for expanding. Glad to hear the setup is working for you well :-)
Most SSDs have temperature thresholds at 85C or higher. Why do you care if yours maintains 40C?
There are reasons to assume that data retention might drop exponentially as temperature increases. So if you care about the data, you'll want to avoid overheating your storage media.
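The usual model behind that "exponential" intuition is Arrhenius acceleration; a quick back-of-the-envelope in Python, where the ~1.1 eV activation energy is just a commonly quoted figure for NAND charge loss and should be treated as an assumption:

  import math

  K_B = 8.617e-5   # Boltzmann constant, eV/K
  EA  = 1.1        # assumed activation energy for charge loss, eV

  def acceleration(t_hot_c, t_cool_c):
      # How much faster stored charge leaks away at t_hot_c than at t_cool_c.
      t_hot, t_cool = t_hot_c + 273.15, t_cool_c + 273.15
      return math.exp(EA / K_B * (1 / t_cool - 1 / t_hot))

  print(acceleration(55, 25))   # roughly 50x faster charge loss at 55C vs 25C storage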
Power-off storage temperature is a problem for MLC NAND, but when powered on the NAND cells need to reach a high temperature to operate properly. If you are cooling the flash chips with a heatsink (rather than cooling the controller) you will be forcing the device to dump power into the cells to heat them to a temperature where they work properly.
3D NAND hits its best program time and raw bit error rate at about 70C.
Edit: See data table on page 27. Retention is directly proportional to device active temperature, i.e. higher cell temperature during programming leads to higher retention. https://www.jedec.org/sites/default/files/Alvin_Cox%20%5bCom...
That might be one of the most counterintuitive durability facts I've heard in a long time. Thanks a lot for sharing.
I noted in another comment I needed eight x4 NVMe across two x16 slots in a very dense chassis. I was pleasantly surprised at the temperatures.
The hardware is in a colocation facility and I like to have plenty of buffer with standard operating temperatures in case of a cooling failure, etc. Is maintaining 40C necessary under normal conditions? Nope, but it's definitely a nice to have regardless.
Just bought one on your recommendation, we'll see how it works out.
This is well known in the industry. Just under 10 years ago, I was on a support team for 'Enterprise flash', which meant PCIe cards stuffed with NAND flash. The card sizes were 500GB to 2.2TB.
One of our tasks was to help qualify supported systems, which went down to approving SOME chassis IF the card was in a particular slot with a specific airflow. In some cases, it was required that internal ribbon cables were re-routed to improve airflow. The flash cards would throttle progressively at set temperatures, eventually going read-only and offline to protect the contents.
The issue of temperature and thermal throttling carried on into the 'consumer' HDD-replacement market. I can recall attending an online tech briefing on SSDs where I put a comment in the chat that one issue not being covered was device temperature. When this was put to the panel of 'experts', they were a bit bemused, commenting that SSDs don't get hot because they consume less power than HDDs. Environmental conditions were not even considered.
Truth is that, with a bit of averaging, the power consumption of a modern 2TB HDD is about the same as a 2TB SSD: around 2-5W. Both devices generate heat and both devices are often in a warm environment.
2.5" SSDs have the advantage of being surrounded by a heatsink (at least all SSDs I've seen had a (semi)metallic case and thermal pads), while m.2 and NVMe often do not.
So a lot of them fail from overheating - mostly cheaper models/makes. But even the cheapest Samsungs (which use their own controllers) seem to fare better than cheap brands like ADATA.
Actually, this has been a problem for a lot of controllers - RAID, SATA, USB3.x, networking cards that fail due to the manufacturer using subpar cooling - usually a small heatsink that they've deemed to be good enough under "normal usage" (i.e. not heavy, sustained usage), or they rely on server cooling to do the job (which actually makes sense).
I have seen SSDs, on Windows at least, lose a lot of performance due to file fragmentation. While there is no particular reason why the SSD itself should run slow when you read a fragmented file from the filesystem, it does slow down, and it can impact performance dramatically: dropping drive performance to 1/5 of normal after roughly 10x overwrites of the drive contents.
The dogma at the moment is that SSDs don't require de-fragmentation, and that is potentially true up to a certain point, but I think Windows actually needs the file system de-fragmented due to its own overhead. I have a program to reproduce the effect and have been meaning to test EXT4 and write an article about it at some point. I need to check that it's something that happens across a range of devices before I publish, and that it really is just Windows. I know defragging the files (copy away, delete files and replace) works to instantly fix performance, but it could be device/controller/firmware specific.
The other possibility is that large amounts of writes filling the device can result in reduced working space, especially in drives with very small caches, which causes slowdowns near the end of tests.
The effect of fragmentation can be estimated from the standard I/O tests-- just compare "sequential read" to "64KB random read".
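For a quick look at that, fio can produce both numbers against the same file; these are plain example invocations (adjust the filename, and on Windows swap libaio for the windowsaio engine):

  fio --name=seqread --filename=testfile --size=8g --rw=read --bs=1m --direct=1 --ioengine=libaio --runtime=30 --time_based
  fio --name=rand64k --filename=testfile --size=8g --rw=randread --bs=64k --direct=1 --ioengine=libaio --runtime=30 --time_based

The gap between the two bandwidth figures is a rough upper bound on what fragmentation can cost you for large sequential reads.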
> The dogma at the moment is that SSDs don't require de-fragmentation
They don't require regular de-fragmentation like HDDs do, because if you only occasionally read some files it will be fast enough, AND with the physical layer hidden behind remapping it doesn't make much sense anyway: a file that is logically presented to you as one continuous block could actually be stored across multiple physical locations.[0]
> I know defragging the files (copy away, delete files and replace)
And this is the one real way to "defrag" SSD-backed media. Tossing clusters around like it's an HDD only wastes your TBW.[1]
> While there is no particular reason why the SSD would run slow once you try to read a file from the filesystem it does slow down and it can impact performance dramatically
There are always a couple of factors that affect performance.
There is always the question of what exactly you are reading: a bazillion <1KB files could be anywhere on the physical storage, and while the access time for a single file can be as fast as the SSD can provide, the pattern of accessing thousands of small files not only fills the IO queue but also wastes tons of time on overhead. For every file you access there isn't just "Hey SSD, grab bytes at LBA 44444 to 55555"; there's also the aforementioned IO queue, parsing the MFT for the file's location and LBAs, reading and parsing the DACL, allocating handles (and discarding them later), etc. And if you run out of caches (most notably the DRAM cache on your SSD), then of course things start to slow down to a crawl, especially if you are not only reading those files but also doing other things on the same drive at the same time.
Also, while I mention the MFT: some small files are stored entirely inside it[2], so all that overhead is handled quickly (because under normal conditions most of the MFT is cached in memory anyway), but the file has to be small enough[3].
Also don't forget that if your file is 1KB, the drive doesn't read just 1KB from storage. At best it reads 4KB (the default NTFS cluster size), but if your next file isn't in this block (or it is, but by the time it comes to read it, that block has already been evicted from the cache) then you need to wait until the previous read completes. Yes, reads are fast in theory, but again this is where the IO queue, caches and NCQ start to matter.
And last but not least: on Windows there is always the question of whether the antivirus software (be it the built-in Defender or a 3rd-party one) is still sane, or is wasting your time rechecking all your already-checked, static, non-executable files. Like a bazillion JSONs.
[0] And without TRIM support you can't have even a very loose guarantee that you really cleared the block.
[1] Back in the day I used this to defrag heavily fragmented HDDs: just Ghost the drive to another one and then restore it back. All the files end up defragged, and it takes way less time because the source drive only reads, rather than read-write-repeat.
[2] https://superuser.com/questions/1185461/maximum-size-of-file...
[3] Just checked a couple of random files on my drive - the cutoff is somewhere around ~700B.
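If anyone wants to see how much of this is per-file overhead rather than raw device speed, here's a crude sketch (counts and sizes are arbitrary; drop the OS file cache or reboot between the write and read phases for honest numbers):

  import os, time

  ROOT, COUNT, SIZE = "smallfiles", 10_000, 1024   # 10k files of 1 KB vs one ~10 MB file

  os.makedirs(ROOT, exist_ok=True)
  blob = os.urandom(SIZE)
  for i in range(COUNT):
      with open(f"{ROOT}/f{i:05d}", "wb") as f:
          f.write(blob)
  with open("bigfile.bin", "wb") as f:
      f.write(os.urandom(COUNT * SIZE))

  # Read back the same number of bytes both ways and compare.
  t0 = time.time()
  for i in range(COUNT):
      with open(f"{ROOT}/f{i:05d}", "rb") as f:
          f.read()
  t_small = time.time() - t0

  t0 = time.time()
  with open("bigfile.bin", "rb") as f:
      f.read()
  t_big = time.time() - t0

  print(f"{COUNT} small files: {t_small:.3f}s, one big file: {t_big:.3f}s")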
With all due respect to the author, any experiment should be reproduced. More than one disk should be tested, the temperature should be measured, and a comparison to an externally cooled disk would look great here. Without that, we just have some assumptions that are partially (or fully) correct (or incorrect).
And the test should include something more substantial than
> a series of 96 files ranging in size from 2 MB to 2 GB, fixed sizes but in a randomised order. The test completed in a total write time of 24.4 seconds
2MB to 2GB was a good test somewhere in the early '00s, not today, when even your average not-very-AAA game requires 50GB for the install alone.
Edit: also, the first two graphs use 10^8 as the scale for bytes while the last one uses 10^10, and the cumulative total says only 3 GB of data was written in total.
This is not a test, it is just a bunch of loosely related data based on one "test".
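For something more substantial, a long sustained sequential write with per-second bandwidth logging would show both the cache knee and any later (possibly thermal) phase; for example, something like this fio invocation (size and path are just examples):

  fio --name=sustained --filename=/mnt/test/bigfile --size=100g --rw=write --bs=1m --direct=1 --ioengine=libaio --write_bw_log=sustained --log_avg_msec=1000

fio then drops a sustained_bw.*.log file with bandwidth over time, which can be plotted against a temperature log captured in parallel.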
Jay has done a test regarding SSD overheating: https://youtu.be/dDTH93dgulE
Verdict: once you have any kind of passive cooler on one, it won't overheat. But he only tested it with one fast PCIe 4.0 x4 drive.
Or ensuring airflow directly on the thin heatspreader already present. For my personal PC I just have a small fan inside the case perched on the backplate of the GPU blowing air directly at the NVMe slot. Very ghetto, but very effective :D
The irony is that flash write endurance for modern flash chips is actually greater if the writes are at higher temperature. Charge mobility is greater, and the writing damages the cells less.
Unpowered persistence is the opposite, however. At 85 Celsius, the data may last only single-digit days unpowered, but at freezing temperatures it could last hundreds of years unpowered. For the same reason, I believe: charge mobility is lower at lower temperatures in semiconductors.
I wonder if this is what permanently affected a 256GB ADATA SATA SSD I had in my Linux box. I was running updates and noticed it was very slow once it started unpacking everything. At first I let it go and went about my business surfing the web, which wasn't hitting the disk. I finished updating and decided to reboot, as a new kernel was pulled, and the issue was now obvious as it was dragging its feet booting. Worried, I ran a backup of my home dir upon login, and every file transfer topped out at 4 or 5 MB/sec. SMART was enabled but reported nothing.
I wound up dumping the ssd to ensure I had a full image and that entire operation topped out at 5 MB/s. And I had the SSD out of the case in a cradle when ripping so it wasn't getting cooked in the case. I did not notice it feeling abnormally warm or hot. I later tried to mess with the disk in a cradle but as soon as I powered it on it topped out at 4-5 MB/sec so either there is some sort of defect causing an immediate thermal issue or something in the controller went awry.
5MB/s even for sequential transfers sounds too severe for thermal throttling, unless a temperature sensor failed in just the right way to convince the drive to activate its last-ditch throttling mechanism before simply shutting down.
More likely, you were experiencing repeated ECC failures and read retries on every access. Since you ruled out a bad cable by also running it in a dock, I'm guessing you had a premature failure of a large chunk of flash, possibly an entire die (though a lot of drives especially at lower capacities don't have enough over-provisioning to do erasure coding to protect against a full die failure).
It sounds like a regular day.
Probably a combination of:
- Constructed with the fewest number of NAND chips possible, using bottom-of-the-barrel shit like 64L Intel QLC
- Easily overheating DRAM-less controller (e.g. SM225*XT hitting 70+C at the slightest load)
- Tucked into a heat-trap plastic case
- pSLC cache that is always almost full (because for some reason entry-level controllers always choose to do that), which then fills up completely after a not-so-long run of Windows updates
Combine all of these in a single product (like ADATA often does in its entry-level SATA drives), and 5 MB/s actually sounds fast :)
It had never even occurred to me that my storage would want cooling, until I swapped in a new motherboard, moving from ATX to mini-ITX, and noticed that the M.2 NVMe socket was covered by a heatsink. Performance is literally 50% better - and it still gets toasty.
>It had never even occurred to me that my storage
Technically, the storage medium itself (the NAND flash chips) doesn't require cooling. The issue is that the NAND doesn't exist in a vacuum on its own, as far as computer storage goes: the SSD needs a controller to manage the data I/O from the PCI-Express bus across all the NAND chips, plus caching to DRAM/SLC in between, while doing various integrity checks, error correction and trimming in the background. Those controllers need to be very powerful for the insane speeds the PCI-E bus is capable of - usually multi-core ARM chips running a real-time OS with complex algorithms - so of course they run hot when you push them hard.
Hard disks also had controllers to manage the transfer of data between IDE/SATA and the DRAM cache and then finally the spinning rust, but since the speeds were so much slower, those controllers didn't run as hot. However, some HDDs still benefited from cooling, as they had a powerful motor spinning at over 10K RPM which made the drives hot.
In either case, there's no free lunch, when you push lots of current through electronics chasing extreme performance, you get lots of heat that needs to get dissipated somehow, simple as that.
Isn't thermal throttling good? Without that safety measure you'll break your PC.
I don't care if climate change kills millions of people. It impacts my summer gaming sessions. I need AC!
Yes, but you would expect the product to be designed so that this doesn't occur under "regular" conditions. Unfortunately, many products throttle themselves anyway. This is likely because they run cool long enough to get good reviews and benchmarks, so longer-run performance doesn't really matter to them.
Generally it means something isn't working right, and your performance under the throttling can be quite bad. Sometimes a small undervolt is enough to avoid the issue, increasing overall performance.
I have experienced this with using m.2 SSDs in an external enclosure. I found they all throttled until I bought one that had a fan. Never throttled again.