Triple-Parity Raid and Beyond (2009)

queue.acm.org

7 points by atoponce 3 years ago · 12 comments

gymbeaux 3 years ago

I wasn’t building RAID arrays when this paper was written, but I have always heard that RAID6 is to be avoided and that you should just use RAID10 (or RAID5 if you don’t have enough disks). Lately I’ve heard (and agree) that the conventional wisdom is to avoid hardware RAID altogether in favor of redundancy solutions at the OS level, such as mdadm on Linux or Storage Spaces on Windows (macOS supports software RAID as well, created for their now-dead Fusion Drive system).

I feel like the need for a triple-redundancy option in RAID is superseded by more “advanced” software “RAID” at the file system level, such as ZFS or Btrfs (to an extent). Further, the increased availability and affordability of ECC RAM in non-enterprise hardware makes the call for additional redundancy even less urgent.

There is a nicety to having a backup battery on a RAID card so that in-flight cached writes can still be completed in the event of a power outage; however, this is easily solved by a UPS. In a power outage, not losing any data I might have been transferring to the array is nice, but I’m still losing my OS state and any unsaved work.

  • throw0101b 3 years ago

    > I feel like the need for a triple-redundancy option in RAID is superseded by more “advanced” software “RAID” at the file system level, such as ZFS or Btrfs (to an extent).

    ZFS has triple-parity with RAID-Z3:

    > The need for RAID-Z3 arose in the early 2000s as multi-terabyte capacity drives became more common. This increase in capacity—without a corresponding increase in throughput speeds—meant that rebuilding an array due to a failed drive could "easily take weeks or months" to complete.[38] During this time, the older disks in the array will be stressed by the additional workload, which could result in data corruption or drive failure. By increasing parity, RAID-Z3 reduces the chance of data loss by simply increasing redundancy.[40]

    * https://en.wikipedia.org/wiki/ZFS#ZFS's_approach:_RAID-Z_and...

    The use of software versus "hardware" (firmware) does not remove the need for extra copies of data.
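
    As a rough illustration of why the extra parity matters, here is a back-of-envelope sketch in Python (not from the article; the failure rate, rebuild time, and independence assumption are all assumed):

    ```python
    # Probability that enough *additional* drives fail during a rebuild to
    # cause data loss, for single/double/triple parity. Assumes independent
    # failures, an assumed 3% annualized failure rate (AFR), and an assumed
    # two-week rebuild; correlated failures and unrecoverable read errors
    # are ignored, so treat the outputs as illustrative only.
    from math import comb

    def p_loss_during_rebuild(surviving_drives, parity_remaining,
                              afr=0.03, rebuild_days=14):
        """P(more than `parity_remaining` surviving drives fail before the
        rebuild completes)."""
        p = afr * rebuild_days / 365  # per-drive failure probability in the window
        return sum(comb(surviving_drives, k) * p**k * (1 - p)**(surviving_drives - k)
                   for k in range(parity_remaining + 1, surviving_drives + 1))

    # 12-drive vdev with one drive already failed and rebuilding:
    for z in (1, 2, 3):  # RAID-Z1 / Z2 / Z3
        print(f"RAID-Z{z}: {p_loss_during_rebuild(11, z - 1):.1e}")
    ```

    Each additional level of parity cuts the loss probability by roughly two orders of magnitude, which is the argument for RAID-Z3 as rebuild windows stretch into weeks.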

  • yieldcrv 3 years ago

    there are always use cases, but I think the interfaces are fast enough.

    NVMe SSDs, direct or over PCIe, solve a lot of the issues that people built RAID arrays for in the first place.

    Sure, in theory you can go even faster and add redundancy by applying RAID concepts to these NVMe SSDs, but that’s a tad overkill.

    • gymbeaux 3 years ago

      There will always be a need for platter drives (or other magnetic storage) because flash memory loses data over time when it's powered off. In theory you can store a tape or other magnetic drive in a safe deposit box or the like for decades, and your data will still be there if/when you need it.

      I don't like the trend of SSDs (NVMe or otherwise) getting cheaper and cheaper, because it's coming at the cost of reliability and endurance. Sure, I can get 2 TB for ~$100, but at this point I'm not convinced it will outlast spinning rust, as has been the colloquial assumption since 2.5" SSDs first hit the scene circa 2008.

      I've quickly destroyed (consumer-grade) SSDs before by running stuff that is constantly reading and writing to them. Microsoft's Azure Stack Development Kit (ASDK) is one example.

      Therefore, I'm actually very receptive to RAIDing SSDs, be it with mdadm, ZFS, or some other means. I do agree that striping (RAID0) NVMe drives is a bit ridiculous, but RAID1 definitely adds value.

      • diggernet 3 years ago

        When SSDs came out I avoided them out of reliability concerns, but figured the endurance would improve over time as the tech matured. But it turns out the reality is that endurance has plummeted:

        SLC: 100K cycles

        MLC: 10K cycles

        TLC: 3K cycles

        QLC: 1K cycles

        https://www.kingston.com/en/blog/pc-performance/difference-b...

        Now SSDs seem to be taking over the laptop world and HDDs are getting harder to find (and smaller, as the larger CMR drives get replaced with garbage SMR models). It seems like reliable storage in a laptop is quickly becoming a thing of the past.
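
        To put the rated cycle counts above in perspective, here is a rough arithmetic sketch (the 2 TB capacity, 1 TB/day write rate, and write-amplification factor are assumptions for illustration, not measurements):

        ```python
        # Rough lifetime estimate from rated P/E cycles: total writable data
        # ≈ capacity × cycles ÷ write amplification. All workload figures
        # below are assumed; light desktop use writes far less than 1 TB/day.
        def years_of_writes(capacity_tb, pe_cycles, tb_per_day=1.0, waf=2.0):
            total_writable_tb = capacity_tb * pe_cycles / waf
            return total_writable_tb / tb_per_day / 365

        for name, cycles in [("SLC", 100_000), ("MLC", 10_000),
                             ("TLC", 3_000), ("QLC", 1_000)]:
            print(f"{name}: ~{years_of_writes(2, cycles):.1f} years at 1 TB/day")
        ```

        Under a light workload even QLC holds up for a long time, but under the kind of sustained write-heavy use described elsewhere in the thread, the drop from 100K to 1K cycles shaves decades off a drive's life.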

      • wtallis 3 years ago

        > because flash memory loses data over time if it's powered-off.

        That's a red herring; the magnitude of this effect is usually greatly exaggerated, and in reality never comes close to being as important as the fact that flash memory is simply too expensive to use for cold storage.

        And of course, the very idea of cold storage is dangerous: if you really want to ensure your data lasts for decades, you should be verifying your backups at least annually and making plans to migrate your data off any media that is obsolete and at risk of becoming hard to read using commodity hardware. This also entirely eliminates the above flash memory data retention concerns.
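
        One minimal way to do that kind of periodic verification, sketched in Python (the JSON manifest format and command-line arguments are hypothetical; ZFS scrubs or par2 would do the same job):

        ```python
        # Verify a backup tree against a stored manifest of SHA-256 digests.
        # The manifest is assumed to be a JSON object mapping relative file
        # paths to hex digests; adapt to whatever your backup tooling emits.
        import hashlib, json, pathlib, sys

        def sha256(path, bufsize=1 << 20):
            h = hashlib.sha256()
            with open(path, "rb") as f:
                while chunk := f.read(bufsize):
                    h.update(chunk)
            return h.hexdigest()

        def verify(backup_root, manifest_file):
            manifest = json.loads(pathlib.Path(manifest_file).read_text())
            root = pathlib.Path(backup_root)
            return [rel for rel, digest in manifest.items()
                    if not (root / rel).is_file() or sha256(root / rel) != digest]

        if __name__ == "__main__":
            bad = verify(sys.argv[1], sys.argv[2])
            print("all files verified" if not bad else f"verification failed: {bad}")
        ```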

        • throw0101b 3 years ago

          > That's a red herring; the magnitude of this effect is usually greatly exaggerated […]

          Per JEDEC, Client SSDs have to retain data for 1 year at 30 °C, and Enterprise SSDs have to retain data for 3 months at 40 °C:

          * https://www.jedec.org/sites/default/files/Alvin_Cox%20[Compa...

          I'm sure most last longer, but as a CYA I wouldn't want to rely on that.

          • wtallis 3 years ago

            That's the standard for drives that are at end of life, having exhausted their rated write endurance. Those aren't the drives anyone would use for a cold storage backup system. Drives that have only been written to a few times will retain data much longer.

        • gymbeaux 3 years ago

          I remember reading something somewhere about a guy doing tests on "3D" NAND and other TLC/MLC NAND SSDs and finding that data was lost well before the "10 years or so" that we've come to assume unpowered SSDs can retain data. I can't locate that article at the moment, but it isn't unreasonable to expect faster degradation on denser NAND.

          • wtallis 3 years ago

            The 3D in 3D NAND doesn't belong in scare quotes; the physical arrangement of the memory cells really is three-dimensional and it isn't just a marketing term. The transition from planar NAND to 3D NAND turned back the clock by several years in terms of shrinking memory cell volume and charge differences between adjacent data states.

            The SSDs that caught the most flak for poor retention characteristics were using the last generation of planar NAND, in a three bit per cell configuration, which in hindsight was a bit of an overreach. But even then, the worst symptoms that could be reliably reproduced were poor performance reading back stale data, a result of the drives having to use higher levels of ECC to recover the data. (More recent SSDs have vastly more powerful controllers capable of running LDPC calculations much faster.) That's not quite the same as losing data.

      • throw0101b 3 years ago

        > I do agree that striping (RAID0) NVMe drives is a bit ridiculous, but RAID1 definitely adds value.

        RAID0 is handy in HPC for local scratch space.

        • gymbeaux 3 years ago

          True, but Optane would be a better option for scratch, even on an AMD platform. You wouldn't have to worry about drive wear, and... I'm not sure which would have higher IOPS. With PCIe 4.0, it's possible the RAID0 NVMe drives would have better IOPS figures.
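
          For what it's worth, a back-of-envelope comparison (every per-device figure below is an assumption in the rough ballpark of spec sheets from that era, not a benchmark):

          ```python
          # Striping (RAID0) scales random IOPS roughly linearly with drive
          # count at high queue depth, but does nothing for latency, which is
          # where Optane shines. All numbers are assumed for illustration.
          assumed = {
              "1x Optane":             {"iops": 550_000, "latency_us": 10},
              "1x PCIe 4.0 NAND NVMe": {"iops": 650_000, "latency_us": 80},
          }

          def raid0(per_drive, n):
              # Aggregate IOPS scales with n; latency stays that of a single drive.
              return {"iops": per_drive["iops"] * n,
                      "latency_us": per_drive["latency_us"]}

          assumed["4x NAND NVMe, RAID0"] = raid0(assumed["1x PCIe 4.0 NAND NVMe"], 4)

          for name, figs in assumed.items():
              print(f"{name}: ~{figs['iops']:,} IOPS, ~{figs['latency_us']} µs")
          ```

          On raw IOPS the striped NAND set would likely come out ahead, while Optane's advantage is low-queue-depth latency and much higher write endurance, which matters for scratch churn.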
