I Found a Bug in Apple’s fsck_hfs — Here’s How I Tracked It Down

Kivanc G

TL;DR: fsck_hfs in macOS Sequoia (version hfs-683.x) has a cache exhaustion bug that reports false corruption on large HFS+ volumes. On machines with 8 GB RAM, volumes of 24 TB or larger trigger "Couldn't read node" errors during the extended attributes check. Your data is fine: the bug is in the tool, not the filesystem. Machines with 16 GB+ RAM are unaffected, as are older macOS versions.

If you’ve been following my adventures with the HFS+ 24TB volume bug, this is the sequel. In my previous post, I documented a persistent corruption error on my 24TB external HDD. This time, I’m going to show how I traced the error all the way down to its root cause — and it wasn’t what I expected.

Spoiler: the filesystem was fine all along. The bug is in fsck_hfs itself.

The Setup

I have a 24TB external HDD formatted as Journaled HFS+. Every time I run fsck_hfs on my Mac mini M1, it fails with the same error:

** Checking extended attributes file.
Couldn't read node #61432
** The volume 24TB TOSHIBA could not be verified completely.
volume check failed with error 12

Error 12 is ENOMEM — "not enough memory." On a machine with 8 GB of RAM? For a filesystem check? Something didn't add up.
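As a quick cross-check of the error number, Python's standard errno module exposes the same constants the C library uses. This is only an illustration; fsck_hfs is a C program and does not involve Python, and the message text returned by os.strerror varies by platform.

```python
import errno
import os

# Look up the symbolic name for errno 12 (ENOMEM on macOS and Linux).
code = 12
name = errno.errorcode[code]
print(code, name, os.strerror(code))
```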

The error was perfectly reproducible. Same node number, every time. Even on a freshly formatted volume with barely any data. That ruled out progressive data corruption and pointed to something deterministic.

Ruling Out Hardware

My first suspicion was hardware — maybe the USB bridge chip in the enclosure was corrupting data, or the drive itself had a defect. But the same error appeared on a completely different 24TB drive with a different enclosure and bridge chip. Two different drives, same error at the same node number. That already pointed away from hardware.

To be thorough, I checked the kernel logs with dmesg and monitored I/O with fs_usage during the fsck run:

sudo fs_usage -w -f diskio fsck_hfs

Zero I/O errors. Clean, linear reads throughout the entire check. The data coming off the disk was fine. Whatever was failing, it wasn’t the hardware.

Reading the On-Disk Structures

If the hardware was clean, maybe the filesystem metadata itself was corrupt. I decided to dump and parse the raw volume header to see what the Attributes B-tree looked like.

The HFS+ volume header lives at byte offset 1024 (sector 2) of the partition:

sudo xxd -l 1024 -s 1024 /dev/rdisk7s2

Parsing this revealed the Attributes file fork data: 512 MiB logical size, a single contiguous extent starting at allocation block 36588, with 16,384 allocation blocks at 32KB each.
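For readers who want to reproduce the parsing step, here is a minimal Python sketch. It is not the script used for this article. It assumes the field offsets documented in Apple's Technical Note TN1150 (blockSize at byte 40 and the attributesFile fork data at byte 352 of the 512-byte volume header, all big-endian), and it runs against a synthetic header filled with the values reported above rather than a real device. The function name and dictionary keys are made up for the example.

```python
import struct

BLOCK_SIZE_OFFSET = 40        # UInt32 blockSize (per TN1150)
ATTRIBUTES_FORK_OFFSET = 352  # HFSPlusForkData attributesFile (per TN1150)


def parse_attributes_fork(header: bytes) -> dict:
    """Pull the Attributes file's size and first extent out of a volume header."""
    if header[0:2] not in (b"H+", b"HX"):
        raise ValueError("not an HFS+ or HFSX volume header")
    (block_size,) = struct.unpack_from(">I", header, BLOCK_SIZE_OFFSET)
    # HFSPlusForkData: logicalSize (u64), clumpSize (u32), totalBlocks (u32),
    # then eight (startBlock, blockCount) extents; only the first is read here.
    logical_size, _clump_size, total_blocks = struct.unpack_from(
        ">QII", header, ATTRIBUTES_FORK_OFFSET)
    start_block, block_count = struct.unpack_from(
        ">II", header, ATTRIBUTES_FORK_OFFSET + 16)
    return {
        "block_size": block_size,
        "logical_size": logical_size,
        "total_blocks": total_blocks,
        "start_block": start_block,
        "block_count": block_count,
        "byte_offset": start_block * block_size,
    }


# Synthetic 512-byte header carrying the values reported in the article.
header = bytearray(512)
header[0:2] = b"H+"
struct.pack_into(">I", header, BLOCK_SIZE_OFFSET, 32 * 1024)
struct.pack_into(">QII", header, ATTRIBUTES_FORK_OFFSET, 512 * 1024 * 1024, 0, 16384)
struct.pack_into(">II", header, ATTRIBUTES_FORK_OFFSET + 16, 36588, 16384)

fork = parse_attributes_fork(bytes(header))
print(fork["byte_offset"])  # 1198915584
```

The resulting byte offset is the start of the Attributes file within the partition, assuming the file really is a single contiguous extent as it was on this volume. On a real disk, the header bytes would come from the 512 bytes at offset 1024 of the raw partition, which requires root.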

Next, I read the B-tree header node — the first 8192 bytes of the Attributes file itself:

sudo xxd -l 512 -s 1198915584 /dev/rdisk7s2

The B-tree header told me everything:

  • Node size: 8192 bytes
  • Total nodes: 65,536 (filling the entire 512 MiB file)
  • Free nodes: 65,341
  • Used nodes: 195
  • Last leaf node: 155
  • Tree depth: 3

So the Attributes B-tree had 65,536 node slots but was only using 195 of them. The active tree lived entirely within the first ~155 nodes. Node #61432 was deep in the unused region.
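The header fields listed above can be decoded with a few lines of Python. This is a hedged sketch, assuming the BTHeaderRec layout from Apple's Technical Note TN1150 (a 14-byte node descriptor followed by big-endian header fields). The sample node is synthetic: it carries the values reported above, with zeros as placeholders for fields the article does not quote (root node, leaf record count, first leaf node, maximum key length).

```python
import struct

NODE_DESCRIPTOR_SIZE = 14  # BTNodeDescriptor precedes the header record
HEADER_FORMAT = ">HIIIIHHII"
HEADER_FIELDS = (
    "tree_depth", "root_node", "leaf_records", "first_leaf_node",
    "last_leaf_node", "node_size", "max_key_length", "total_nodes",
    "free_nodes",
)


def parse_btree_header(node: bytes) -> dict:
    """Decode the leading BTHeaderRec fields and derive the used-node count."""
    values = struct.unpack_from(HEADER_FORMAT, node, NODE_DESCRIPTOR_SIZE)
    header = dict(zip(HEADER_FIELDS, values))
    header["used_nodes"] = header["total_nodes"] - header["free_nodes"]
    return header


# Synthetic header node; zeros stand in for values not given in the article.
node = bytearray(512)
struct.pack_into(HEADER_FORMAT, node, NODE_DESCRIPTOR_SIZE,
                 3, 0, 0, 0, 155, 8192, 0, 65536, 65341)

btree = parse_btree_header(bytes(node))
print(btree["used_nodes"], btree["total_nodes"] * btree["node_size"])
```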

Does node #61432 even fit in the file? At 8192 bytes per node, node #61432 sits at byte offset 503,250,944 — well within the 536,870,912-byte file. It was in bounds.
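The bounds check above is plain arithmetic, and the same numbers give the node's absolute offset within the partition. A small Python sketch, assuming (as on this volume) that the Attributes file is one contiguous extent starting at allocation block 36588 with 32KB allocation blocks:

```python
NODE_SIZE = 8192
ATTR_FILE_SIZE = 512 * 1024 * 1024   # logical size of the Attributes file
ATTR_FILE_START = 36588 * 32 * 1024  # start block * allocation block size


def node_offset_in_file(node_num: int, node_size: int = NODE_SIZE) -> int:
    """Byte offset of a B-tree node relative to the start of its file."""
    return node_num * node_size


node_num = 61432
offset = node_offset_in_file(node_num)
in_bounds = offset + NODE_SIZE <= ATTR_FILE_SIZE
device_offset = ATTR_FILE_START + offset

print(offset, in_bounds, device_offset)
```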

Is node #61432 marked as in-use in the bitmap? The B-tree node-usage bitmap is stored as record 2 of the header node (the map record). With eight nodes per bitmap byte and bits assigned MSB-first, node #61432 falls in byte 7679 (61432 / 8, with no remainder), at the most significant bit (mask 0x80). Reading that region:

sudo xxd -l 512 -s $((1198915584 + 8192 - 512)) /dev/rdisk7s2

All zeros in that area. Node #61432 was correctly marked as free.
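The bitmap lookup can be written out in Python. This sketch assumes the standard header-node layout from Apple's Technical Note TN1150 (14-byte node descriptor, 106-byte header record, 128-byte user data record, then the map record), and that the bit of interest lives in the header node's own map record rather than in a continuation map node. The function names are made up for the example.

```python
MAP_RECORD_OFFSET = 14 + 106 + 128  # map record starts 248 bytes into the header node
NODE_SIZE = 8192


def bitmap_position(node_num: int) -> tuple:
    """Return (byte index, bit mask) for a node, with bits assigned MSB-first."""
    byte_index, bit_index = divmod(node_num, 8)
    return byte_index, 0x80 >> bit_index


def is_node_free(map_record: bytes, node_num: int) -> bool:
    """A clear bit means the node is free."""
    byte_index, mask = bitmap_position(node_num)
    return (map_record[byte_index] & mask) == 0


byte_index, mask = bitmap_position(61432)
offset_in_header_node = MAP_RECORD_OFFSET + byte_index

# The xxd command above dumps the last 512 bytes of the header node.
window_start = NODE_SIZE - 512
print(byte_index, hex(mask), offset_in_header_node,
      window_start <= offset_in_header_node < NODE_SIZE)
```

Under those layout assumptions, the byte holding node #61432's bit sits inside the 512-byte window that was dumped.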

What’s on disk at node #61432?

sudo xxd -l 256 -s $((1198915584 + 61432 * 8192)) /dev/rdisk7s2

All zeros. Completely empty, as a free node should be.

Every on-disk structure was valid and internally consistent. The volume header was correct, the B-tree header was correct, the bitmap correctly marked node 61432 as free, and the node itself was properly zeroed. There was nothing wrong with the filesystem.

Finding the Code

Since the data was fine, the bug had to be in fsck_hfs. Apple open-sources HFS+ as part of their Darwin releases, so I cloned the repository:

git clone https://github.com/apple-oss-distributions/hfs.git

A quick grep found the error message:

grep -rn "Couldn.t read node" --include="*.c" .

Two hits: SRepair.c and SVerify2.c. The relevant code in SVerify2.c was the function BTCheckUnusedNodes:

int BTCheckUnusedNodes(SGlobPtr GPtr, short fileRefNum, UInt16 *btStat)
{
    BTreeControlBlock *btcb = GetBTreeControlBlock(fileRefNum);
    unsigned char *bitmap = ...;
    unsigned char mask = 0x80;
    UInt32 nodeNum;

    for (nodeNum = 0; nodeNum < btcb->totalNodes; ++nodeNum)
    {
        if ((*bitmap & mask) == 0) // Node is FREE
        {
            // Read the node to verify it's all zeros
            err = btcb->getBlockProc(btcb->fcbPtr, nodeNum,
                                     kGetBlock, &node);
            if (err)
            {
                fsck_print(ctx, LOG_TYPE_INFO,
                           "Couldn't read node #%u\n", nodeNum);
                return err;
            }
            // ... verify node contents are zero ...
            // Release the node
            btcb->releaseBlockProc(btcb->fcbPtr, &node,
                                   kReleaseBlock);
        }
        // Advance bitmap pointer
        mask >>= 1;
        if (mask == 0) { mask = 0x80; ++bitmap; }
    }
}

This function iterates through all 65,536 nodes in the Attributes B-tree. For every node marked as free in the bitmap, it reads the raw node from disk to verify it contains all zeros. With 65,341 free nodes, that’s a lot of reads.
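To make the bitmap walk concrete, here is the same loop shape as a Python generator. It models only the iteration (the MSB-first mask that shifts right and wraps to the next byte), not the disk reads. The second bitmap is illustrative: it marks the first 195 nodes as used, which matches the count on this volume but not necessarily which nodes are actually in use.

```python
from typing import Iterator


def free_nodes(bitmap: bytes, total_nodes: int) -> Iterator[int]:
    """Yield node numbers whose bitmap bit is clear, scanning MSB-first."""
    mask = 0x80
    index = 0
    for node_num in range(total_nodes):
        if (bitmap[index] & mask) == 0:
            yield node_num
        # Advance to the next bit, wrapping to the next byte after bit 0.
        mask >>= 1
        if mask == 0:
            mask = 0x80
            index += 1


# Tiny case: nodes 0-2 used, nodes 3-15 free.
small = list(free_nodes(bytes([0b11100000, 0b00000000]), 16))

# Article-sized case: 65,536 node slots with 195 marked used (placement illustrative).
bitmap = bytearray(65536 // 8)
bitmap[0:24] = b"\xff" * 24  # 192 used nodes
bitmap[24] = 0b11100000      # 3 more used nodes
free_count = sum(1 for _ in free_nodes(bytes(bitmap), 65536))

print(small, free_count)
```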

The getBlockProc calls down through GetFileBlock → MapFileBlockC → CacheRead → CacheLookup. And in CacheLookup, I found the smoking gun:

int CacheLookup(Cache_t *cache, uint64_t off, Tag_t **tag)
{
    // ... search hash table ...

    // Cache miss: allocate a NEW tag from the heap
    temp = (Tag_t *)calloc(sizeof(Tag_t), 1);
    temp->Offset = off;

    // ... insert into hash table ...

    // Get a buffer for the tag
    if (temp->Buffer == NULL) {
        temp->Buffer = CacheAllocBlock(cache);
        if (temp->Buffer == NULL) {
            // Try to evict
            error = LRUEvict(&cache->LRU, (LRUNode_t *)temp);
            if (error != EOK) return (error);

            temp->Buffer = CacheAllocBlock(cache);
            if (temp->Buffer == NULL)
                return (ENOMEM); // ERROR 12!
        }
    }
}

The Cache Exhaustion Bug

Here’s what happens. fsck_hfs pre-allocates a cache at startup — a pool of 32KB blocks used for all disk reads. The size of this pool is determined by available system RAM:

RAM      Cache Size   Cache Blocks
4 GB     512 MB       16,384
8 GB     1 GB         32,768
16 GB    2 GB         65,536
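The block counts in the table are simply the cache size divided by the 32KB block size. A short Python check, taking the RAM-to-cache-size mapping from the table as given and treating MB and GB as binary units (which is what makes the block counts come out exactly):

```python
CACHE_BLOCK_SIZE = 32 * 1024
MIB = 1024 * 1024

# RAM (GB) -> cache size in bytes, copied from the table above.
cache_size_by_ram_gb = {4: 512 * MIB, 8: 1024 * MIB, 16: 2048 * MIB}

blocks_by_ram_gb = {
    ram_gb: size // CACHE_BLOCK_SIZE
    for ram_gb, size in cache_size_by_ram_gb.items()
}

for ram_gb, blocks in blocks_by_ram_gb.items():
    print(f"{ram_gb} GB RAM -> {blocks:,} cache blocks")
```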

The raw buffer memory is allocated upfront and is sufficient. The problem is what happens to those buffers during the scan. BTCheckUnusedNodes races through tens of thousands of free nodes, and every unique disk offset it touches gets a Tag_t structure allocated via calloc and inserted into the cache's hash table. Each tag claims one 32KB buffer from the pool. When the release path runs, it returns the tag to the LRU list — but the LRU management doesn't keep up with the rate of allocations.

On an 8 GB machine with a 1 GB / 32,768-block cache, the scan eventually reaches a state where every single block in the pool is held by an active tag, and the LRU list is completely empty. CacheAllocBlock returns NULL because the free pool is exhausted, and then LRUEvict fails because there is nothing in the LRU to evict. CacheLookup has no choice but to return ENOMEM.

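The “memory” that’s exhausted isn’t system RAM. It’s fsck_hfs’s own fixed-size block cache: every buffer in the pool is tied to a tag, and none of those tags is on the LRU list where eviction could reclaim it.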
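The failure mode described above (pool exhausted, LRU empty, so eviction has nothing to reclaim) can be shown with a toy model. This is not Apple's code: the ToyCache class, its fields, and the requeue_on_release switch are invented for illustration, and the model makes no attempt to reproduce the exact node at which the real scan fails. It only shows that a fixed pool returns ENOMEM once every block is claimed and released tags never reach the LRU list.

```python
import errno
from collections import OrderedDict


class ToyCache:
    """Hypothetical fixed pool of buffers, one buffer per tag."""

    def __init__(self, pool_blocks: int, requeue_on_release: bool):
        self.free_blocks = pool_blocks
        self.tags = {}            # offset -> marker for an allocated tag
        self.lru = OrderedDict()  # offsets whose buffers may be evicted
        self.requeue_on_release = requeue_on_release

    def lookup(self, offset: int) -> int:
        if offset in self.tags:
            self.lru.pop(offset, None)  # in use again, so not evictable
            return 0
        if self.free_blocks == 0:
            if not self.lru:
                return errno.ENOMEM     # pool exhausted and nothing to evict
            victim, _ = self.lru.popitem(last=False)
            del self.tags[victim]
            self.free_blocks += 1
        self.free_blocks -= 1
        self.tags[offset] = True
        return 0

    def release(self, offset: int) -> None:
        # When False, released tags never become eviction candidates.
        if self.requeue_on_release:
            self.lru[offset] = True


def scan(cache: ToyCache, blocks_to_read: int) -> tuple:
    """Read each offset once; return (failing offset, errno) or (None, 0)."""
    for offset in range(blocks_to_read):
        err = cache.lookup(offset)
        if err:
            return offset, err
        cache.release(offset)
    return None, 0


stuck = scan(ToyCache(pool_blocks=4, requeue_on_release=False), 10)
healthy = scan(ToyCache(pool_blocks=4, requeue_on_release=True), 10)
print(stuck, healthy)
```

With a pool of four blocks, the first variant fails on the fifth distinct offset, while the second finishes the scan by recycling the oldest released block each time.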

Confirming It With a Debug Build

Theory is nice, but I wanted proof. So I pulled the Apple open-source HFS code and built fsck_hfs myself with added debug output in CacheLookup, CacheAllocBlock, LRUEvict, and BTCheckUnusedNodes.

Running the instrumented binary against the failing volume produced the smoking gun:

LRUEvict(1477):  empty?
ERROR: CacheRead: CacheLookup error 12

The LRU list is empty. At the moment of failure, every cache block in the pool is held by an active tag, and there is nothing the eviction code can reclaim. CacheLookup returns error 12 (ENOMEM), which propagates up through CacheRead → GetFileBlock → getBlockProc → BTCheckUnusedNodes, which finally prints "Couldn't read node #61432" and exits.

This confirms the mechanism exactly: it’s not that eviction is failing because of clutter — it’s that there is literally nothing to evict. The scan has saturated the cache pool before completing.

The Proof: Testing Across Four Machines

To confirm this theory, I ran the same test on four different Macs. I used the same drive and cable for each test, swapping only the computer:

Machine                    macOS     fsck_hfs         Cache     Result
MacBook Air (Intel i5)     12.7.6    hfs-583.100.10   512 MB    PASS
MacBook Pro (M3)           15.5      hfs-683.120.3    2 GB      PASS
Mac mini #1 (M1, 8GB)      15.5      hfs-683.120.3    1 GB      FAIL
Mac mini #2 (M1, 8GB)      15.4      hfs-683.120.3    1 GB      FAIL

The pattern is clear:

  • The MacBook Air runs an older fsck_hfs (hfs-583) that either doesn't have BTCheckUnusedNodes or implements it differently. It passes trivially.
  • The MacBook Pro runs the buggy hfs-683 version, but with 16 GB of RAM, it gets a 2 GB cache (65,536 blocks) — enough headroom to scan all 65,341 free nodes without exhausting the tag metadata.
  • Both Mac minis run hfs-683 with 8 GB of RAM, getting a 1 GB cache (32,768 blocks). This is the exact range where the tag accumulation causes cache exhaustion before the scan completes.

Summary

The filesystem was never corrupt. The fsck_hfs tool in macOS 15.x (Sequoia) has a bug in its BTCheckUnusedNodes function: when verifying that unused B-tree nodes are zeroed, it saturates its own block cache pool before the scan can complete. On volumes with large pre-allocated B-trees (like the 512 MiB Attributes file on a 24TB HFS+ volume, which has 65,536 node slots), the cache runs out of evictable blocks on machines with 8 GB of RAM, and the scan aborts with a false "Couldn't read node" error. I confirmed this with a custom debug build that shows the LRU list empty at the point of failure. The bug is entirely drive-independent: I reproduced it on two different 24TB drives from different manufacturers with different enclosures.

The irony: a function designed to verify filesystem integrity is itself broken — reporting phantom corruption on perfectly valid volumes.

If you’re seeing “Couldn’t read node” errors on large HFS+ volumes with macOS Sequoia on an 8 GB machine, your data is almost certainly fine. The bug is in the checker, not the checked.

The Fix: Bypassing the Cache for Unused Node Checks

Knowing the root cause is one thing; patching it is another. I considered two approaches:

Fix the cache itself. Uncomment the LRUHit call in CacheLookup and adjust the early-return in LRUHit so fresh tags actually get inserted into the LRU list. That's technically the "correct" fix — it addresses the underlying cache bug that leaks tags. But the cache is shared infrastructure used by every part of fsck_hfs: journal replay, catalog scan, extents check, allocation bitmap verification. A wrong fix could subtly break any of them, and without understanding why the LRUHit call was commented out in the first place, I couldn't be confident I wasn't re-introducing whatever problem caused someone to comment it out.

Bypass the cache for this specific function. BTCheckUnusedNodes reads each node exactly once, sequentially, with no reuse. Caching provides zero benefit — every cached block is pure overhead that will never produce a cache hit. The right data structure for this access pattern is a single reusable buffer, not a general-purpose LRU cache.

I went with the bypass approach. The change is entirely local to BTCheckUnusedNodes: no shared code is touched, and the blast radius if something is wrong is limited to the unused-node verification step. The modified function allocates one buffer at the start, reuses it for all 65,000+ reads, and frees it at the end. Each read goes directly to disk, skipping the tag allocation, hash table insertion, and LRU bookkeeping entirely. There is no cache pollution and no possibility of LRU exhaustion, and the scan completes cleanly.
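The access pattern of the bypass can be sketched in a few lines of Python. This is an illustration of the single-reusable-buffer idea only, not the actual C patch (that lives in the repository linked below). The function name, parameters, and the in-memory test image are hypothetical.

```python
import io


def check_unused_nodes(device, file_start, node_size, bitmap, total_nodes):
    """Return the first free node that is unreadable or not all zeros, else None."""
    buffer = bytearray(node_size)  # the one buffer, reused for every read
    mask, index = 0x80, 0
    for node_num in range(total_nodes):
        if (bitmap[index] & mask) == 0:  # bit clear: node is free
            device.seek(file_start + node_num * node_size)
            if device.readinto(buffer) != node_size or any(buffer):
                return node_num
        # Advance to the next bitmap bit, MSB-first.
        mask >>= 1
        if mask == 0:
            mask, index = 0x80, index + 1
    return None


# In-memory stand-in for a disk: 8 nodes of 16 bytes, nodes 0 and 1 in use.
node_size, total_nodes = 16, 8
bitmap = bytes([0b11000000])
image = bytearray(b"\x01" * (2 * node_size) + bytes(6 * node_size))

clean_result = check_unused_nodes(io.BytesIO(bytes(image)), 0, node_size,
                                  bitmap, total_nodes)

image[5 * node_size + 3] = 0xAB  # dirty one byte inside free node 5
dirty_result = check_unused_nodes(io.BytesIO(bytes(image)), 0, node_size,
                                  bitmap, total_nodes)

print(clean_result, dirty_result)
```

Memory use stays at one node-sized buffer regardless of how many free nodes are scanned, which is the property that matters for this access pattern.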

Important note: this fix targets only the specific cache-exhaustion bug described above (error 12 during the extended attributes check). If your fsck_hfs is reporting a different error, or if you’re on a machine where fsck_hfs works fine, you don’t need it. It is not a general-purpose improvement, so please don’t download and run it as one.

Getting the Patch

Because Apple’s open-source mirror is read-only — PRs and issues are both locked — I couldn’t submit this upstream. Instead, I’ve published the patched source on GitHub:

https://github.com/kivancgnlp/fsck_hfs_cache_issue

The repo contains the modified fsck_hfs source with build instructions. Anyone hitting the same "Couldn't read node" error on a large HFS+ volume can build their own fixed binary from there.