Chunking Attacks on File Backup Services Using Content-Defined Chunking [pdf]

daemonology.net

123 points by cperciva a month ago


pvg - a month ago

Less pdfblobby blog post https://www.daemonology.net/blog/2025-03-21-Chunking-attacks...

dchest - a month ago

This is great, thank you! This was on my wishlist for a few years:

https://www.reddit.com/r/crypto/comments/7imejm/monthly_cryp...

I've tried to take a stab at this problem, but was not sure if it worked at all:

https://gist.github.com/dchest/50d52015939a5772497815dcd33a7...

It's a modified BuzHash with the following changes:

- Substitution table is pseudorandomly permuted (NB: like Borg).

- Initial 32-bit state is derived from key.

- Window size slightly varies depending on key (by ~1/4).

- Digest is scrambled with a 32-bit block cipher.

I also proposed adding (unspecified) padding before encrypting chunks to further complicate discovering their plaintext lengths. Glad to see I was on the right track :)

mananaysiempre - a month ago

> I'm also exploring possibilities for making the chunking provably secure.

Seems like that’s possible[1] to do in a fairly straightforward manner, the question is if you can do this without computing a PRF for each byte.

[1] Obviously you’re always going to leak the total data size and the approximate size of new data per each transfer.

masfuerte - a month ago

Could this be mitigated by randomising the block upload order?

A fresh backup will be uploading thousands of blocks. You don't want to create all the blocks before uploading, but a buffer of a hundred might be enough?

0cf8612b2e1e - a month ago

Having not read the paper, does this impact Restic or Borg which encrypt chunks?

amarshall - a month ago

My reading is that the primary vector is based on the size of the chunks (due to deterministic chunking and length-preserving encryption). Would padding chunks with random-length data (prior to encryption) help mitigate this at the cost of additional storage (and complexity)?

- a month ago
[deleted]
jszymborski - a month ago

Would SipHash be too slow? I think it would help mitigate the problem since you can key it to prevent known-plaintext attacks, right?

EDIT: or maybe this keyed rolling hash https://crypto.stackexchange.com/questions/16082/cryptograph...

ThomasWaldmann - a month ago

borg discussion + wiki page:

https://github.com/borgbackup/borg/discussions/8694

pbsd - a month ago

In page 10, should the ring R be GF(2)[X]/(X^32-1) and the map p be from {0,1}^{32} to R?

Pozzi1 - a month ago

[dead]

hackburg - a month ago

[dead]