Chunking Attacks on File Backup Services Using Content-Defined Chunking [pdf]

daemonology.net

123 points by cperciva 2 months ago


pvg - 2 months ago

Less pdfblobby blog post https://www.daemonology.net/blog/2025-03-21-Chunking-attacks...

dchest - 2 months ago

This is great, thank you! This was on my wishlist for a few years:

https://www.reddit.com/r/crypto/comments/7imejm/monthly_cryp...

I've tried to take a stab at this problem, but was not sure if it worked at all:

https://gist.github.com/dchest/50d52015939a5772497815dcd33a7...

It's a modified BuzHash with the following changes:

- Substitution table is pseudorandomly permuted (NB: like Borg).

- Initial 32-bit state is derived from key.

- Window size slightly varies depending on key (by ~1/4).

- Digest is scrambled with a 32-bit block cipher.

I also proposed adding (unspecified) padding before encrypting chunks to further complicate discovering their plaintext lengths. Glad to see I was on the right track :)

mananaysiempre - 2 months ago

> I'm also exploring possibilities for making the chunking provably secure.

Seems like that’s possible[1] to do in a fairly straightforward manner, the question is if you can do this without computing a PRF for each byte.

[1] Obviously you’re always going to leak the total data size and the approximate size of new data per each transfer.

masfuerte - 2 months ago

Could this be mitigated by randomising the block upload order?

A fresh backup will be uploading thousands of blocks. You don't want to create all the blocks before uploading, but a buffer of a hundred might be enough?

0cf8612b2e1e - 2 months ago

Having not read the paper, does this impact Restic or Borg which encrypt chunks?

amarshall - 2 months ago

My reading is that the primary vector is based on the size of the chunks (due to deterministic chunking and length-preserving encryption). Would padding chunks with random-length data (prior to encryption) help mitigate this at the cost of additional storage (and complexity)?

- 2 months ago
[deleted]
jszymborski - 2 months ago

Would SipHash be too slow? I think it would help mitigate the problem since you can key it to prevent known-plaintext attacks, right?

EDIT: or maybe this keyed rolling hash https://crypto.stackexchange.com/questions/16082/cryptograph...

ThomasWaldmann - 2 months ago

borg discussion + wiki page:

https://github.com/borgbackup/borg/discussions/8694

pbsd - 2 months ago

In page 10, should the ring R be GF(2)[X]/(X^32-1) and the map p be from {0,1}^{32} to R?

Pozzi1 - 2 months ago

[dead]

hackburg - 2 months ago

[dead]