Ask HN: Will we ever be able to recover BitTorrent files from the piece hashes?
SHA-1 hashes are not secure, and collisions can be produced by attackers in practice.
I'm wondering if these attack strategies could one day be used to find collisions for the pieces in a BitTorrent archive, effectively "recovering" the data that was originally used to create the archive.
Let's say the piece size is 4 KiB; that means there are 2^(8x4x1024) possible combinations of bytes for each piece. That is a number with almost 10 thousand digits. Could these SHA-1 collision attack strategies be used to cut down on the number of combinations we'd have to comb through to find the matching 4 KiB of binary that originally produced the hash?
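(A quick sanity check on that digit count, using Python's arbitrary-precision integers; this is just my own back-of-the-envelope sketch:)

```python
# Number of possible 4 KiB pieces: 2^(8 * 4 * 1024) = 2^32768
combinations = 2 ** (8 * 4 * 1024)

# Count the decimal digits of that number
num_digits = len(str(combinations))
print(num_digits)  # prints 9865, i.e. "almost 10 thousand digits"
```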
Either way, I'm interested to learn more about how this works. Thanks.

The problem is that a hash is not even close to a one-to-one mapping of input data <-> output hash. There are a truly enormous number of possible 4 KiB strings that can produce a given hash. Assuming SHA-1 is evenly distributed, there are 2^(8*4*1024) / 2^160 possibilities (which is 2^32608). You could find some 4 KiB piece that works, but it would almost certainly not be the original.

Even if your torrent had a single file with a single missing 4 KiB piece, and even if you had a machine that could spit out every configuration that would pass the hash check, you'd still have 2^32608 theoretically valid files to test.

(Note: I'm not taking the time to think the specifics through, so my math may be incorrect, but the conclusion that the search space would remain huge is sound.)

If you put enough monkeys on typewriters, they'll create Ubuntu 18.04 LTS.

What you're talking about is a preimage, not a collision, which is a different kind of attack.

Thanks for the correction.

It's probably easier with quantum computers: https://eprint.iacr.org/2020/213.pdf

How would that appreciably help? There are so many more 4 KiB strings than SHA-1 hash values that finding collisions barely matters.

Data generated by people isn't random; it has statistical regularities that can be used to narrow down the search space, and since quantum computers are better at certain kinds of search problems, it will be easier to find fragments of files based on just the hashes of the content.

As in hashing being the ultimate compression… seems unlikely.
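(The pigeonhole argument in the thread can be sketched numerically; this is my own illustration, not anything from the BitTorrent spec:)

```python
import hashlib

piece_bits = 8 * 4 * 1024  # bits in a 4 KiB piece: 32768
hash_bits = 160            # SHA-1 digest size in bits

# Pigeonhole estimate: assuming SHA-1 output is evenly distributed,
# each 160-bit digest has about 2^(32768 - 160) = 2^32608 preimages
# among the 4 KiB pieces alone.
print(piece_bits - hash_bits)  # prints 32608

# Any input, however large, maps to the same fixed 160-bit digest,
# which is why the mapping cannot be one-to-one.
digest = hashlib.sha1(b"\x00" * 4096).hexdigest()
print(len(digest) * 4)  # prints 160
```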