Casync – A tool for distributing file system images

0pointer.net

138 points by Nekit1234007 9 years ago · 35 comments

tkfu 9 years ago

I'm not sure I buy the embedded/IoT use case; OSTree is a really good model there and is more featureful. The "well, if your filesystem image delta happens to be in the form of a lot of very small files it's not so great for CDNs" doesn't strike me as a terribly good reason to give up everything OSTree gives you (especially with stuff like the meta-updater [1] Yocto integration).

[1] https://github.com/advancedtelematic/meta-updater

(Full disclosure: I work for Advanced Telematic, the creators and maintainers of the meta-updater Yocto layer.)

  • poettering 9 years ago

    Well, I am pretty sure IoT devices should be designed with security in mind, and that means that they need to be protected against offline modification. And that's something OSTree can't really deliver, but dm-verity can. And casync works pretty well for delivering dm-verity enabled disk images.

    I think OSTree is great — but for embedded devices that are installed in the wild, humm, uh, I don't think so? I am pretty sure there are better options than that.

    • ralphmender 9 years ago

      I'm with the open source project Mender.io (OTA for embedded Linux) and we think Casync is a very interesting building block and may look into this and evaluate whether it makes sense to incorporate it into our project.

      We had looked into OSTree before but given the use case of embedded devices in the wild, we concluded it was too risky as OSTree relies on the filesystem to protect from power failures. And rollback was not built-in and is quite challenging to implement reliably.

    • thinkMOAR 9 years ago

      Please elaborate on 'need to be protected against offline modification'?

      • poettering 9 years ago

        Think of cell towers or wind power turbines: both are prime hacking targets in today's world, and they are placed in the wild, in uncontrolled and unprotected locations. This means more or less anybody can walk by, temporarily cut the power, take the hard disk out, plug it into their hacking laptop, install an OS trojan on it, put it back into the original device and restore the power. From the PoV of the cell company or the power company this was just a short power cut, and nothing changed. In reality the system was just hacked. OSTree can't help you protect against that, because disk accesses aren't validated; the only validation takes place during downloading. dm-verity OTOH will protect every single access, and if deployed properly, such "offline" modifications to the OS will result in the device not booting anymore, which is much preferable to accepting that the device was hacked with no way to detect it.

        And it's not just cell towers or wind power turbines: pretty much any device that is around people who aren't unconditionally trusted needs to be protected against such offline modifications. In fact, people who build cars, TVs, surveillance cameras or anything else like that today, and do not deploy dm-verity in some form to make sure the devices cannot be modified offline without anyone noticing, are just participating in turning IoT into the Internet of Shit.
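        To make "protects every single access" concrete: dm-verity checks each block read from the device against a hash tree whose root hash is trusted (for example, supplied by a verified boot chain). The Python below is a minimal sketch of that idea, not dm-verity's on-disk format. It assumes 4096-byte blocks, a plain binary SHA-256 tree without salt, and a tree held in memory, whereas the real format (as produced by `veritysetup format`) uses salted hashes and hash blocks that each hold many digests.

```python
import hashlib

BLOCK_SIZE = 4096  # assumption: 4 KiB data blocks


def h(data: bytes) -> bytes:
    return hashlib.sha256(data).digest()


def build_tree(image: bytes) -> list[list[bytes]]:
    """Return the tree levels, leaves first; the last level holds only the root hash."""
    blocks = [image[i:i + BLOCK_SIZE] for i in range(0, len(image), BLOCK_SIZE)]
    level = [h(b) for b in blocks]
    levels = [level]
    while len(level) > 1:
        if len(level) % 2:                  # odd count: pair the last node with itself
            level = level + [level[-1]]
        level = [h(level[i] + level[i + 1]) for i in range(0, len(level), 2)]
        levels.append(level)
    return levels


def verify_block(levels: list[list[bytes]], index: int, block: bytes) -> bool:
    """Recompute the path from one block up to the root and compare with the trusted root."""
    node = h(block)
    for level in levels[:-1]:
        sibling_index = index ^ 1
        # Missing sibling means this node was paired with itself in build_tree.
        sibling = level[sibling_index] if sibling_index < len(level) else node
        node = h(node + sibling) if index % 2 == 0 else h(sibling + node)
        index //= 2
    return node == levels[-1][0]
```

        A tampered block changes its leaf hash and therefore every hash on the way up, so the recomputed root no longer matches. In the real kernel target such a read fails (by default with an I/O error) instead of returning the modified data.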

        • thinkMOAR 9 years ago

          But physical access == game over? Whatever software layer you add imho.

          Wouldn't it be easier to simply dunk the whole device in some epoxy preventing access to the hardware with some anti-tamper deadman switch?

          • poettering 9 years ago

            Trusted boot and TPMs with remote attestation exist precisely to ensure that physical access does not mean game over. It's all there; people just need to make use of it in their systems. And yes, trusted boot and TPMs have issues, but without all this the attack surface is massive, and I think needlessly so.
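            The measured-boot half of this rests on the TPM's "extend" operation: each boot stage hashes the next one and folds that digest into a Platform Configuration Register (PCR), so the final value depends on every measured component and on their order, and a remote verifier can compare it with the value expected from known-good components. The Python below is a minimal sketch of just that arithmetic, assuming a SHA-256 PCR bank that starts at all zeros. It does not model quotes, signatures, event logs or a real TPM interface, and the stage contents in the test are made up.

```python
import hashlib

PCR_SIZE = 32  # a SHA-256 PCR bank holds 32-byte values


def extend(pcr: bytes, measured_data: bytes) -> bytes:
    """TPM-style extend: new PCR = SHA-256(old PCR || SHA-256(measured data))."""
    digest = hashlib.sha256(measured_data).digest()
    return hashlib.sha256(pcr + digest).digest()


def measure_boot_chain(stages: list[bytes]) -> bytes:
    """Extend one PCR with each boot stage in order, starting from the reset value."""
    pcr = bytes(PCR_SIZE)  # assumption: the PCR resets to all zeros
    for stage in stages:
        pcr = extend(pcr, stage)
    return pcr
```

            Since a PCR can only be extended, never set directly, swapping a measured component offline yields a different final value, and attestation against the expected value then fails.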

            • thinkMOAR 9 years ago

              (Trusted boot and TPMs are afaik already compromised, although you'd need to bring a near rocket scientist.)

              I will always think physical access is game over, whatever 'rocket science' or re-invented old principles people come up with on the software side. I'm not sure, but probably on the hardware side too, although software is easier to mangle.

              And indeed, yes, security is layers, layers that make it more difficult, and having many options for layers to choose from is great.

              Also, I hadn't really heard of OSTree before; I'm reading up on both for a future project.

      • vetinari 9 years ago

        He probably means modification by those who have physical access, which often means the users, who are sometimes not the owners.

        If you have devices like cable box or water meter, the real owners do not want you to modify the device. That's where mechanisms like dm-verity step in.

dom0 9 years ago

If you read the internals description it could just as well be about Borg, very similar principles here, though the application is very different.

By the way, both buzhash and SHA-256 are kinda poor choices for a new system, especially one that targets servers.

  • rdtsc 9 years ago

    Yep, Borg is the first thing I thought of. It already does a lot of this and more: encryption, configurable encoding, a rolling hash computed by the Buzhash algorithm and so on.

    Maybe it wasn't geared for CDN delivery during restores but otherwise I've been impressed by borg so far (haven't deployed it in production, only played with it locally though).

    https://github.com/borgbackup/borg

    This is a description of the internal design:

    http://borgbackup.readthedocs.io/en/stable/internals.html

  • dchest 9 years ago

    both buzhash and SHA-256 are kinda poor choices for a new system

    Why?

    • dom0 9 years ago

      Low software performance

      • dchest 9 years ago

        Considering that it uses xz for compression, does the performance of SHA-256 matter? (Well, using faster hash function can speed up finding duplicate blocks, which were already packed.)

        I'm more interested to hear about buzhash, though.
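        dchest's parenthetical can be sketched in a few lines: in a content-addressed chunk store every chunk is hashed to find out whether it is already present, but only new chunks need to be compressed (with xz or anything else) and written. The Python below is a hypothetical minimal store, assuming SHA-256 hex digests as chunk IDs and an in-memory dict as storage; it is not casync's code.

```python
import hashlib


class ChunkStore:
    """Hypothetical content-addressed store: each distinct chunk is kept once."""

    def __init__(self) -> None:
        self.chunks: dict[str, bytes] = {}

    def put(self, chunk: bytes) -> tuple[str, bool]:
        """Hash the chunk and store it only if its ID is new. Returns (chunk ID, was_new)."""
        key = hashlib.sha256(chunk).hexdigest()  # every chunk pays for hashing
        if key in self.chunks:
            return key, False                    # duplicate: nothing to compress or write
        self.chunks[key] = chunk                 # a real tool would compress here
        return key, True


def store_stream(store: ChunkStore, chunks: list[bytes]) -> tuple[list[str], int]:
    """Store a sequence of chunks; return the index (chunk IDs) and the bytes newly stored."""
    index: list[str] = []
    new_bytes = 0
    for chunk in chunks:
        key, was_new = store.put(chunk)
        index.append(key)
        if was_new:
            new_bytes += len(chunk)
    return index, new_bytes


def restore(store: ChunkStore, index: list[str]) -> bytes:
    """Reassemble the original data from its index."""
    return b"".join(store.chunks[key] for key in index)
```

        With highly redundant input most chunks take the hash-only path in `put`, which is why the hash function's speed can still matter when compression is slow.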

        • dom0 9 years ago

          I assume™ that xz won't stay the only choice. I think it's important to understand that in deduplication, you'll pass all data through your hashes one to two times. Regarding buzhash, it can break with byte granularity, and it has a dependency chain that prohibits parallelization. You'll likely never see it go faster than 700-750 MB/s on a desktop CPU (~3.8 GHz Haswell) and it won't profit from non-clock improvements of CPUs. Giving up byte-granularity allows significant improvements in performance, but I don't think anyone comprehensively analysed the impact on deduplication performance. I didn't.
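          For readers unfamiliar with buzhash, here is a minimal Python sketch of content-defined chunking with a buzhash rolling hash, the general technique both casync and Borg use. The window size, substitution table and chunk-size limits are illustrative assumptions, not either tool's actual parameters. Each new hash value is computed from the previous one, which is the serial dependency chain mentioned above, and a cut can fire after any byte, which is the byte granularity.

```python
import random

WINDOW = 48          # rolling-window length in bytes (illustrative assumption)
MASK32 = 0xFFFFFFFF

# Hypothetical substitution table: 256 pseudo-random 32-bit values (seeded, so reproducible).
_rng = random.Random(0)
TABLE = [_rng.getrandbits(32) for _ in range(256)]


def rotl(x: int, n: int) -> int:
    """Rotate a 32-bit value left by n bits."""
    n %= 32
    return ((x << n) | (x >> (32 - n))) & MASK32


def chunk_boundaries(data: bytes, min_size: int = 256,
                     avg_size: int = 1024, max_size: int = 4096) -> list[int]:
    """Return the end offset of each content-defined chunk in data."""
    boundaries: list[int] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        # Each step depends on the previous h: the serial dependency chain.
        h = rotl(h, 1) ^ TABLE[byte]
        if i - start >= WINDOW:
            # Remove the byte that just slid out of the window.
            h ^= rotl(TABLE[data[i - WINDOW]], WINDOW)
        size = i + 1 - start
        # Once min_size is reached, a cut fires with probability ~1/avg_size per byte.
        if (size >= min_size and h % avg_size == avg_size - 1) or size >= max_size:
            boundaries.append(i + 1)
            start = i + 1
            h = 0
    if start < len(data):
        boundaries.append(len(data))  # trailing partial chunk
    return boundaries
```

          Because a cut depends only on the last `WINDOW` bytes, inserting data near the start of a file changes only the chunks around the edit; later boundaries fall on the same content again, so those chunks deduplicate.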

          (OTOH if your storage is faster than ~200-300 MB/s (buzhash and a hash, naively combined) then there is likely no issue using higher degrees of I/O concurrency, so you can work around these problems).
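          The "naively combined" figure follows from adding per-byte costs: if the rolling hash and the chunk hash run one after the other on one core, their throughputs combine harmonically. Taking the ~700 MB/s quoted above for buzhash and a hypothetical 400 MB/s for the cryptographic hash (an assumed figure, not a benchmark):

```latex
T_{\text{combined}}
  = \frac{1}{\dfrac{1}{T_{\text{buzhash}}} + \dfrac{1}{T_{\text{hash}}}}
  = \frac{700 \cdot 400}{700 + 400}\ \text{MB/s}
  \approx 255\ \text{MB/s}
```

          which lands inside the ~200-300 MB/s range; running the two stages on separate cores or streams is what lifts this ceiling.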

      • tomfitz 9 years ago

        SHA-256 seems to score well on https://www.cryptopp.com/benchmarks.html .

        • ktta 9 years ago

          Depends on the context. If there is a lot of hashing, then a faster alternative like BLAKE2b is better.

      • poettering 9 years ago

        what would you suggest instead? numbers?

        • dchest 9 years ago

          BLAKE2b, mentioned above, would be more than twice as fast on 64-bit CPUs. But I think we'll soon see SHA-256 CPU instructions on most processors (ARM, lower-cost Intel and the latest AMD already ship them: https://neosmart.net/blog/2017/will-amds-ryzen-finally-bring...), so I guess it's not important. For numbers, see blake2.net or bench.cr.yp.to.

          For IoT devices, hashes that work on 32-bit words, like SHA-256, actually make more sense and will be faster, so BLAKE2s would work well.
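          Swapping the chunk hash is mostly a one-line change, which Python's `hashlib` makes easy to illustrate. This sketch assumes 32-byte chunk IDs for every algorithm (SHA-256's native size and BLAKE2s's maximum); `chunk_id` and `pick_algorithm` are hypothetical helpers, and the selection policy simply follows the suggestion above. IDs from different algorithms never match, so a store has to record which one it uses.

```python
import hashlib

DIGEST_SIZE = 32  # 32-byte chunk IDs, the same width as SHA-256


def chunk_id(data: bytes, algorithm: str) -> str:
    """Hex chunk ID using one of the hash functions discussed in this thread."""
    if algorithm == "sha256":
        h = hashlib.sha256(data)
    elif algorithm == "blake2b":  # tuned for 64-bit CPUs
        h = hashlib.blake2b(data, digest_size=DIGEST_SIZE)
    elif algorithm == "blake2s":  # tuned for 32-bit CPUs
        h = hashlib.blake2s(data, digest_size=DIGEST_SIZE)
    else:
        raise ValueError(f"unsupported algorithm: {algorithm}")
    return h.hexdigest()


def pick_algorithm(word_bits: int) -> str:
    """Hypothetical policy: BLAKE2b on 64-bit targets, BLAKE2s on 32-bit ones."""
    return "blake2b" if word_bits >= 64 else "blake2s"
```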

          What I'd like to hear from the above commenter is about a faster replacement for buzhash, which I'm also interested in.

tomfitz 9 years ago

Great. The chunked model (inspired by Borgbackup/Tarsnap) seems preferable to Docker layering, and diff-based approaches.

As far as I can tell, the advantages compared to Borgbackup seem to be:

* casync offers control over which FS metadata is included

* casync, the server, exposes chunks over HTTP

* casync, the library, is written in C so is more easily used by systems software.

I'm betting we'll see machinectl integration. Excellent!

  • tomfitz 9 years ago

    Oops. Just realised my comment contains a mistake.

    casync does not act as a server. Its on-disk representation and client behaviour are designed in such a way that the server only needs to serve static files. This makes deployment easy.
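    Why static files suffice: each chunk is stored under a name derived from its hash, so a client holding an index (the list of chunk IDs that make up an image) can compute every URL itself, and since content-addressed files never change, a CDN can cache them indefinitely. The Python below is a sketch that assumes a casync-style layout of `<first four hex digits>/<chunk ID>.cacnk` inside the store directory; treat the exact naming as an assumption and check casync's documentation. The base URL in the test is a placeholder.

```python
def chunk_path(chunk_id: str) -> str:
    """Relative path of a chunk file, assuming '<4 hex digits>/<id>.cacnk' naming."""
    return f"{chunk_id[:4]}/{chunk_id}.cacnk"


def chunk_urls(base_url: str, chunk_ids: list[str]) -> list[str]:
    """Static-file URLs for each distinct chunk ID, in first-seen order."""
    base = base_url.rstrip("/")
    seen: set[str] = set()
    urls: list[str] = []
    for cid in chunk_ids:
        if cid not in seen:       # a repeated chunk only needs one download
            seen.add(cid)
            urls.append(f"{base}/{chunk_path(cid)}")
    return urls
```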

  • purpleidea 9 years ago

    > I'm betting we'll see machinectl integration. Excellent!

    systemd-nspawn integration is what I was thinking about too, so yeah! Nice work.

RachelF 9 years ago

All these great Unix based tools make me wish I did not have to work on Windows Servers all day.

the_arun 9 years ago

What is the difference between Casync & rclone (https://rclone.org)?

  • striking 9 years ago

    rclone is a nice cloud cloning solution for files and folders. casync is intended to clone and delta entire filesystems, in a way that makes them nicely deployable. rclone is very cloud focused while casync says nothing of the details of how images are served.

    casync also has fs composition, a multitude of recorded file attributes, automatic reflinking/hardlinking, uid/gid shifting, and so much more.

    tl;dr: rclone is for files, casync is for entire filesystems/deployments.

peterwwillis 9 years ago

I think this is a very useful tool if you're working with arbitrary file changes on block devices, but it's still very low-level and would need a crapton of modification/wrapping to make it useful in a complex system. I would rather use Kickstart or the like to distribute changes intelligently, or barring that, rsync hardlinked directory trees, or zsync (but RPM/Yum would really be ideal due to the features gained).

JustinGarrison 9 years ago

I'd be really interested in creating full disk images and then cloning them to another disk. If that is a use case (I think I skimmed the post correctly), it could be very useful and more performant than dd and similar disk cloning tools.
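This is roughly where chunking beats dd: if the target disk already holds an older version of the image, matching chunks can be copied locally and only the rest downloaded. casync supports using such local data as a "seed"; the Python below is a hypothetical sketch of the idea, not casync's implementation. It assumes the seed was split with the same chunker as the image (so chunk IDs line up), SHA-256 chunk IDs, and a caller-supplied `fetch` function standing in for an HTTP download.

```python
import hashlib
from typing import Callable


def sha256_id(data: bytes) -> str:
    return hashlib.sha256(data).hexdigest()


def index_seed(seed_chunks: list[bytes]) -> dict[str, bytes]:
    """Map chunk IDs to bytes for chunks already present locally (e.g. an old disk image)."""
    return {sha256_id(c): c for c in seed_chunks}


def assemble(index: list[str], seed: dict[str, bytes],
             fetch: Callable[[str], bytes]) -> tuple[bytes, int]:
    """Rebuild an image from its chunk index, fetching only chunks the seed lacks.

    Returns the image and the number of bytes downloaded. Fetched chunks are added
    to `seed`, so a chunk that repeats in the index is downloaded only once.
    """
    out = bytearray()
    fetched = 0
    for cid in index:
        data = seed.get(cid)
        if data is None:
            data = fetch(cid)
            if sha256_id(data) != cid:  # reject corrupted or tampered downloads
                raise ValueError(f"chunk {cid} failed verification")
            seed[cid] = data
            fetched += len(data)
        out += data
    return bytes(out), fetched
```

Each downloaded chunk is re-hashed before use, so a bad download is rejected rather than written to disk.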

nwmcsween 9 years ago

It sounds like a sort of half-done distributed filesystem; there are a lot of similarities.

abhineet97 9 years ago

This sounds like a centralized BitTorrent.
