Ptar: Replacing .tgz for petabyte-scale S3 archives

plakar.io

57 points by vcoisne 23 days ago


tux1968 - 22 days ago

They mention in the article that some people don't want to install the full Plakar backup software just to read and write ptar archives, so a dedicated open-source tool is offered for download as of yesterday:

https://plakar.io/posts/2025-07-07/kapsul-a-tool-to-create-a...

winrid - 22 days ago

If you zoom in on your site before the cookie banner pops up, you're stuck with just "Hi, we're cookies!" on the screen and can't zoom back out.

chungy - 22 days ago

Another similar archive format is WIM, the thing created by Microsoft for the Windows Vista (and newer) installer; an open source implementation is at: https://wimlib.net/

It offers similar advantages: deduplication, indexing, per-file compression, and versioning.
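To make deduplication, indexing, and per-file compression concrete, here is a minimal Python sketch using only the standard library. Each file's contents are hashed with SHA-256, identical contents are stored once, each stored blob is compressed on its own with zlib, and a path-to-hash table serves as the index. The `ToyArchive` class and its methods are hypothetical and illustrate the general idea only; this is not the WIM or ptar on-disk format, and versioning is left out.

```python
import hashlib
import zlib


class ToyArchive:
    """Hypothetical in-memory archive: content-addressed blobs plus a path index."""

    def __init__(self) -> None:
        self.blobs: dict[str, bytes] = {}  # SHA-256 hex digest -> zlib-compressed contents
        self.index: dict[str, str] = {}    # file path -> digest of that file's contents

    def add(self, path: str, data: bytes) -> bool:
        """Record a file; return True only if its contents were not already stored."""
        digest = hashlib.sha256(data).hexdigest()
        self.index[path] = digest
        if digest in self.blobs:
            return False  # deduplicated: identical contents are stored once
        self.blobs[digest] = zlib.compress(data)  # each blob is compressed on its own
        return True

    def read(self, path: str) -> bytes:
        """Look up one file in the index and decompress only that file's blob."""
        return zlib.decompress(self.blobs[self.index[path]])


if __name__ == "__main__":
    arc = ToyArchive()
    arc.add("a/config.txt", b"same bytes " * 100)
    arc.add("b/config-copy.txt", b"same bytes " * 100)  # duplicate contents
    arc.add("c/other.txt", b"different bytes")
    print(len(arc.index), "files,", len(arc.blobs), "unique blobs")  # 3 files, 2 unique blobs
```

Because each blob is compressed separately, reading one file never requires decompressing the others. With a .tgz, gzip compresses the whole tar stream as a single unit, so getting at one member means decompressing everything before it.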

nemothekid - 22 days ago

>By contrast, S3 buckets are rarely backed up (a rather short-sighted approach for mission-critical cloud data), and even one-off archives are rarely done.

This is a complete aside, but how often are people backing up data to something other than S3? What I mean is: if some piece of data is on S3, do people have a contingency for "S3 failing"?

S3 is so durable in my mind now that I can only really imagine keeping an "S3 backup" if (1) I had an existing system (e.g. tapes), or (2) I needed multi-cloud redundancy. Other than that, once something is in S3, I'm confident it's safe.

Obviously this confidence was built over years (decades?) of reliability, and if your DRP (disaster recovery plan) requires alternatives, you should have them, but is anyone realistically paranoid about S3?

ac29 - 22 days ago

Are people really using gzip in 2025 for new projects?

Zstd has been widely available for a long time. Debian, which is pretty conservative with new software, has shipped zstd since at least stretch (released 2017).

gcr - 22 days ago

How does this differ from zpaq and DwarFS?

Zpaq is quite mature and also handles deduplication, versioning, etc.

Scaevolus - 22 days ago

Having the entire backup as a single file is interesting, but does it matter?

Restic has a similar feature set (deduplicated, encrypted backups), but almost certainly has better incremental performance for complex use cases like keeping X daily backups, Y weekly backups, etc. At the same time, it struggles with RAM usage when handling even 1 TB of data, and presumably ptar scales better at that size.
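A retention scheme like "X daily backups, Y weekly backups" can be sketched in a few lines of Python. This is loosely modeled on the keep-daily/keep-weekly policies that restic exposes through `restic forget --keep-daily N --keep-weekly N`, but the `select_kept` function is hypothetical and is not restic's code; real tools also handle hourly, monthly, and yearly rules, tags, and hosts. Each rule walks snapshots newest-first and keeps the newest snapshot in each distinct period (calendar day or ISO week) until its quota is used up; anything kept by either rule survives.

```python
from datetime import datetime, timedelta


def select_kept(snapshots: list[datetime], daily: int, weekly: int) -> set[datetime]:
    """Hypothetical keep-daily/keep-weekly policy: return the snapshots to retain."""
    kept: set[datetime] = set()
    rules = [
        (daily, lambda t: t.date()),                      # one slot per calendar day
        (weekly, lambda t: tuple(t.isocalendar()[:2])),   # one slot per (ISO year, ISO week)
    ]
    for limit, period_of in rules:
        seen = set()
        for snap in sorted(snapshots, reverse=True):  # newest first
            if len(seen) >= limit:
                break  # this rule's quota of periods is used up
            period = period_of(snap)
            if period not in seen:  # newest snapshot in a period not yet covered
                seen.add(period)
                kept.add(snap)
    return kept


if __name__ == "__main__":
    # Three weeks of hypothetical nightly snapshots, 2025-06-01 through 2025-06-21.
    snaps = [datetime(2025, 6, 1, 2, 0) + timedelta(days=i) for i in range(21)]
    for t in sorted(select_kept(snaps, daily=3, weekly=2)):
        print(t.date())
```

With `daily=3, weekly=2` this keeps 2025-06-19, 2025-06-20, and 2025-06-21 for the daily rule, while the weekly rule adds 2025-06-15, the newest snapshot of the previous ISO week (2025-06-21 is already kept for the current week). Quotas count only periods that actually contain snapshots, so gaps in the schedule don't consume them; that is a design choice of this sketch.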

ahofmann - 21 days ago

I'm trying to evaluate what Plakar is. Is it like restic, BorgBackup, or Kopia?

throwaway127482 - 22 days ago

Does this support content-defined chunking (CDC)?
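For context on the question: content-defined chunking splits a byte stream at positions chosen by a rolling hash of the content itself rather than at fixed offsets. Inserting a few bytes near the start of a file therefore shifts only nearby chunk boundaries, and the remaining chunks still deduplicate. Below is a minimal Python sketch using a Gear-style rolling hash (the hash family FastCDC builds on). The gear table, `cdc_chunks`, and the size parameters are illustrative assumptions, and nothing here says whether ptar itself uses CDC.

```python
import random

MASK64 = (1 << 64) - 1
# Hypothetical gear table: one pseudo-random 64-bit value per byte value (fixed seed).
_rng = random.Random(0)
GEAR = [_rng.getrandbits(64) for _ in range(256)]


def cdc_chunks(data: bytes, avg_bits: int = 12,
               min_size: int = 1024, max_size: int = 16384) -> list[bytes]:
    """Cut `data` where the top `avg_bits` bits of a Gear rolling hash are zero.

    The shift-left update means the 64-bit hash depends only on the last 64 bytes,
    so cut points are decided by local content, not by absolute offset. Chunks
    average roughly min_size + 2**avg_bits bytes.
    """
    shift = 64 - avg_bits
    chunks: list[bytes] = []
    start = 0
    h = 0
    for i, byte in enumerate(data):
        h = ((h << 1) + GEAR[byte]) & MASK64
        size = i - start + 1
        if (size >= min_size and (h >> shift) == 0) or size >= max_size:
            chunks.append(data[start:i + 1])
            start, h = i + 1, 0
    if start < len(data):
        chunks.append(data[start:])  # trailing partial chunk
    return chunks


if __name__ == "__main__":
    original = random.Random(1).randbytes(200_000)
    edited = original[:100] + b"0123456789" + original[100:]  # insert 10 bytes near the start
    a, b = cdc_chunks(original), cdc_chunks(edited)
    b_set = set(b)
    shared = sum(1 for c in a if c in b_set)
    print(f"{shared} of {len(a)} original chunks reappear unchanged after the edit")
```

With fixed-size blocks, the same 10-byte insertion would shift every later block boundary, so almost nothing after the edit would deduplicate; here typically only the first chunk changes.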