PGHoard: Tools for making PostgreSQL backups to cloud object storages

blog.aiven.io

72 points by melor 10 years ago · 13 comments

waffle_ss 10 years ago

How does this compare to WAL-E?

https://github.com/wal-e/wal-e

  • melor (OP) 10 years ago

    Both do mostly the same thing, with some differences. The biggest difference at the moment is probably that WAL-E uses PostgreSQL's "archive_command" to ship incremental backups (WAL files) as complete 16 MB segments, whereas PGHoard streams them in real time with "pg_receivexlog", which makes the data-loss window much smaller in a disaster.

    • willlll 10 years ago

      You can set archive_timeout to something like 1 minute to bound the window.
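A rough way to reason about the archive_timeout suggestion: with archive_command, a WAL segment is only handed off once it is complete, either because it filled up or because archive_timeout forced a segment switch. The sketch below is a simplified model under that assumption (upload time is ignored), and `worst_case_window_seconds` is a hypothetical helper, not part of WAL-E or PGHoard.

```python
import math

# Default WAL segment size in PostgreSQL (16 MB).
SEGMENT_BYTES = 16 * 1024 * 1024


def worst_case_window_seconds(wal_bytes_per_second, archive_timeout_seconds=0):
    """Longest time a WAL record can sit in an incomplete, unarchived segment.

    Simplified model: archive_command only sees complete segments, and a
    segment completes when it fills up or when archive_timeout (if > 0)
    forces a switch. Upload time is ignored.
    """
    if wal_bytes_per_second > 0:
        fill_time = SEGMENT_BYTES / wal_bytes_per_second
    else:
        fill_time = math.inf  # an idle server never fills a segment
    if archive_timeout_seconds > 0:
        return min(fill_time, archive_timeout_seconds)
    return fill_time  # archive_timeout = 0 disables forced switches


# A quiet server writing about 10 KB of WAL per second:
print(worst_case_window_seconds(10 * 1024))                              # 1638.4 (about 27 minutes)
print(worst_case_window_seconds(10 * 1024, archive_timeout_seconds=60))  # 60
```

The PostgreSQL documentation notes that segments closed early by a forced switch are still archived at full size, so a very short archive_timeout inflates archive storage. Streaming with pg_receivexlog receives WAL as it is written instead of waiting for a segment to complete, which is the difference melor describes.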

  • oskari 10 years ago

    PGHoard also supports Google Cloud which doesn't seem to be supported in WAL-E at the moment.

melor (OP) 10 years ago

Takes care of real-time WAL streaming, compression, encryption, restoration and backup expiration, among other things. Open source and written in Python.
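Of the features listed, backup expiration is the easiest to sketch. One plausible policy, assumed here rather than taken from PGHoard's source: keep the newest N base backups and delete any WAL segment older than the one the oldest kept backup starts from. `BaseBackup`, `plan_expiration` and `keep_count` are hypothetical names, and the comparison assumes a single timeline, where WAL file names sort in write order.

```python
from dataclasses import dataclass
from datetime import datetime


@dataclass(frozen=True)
class BaseBackup:
    name: str
    start_time: datetime
    start_wal_segment: str  # 24-hex-digit WAL file name the backup starts from


def plan_expiration(backups, wal_segments, keep_count):
    """Return (backups_to_delete, wal_segments_to_delete).

    Keeps the newest `keep_count` base backups plus every WAL segment needed
    to restore the oldest kept backup. Assumes a single timeline, where WAL
    file names sort in the order they were written.
    """
    if keep_count < 1:
        raise ValueError("keep_count must be at least 1")
    ordered = sorted(backups, key=lambda b: b.start_time)
    expired = ordered[:-keep_count]  # everything except the newest keep_count
    if not expired:
        return [], []
    oldest_kept = ordered[-keep_count]
    old_wal = sorted(seg for seg in wal_segments if seg < oldest_kept.start_wal_segment)
    return expired, old_wal


backups = [
    BaseBackup("bb1", datetime(2016, 3, 1), "000000010000000000000002"),
    BaseBackup("bb2", datetime(2016, 3, 2), "000000010000000000000005"),
    BaseBackup("bb3", datetime(2016, 3, 3), "000000010000000000000009"),
]
wal = ["00000001000000000000000%d" % i for i in range(2, 10)]
expired, old_wal = plan_expiration(backups, wal, keep_count=2)
print([b.name for b in expired])  # ['bb1']
print(old_wal)  # the segments ending in 02, 03 and 04
```

Deleting WAL only up to the oldest kept backup's starting segment keeps every remaining backup restorable; point-in-time recovery to moments before that backup is given up along with the expired backups.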

  • brudgers 10 years ago

    Curious whether it backs up to other cloud storage providers in a vendor-neutral way.

    • melor (OP) 10 years ago

      Currently S3 (AWS and compatible services), Google Cloud, OpenStack Swift, Azure (experimental), local disk and Ceph (via S3 or Swift) are supported. More backends can be added quite easily, as the object storage logic sits behind an extensible interface.

      Which vendor-neutral protocol are you interested in using?
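To illustrate what an extensible storage interface can look like, here is a minimal sketch in Python. `ObjectStorage`, `LocalDiskStorage` and their method names are hypothetical, and PGHoard's actual interface may differ. Each backend (S3, Swift, Google Cloud and so on) would implement the same few methods, so the rest of the tool never needs to know which provider it is talking to.

```python
import abc
import pathlib
import tempfile


class ObjectStorage(abc.ABC):
    """Hypothetical minimal interface that every storage backend implements."""

    @abc.abstractmethod
    def store(self, key, data): ...

    @abc.abstractmethod
    def get(self, key): ...

    @abc.abstractmethod
    def delete(self, key): ...

    @abc.abstractmethod
    def list_keys(self, prefix=""): ...


class LocalDiskStorage(ObjectStorage):
    """Backend that maps "/"-separated object keys to files under a root directory."""

    def __init__(self, root):
        self.root = pathlib.Path(root)

    def _path(self, key):
        return self.root / key  # assumes trusted, relative keys

    def store(self, key, data):
        path = self._path(key)
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_bytes(data)

    def get(self, key):
        return self._path(key).read_bytes()

    def delete(self, key):
        self._path(key).unlink()

    def list_keys(self, prefix=""):
        keys = (p.relative_to(self.root).as_posix()
                for p in self.root.rglob("*") if p.is_file())
        return sorted(k for k in keys if k.startswith(prefix))


# Code written against ObjectStorage works the same with any backend.
with tempfile.TemporaryDirectory() as tmp:
    storage: ObjectStorage = LocalDiskStorage(tmp)
    storage.store("example-site/wal/000000010000000000000001", b"segment data")
    print(storage.list_keys("example-site/"))
```

Using an abstract base class means a new backend that forgets one of the methods fails at construction time rather than in the middle of a backup.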

      • merb 10 years ago

        What happens when the storage (Swift or Ceph) is offline for some time?

        • oskari 10 years ago

          PGHoard can archive PG's WAL segments in two modes: streaming directly using pg_receivexlog or as an archive_command to archive complete segments.

          When PGHoard is used in streaming mode it keeps reading new segments from PG and stores them in compressed & encrypted form in a queue ready to be uploaded. The segments will stay there until they can be uploaded.

          When used as an archive_command, PGHoard handles the operation synchronously, so PG won't remove or recycle the WAL segment in question until the command completes successfully.

          Postgres keeps running normally in both cases, but the files queue up in different places: compressed in PGHoard's upload queue in streaming mode, uncompressed in PG's own WAL directory in archive_command mode. Either way your disk may eventually fill up, but PGHoard triggers an alert after a configurable number of upload failures.
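The streaming-mode behaviour described above (keep segments queued until the storage is reachable, alert after repeated failures) can be sketched like this. `UploadQueue`, `alert_threshold` and the `FlakyStorage` stand-in are hypothetical; PGHoard's real implementation is not shown in this thread.

```python
import collections


class UploadQueue:
    """Hold finished (compressed, encrypted) segments until storage accepts them.

    upload(key, data) is any callable that raises on failure; alert(message)
    is called once when consecutive failures reach alert_threshold.
    """

    def __init__(self, upload, alert, alert_threshold=3):
        self.upload = upload
        self.alert = alert
        self.alert_threshold = alert_threshold
        self.pending = collections.deque()
        self.consecutive_failures = 0
        self.alerted = False

    def put(self, key, data):
        self.pending.append((key, data))

    def drain(self):
        """Upload queued segments in order; stop at the first failure."""
        while self.pending:
            key, data = self.pending[0]
            try:
                self.upload(key, data)
            except Exception as exc:  # storage offline, network error, ...
                self.consecutive_failures += 1
                if self.consecutive_failures >= self.alert_threshold and not self.alerted:
                    self.alert(f"{self.consecutive_failures} consecutive upload failures: {exc}")
                    self.alerted = True
                return False  # segment stays queued; the caller retries later
            self.pending.popleft()
            self.consecutive_failures = 0
            self.alerted = False
        return True


class FlakyStorage:
    """Stand-in backend that is 'offline' for the first `failures` attempts."""

    def __init__(self, failures):
        self.failures = failures
        self.objects = {}

    def upload(self, key, data):
        if self.failures > 0:
            self.failures -= 1
            raise ConnectionError("storage offline")
        self.objects[key] = data


storage = FlakyStorage(failures=3)
queue = UploadQueue(storage.upload, alert=print, alert_threshold=3)
queue.put("000000010000000000000001", b"segment 1")
queue.put("000000010000000000000002", b"segment 2")
while not queue.drain():
    pass  # a real daemon would sleep between attempts
print(sorted(storage.objects))
```

Nothing is dropped while the storage is down; the cost is local disk space, which is exactly why the alert matters.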

    • oskari 10 years ago

      PGHoard has quite high unit test coverage (85%), and it's pretty easy to add a new object storage configuration to the tests to verify that all the APIs PGHoard uses work properly.
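One way to get that "add a configuration, verify the APIs" workflow is a backend-agnostic contract check that every storage implementation must pass. This is a hypothetical sketch, not PGHoard's test suite; `check_storage_contract`, `InMemoryStorage` and the `store`/`get`/`delete`/`list_keys` methods are assumed names.

```python
class InMemoryStorage:
    """Hypothetical dict-backed backend, useful as a reference implementation."""

    def __init__(self):
        self.objects = {}

    def store(self, key, data):
        self.objects[key] = bytes(data)

    def get(self, key):
        return self.objects[key]

    def delete(self, key):
        del self.objects[key]

    def list_keys(self, prefix=""):
        return sorted(k for k in self.objects if k.startswith(prefix))


def check_storage_contract(storage):
    """Exercise the operations a backup tool relies on; raises AssertionError on failure."""
    prefix = "contract-test/"
    payload = bytes(range(256)) * 4  # binary data, not just text
    storage.store(prefix + "a", payload)
    storage.store(prefix + "b", b"")
    assert storage.get(prefix + "a") == payload, "round trip changed the data"
    assert storage.get(prefix + "b") == b"", "empty objects must be supported"
    assert storage.list_keys(prefix) == [prefix + "a", prefix + "b"], "listing is wrong"
    storage.store(prefix + "a", b"new")
    assert storage.get(prefix + "a") == b"new", "overwrite did not replace the object"
    storage.delete(prefix + "a")
    storage.delete(prefix + "b")
    assert storage.list_keys(prefix) == [], "delete left objects behind"


# Adding a backend to the suite is then one more entry in this list.
for backend in [InMemoryStorage()]:
    check_storage_contract(backend)
print("contract OK")
```

Because the check only uses the shared methods, the same function could be pointed at a real S3 or Swift endpoint whenever credentials are available in the test environment.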

anarazel 10 years ago

Do you prevent segments from being removed while they're not yet received by pg_receivexlog? wal_keep_segments or replication slots?

  • melor (OP) 10 years ago

    A replication slot can be used by defining it in the pghoard.json configuration. However, the slot has to be created manually, and (importantly!) removed manually once it is no longer needed. We've been planning to add more automatic replication slot management to PGHoard.

    • anarazel 10 years ago

      Good. Without archiving or slots in place, you really can't rely on such backups...
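Since the slot currently has to be managed by hand, these are the statements involved. `pg_create_physical_replication_slot`, `pg_drop_replication_slot` and the `pg_replication_slots` view are standard PostgreSQL (9.4 and later), and pg_receivexlog can be pointed at a slot with its `--slot` option. The `slot_sql` helper and the slot name are hypothetical. An inactive slot keeps the server from removing WAL indefinitely, which is why dropping it when it is no longer needed matters.

```python
import re

# Replication slot names may contain only lower-case letters, digits and
# underscores, up to 63 characters with the default NAMEDATALEN.
_SLOT_NAME = re.compile(r"[a-z0-9_]{1,63}")

# Lists every slot; active = false means nothing is consuming it right now,
# and restart_lsn marks the oldest WAL the server must still keep for it.
LIST_SLOTS_SQL = "SELECT slot_name, active, restart_lsn FROM pg_replication_slots;"


def slot_sql(slot_name):
    """Return (create_sql, drop_sql) for a physical replication slot.

    The name is validated first, so embedding it in the SQL text is safe;
    with a database driver you would normally pass it as a query parameter.
    """
    if not _SLOT_NAME.fullmatch(slot_name):
        raise ValueError(f"invalid replication slot name: {slot_name!r}")
    create_sql = f"SELECT pg_create_physical_replication_slot('{slot_name}');"
    drop_sql = f"SELECT pg_drop_replication_slot('{slot_name}');"
    return create_sql, drop_sql


create_sql, drop_sql = slot_sql("pghoard_backup")  # hypothetical slot name
print(create_sql)
print(drop_sql)
print(LIST_SLOTS_SQL)
```

Periodically checking `pg_replication_slots` for slots with `active = false` is a simple guard against the forgotten-slot problem melor warns about.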
