Gazette – Build platforms that flexibly mix SQL, batch, and stream processing
Hi, I’m the primary author of Gazette. Interesting to see it here; happy to answer questions.
We’re continuing to improve it; it’s a core implementation detail of our current project and company, Estuary Flow (https://estuary.dev), which aims to further simplify and democratize low-latency data products.
The slides [1] helped me grok what problems this tool solves pretty quickly.
Everything big data is moving to blob storage these days, but streaming can lead to the small-files problem or to longer latencies. Keeping file fragments on local disk and proxying readers to them seems like a simple solution to that.
[1] https://gazette.readthedocs.io/en/latest/overview-slides.htm...
Small files should be in Arrow/Avro/Parquet if your architecture allows it (one should strive for this from the beginning).
The GitHub README and docs mention that Gazette has been running in production for 5 years, but I don’t see any mention of _where_. I assume this began as an internal project at some company - does anyone know which?
It came out of arbor.io (now part of liveramp.com).
> liveramp.com
I am going to give this project a big no if it has anything to do with liveramp.
Did you have a bad prior experience with their engineering?
In the PostgreSQL world, logical replication is a boon. An out-of-band immutable datastore is key as well.
We are looking forward to qualified replication in the upcoming PG 15.