Kafka is Fast – I'll use Postgres

topicpartition.io

520 points by enether a day ago


coldtea - 10 hours ago

>The claim is that it handles 80%+ of their use cases with 20% of the development effort. (Pareto Principle)

The Pareto principle is not some guarantee applicable to everything and anything saying that any X will handle 80% of some other thing's use cases with 20% the effort.

One can see how irrelevant its invocation is if we reverse it: does Kafka also handle 80% of what Postgres does with 20% of the effort? If not, what makes Postgres especially the "Pareto 80%" one in this comparison? Did Vilfredo Pareto have Postgres specifically in mind when forming the principle?

Pareto principle concerns situations where power-law distributions emerge. Not arbitrary server software comparisons.

Just say that Postgres covers a lot of use cases for which people mindlessly reach for shiny new software they don't really need, and that it's more battle-tested, mature, and widely supported.

The Pareto principle is a red herring.

munchbunny - a day ago

My general opinion, off the cuff, from having worked at both small (hundreds of events per hour) and large (trillions of events per hour) scales for these sorts of problems:

1. Do you really need a queue? (Alternative: periodic polling of a DB; see the sketch after this list)

2. What's your event volume and can it fit on one node for the foreseeable future, or even serverless compute (if not too expensive)? (Alternative: lightweight single-process web service, or several instances, on one node.)

3. If it can't fit on one node, do you really need a distributed queue? (Alternative: good ol' load balancing and REST APIs, maybe with async and retry semantics)

4. If you really do need a distributed queue, then you may as well use a distributed queue, such as Kafka. Even if you take on the complexity of managing a Kafka cluster, the programming and performance semantics are simpler to reason about than trying to shoehorn a distributed queue onto a SQL DB.
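
For point 1, a sketch of how small the alternative can be (hypothetical table; fine when a single worker polls on a timer):

    -- "periodic polling of a DB" in its entirety: a timer, a table, two statements
    BEGIN;
    SELECT id, payload FROM tasks
     WHERE done = false
     ORDER BY id
     LIMIT 100;
    -- ...process the rows, then mark them using the ids from the SELECT:
    UPDATE tasks SET done = true WHERE id = ANY (ARRAY[1, 2, 3]);
    COMMIT;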

agentultra - a day ago

You have to be careful with the approach of using Postgres for everything. The way it locks tables and rows, and the serialization levels it guarantees, are not immediately obvious to a lot of folks and can become a serious bottleneck for performance-sensitive workloads.

I've been a happy Postgres user for several decades. Postgres can do a lot! But like anything, don't rely on maxims to do your engineering for you.

dagss - 15 hours ago

I really believe this is the way: Event log tables in SQL. I have been doing it a lot.

A downside is the lack of client-side tooling. For many, using Kafka is worth it simply for the consumer-side library tooling.

If you just want to write an event handler function, there is a lot of boilerplate to manage around it (persisting read cursors, etc.).
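
To make the boilerplate concrete, a minimal sketch with hypothetical names:

    -- an event log plus a persisted read cursor per consumer
    CREATE TABLE events (
        id      bigserial PRIMARY KEY,
        payload jsonb NOT NULL
    );
    CREATE TABLE read_cursors (
        consumer text PRIMARY KEY,
        last_id  bigint NOT NULL DEFAULT 0
    );
    -- each poll: batch-read past the cursor, run the handler, advance the cursor
    BEGIN;
    SELECT id, payload FROM events
     WHERE id > (SELECT last_id FROM read_cursors WHERE consumer = 'billing')
     ORDER BY id
     LIMIT 100;
    -- ...handler runs here; then persist the highest id seen, say 4200:
    UPDATE read_cursors SET last_id = 4200 WHERE consumer = 'billing';
    COMMIT;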

We introduced a company standard for one service pulling events from another service, which fits well with events stored in SQL.

https://github.com/vippsas/feedapi-spec

Nowhere close to Kafka's maturity in client-side tooling, but it is an approach for how a library stack could be built on top, making this convenient and letting the same library toolset support many storage engines. (On the server/storage side, Postgres is of course as mature as Kafka...)

ARandomerDude - 6 hours ago

I'm solidly in camp 2, the "common sense" camp that doesn't care about buzzwords.

That said, I don't consider running Kafka to be a headache. I work at a mid-sized company, processing billions of Kafka events per day and it's never been a problem, even locally when I'm processing hundreds of events per day.

You set it up, forget about it, and it scales endlessly. You don't have to rewrite anything and it provides a nice separation layer between your system components.

When starting out, you can easily run Kafka, DB, API on the same machine.

vbezhenar - a day ago

How do you implement "unique monotonically-increasing offset number"?

The naive approach with a sequence (or serial type, which uses a sequence automatically) does not work. Transaction "one" gets number "123", transaction "two" gets number "124". Transaction "two" commits; now the table contains rows "122" (committed earlier) and "124", and readers can start to process them. Then transaction "one" commits with its number "123", but readers are already past "124". And transaction "one" might never commit for various reasons (e.g. the client just got its power cut), so waiting for "123" forever does not cut it either.

Notifications can help with this approach, but then you can't restart old readers (and you don't need monotonic numbers at all).
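
One known workaround, sketched below (a trade-off, not a free lunch): have every producer take the same advisory lock before inserting, so offsets are assigned and become visible in commit order; writers serialize, which costs throughput. Another family of fixes instead has readers stop at the oldest still-in-flight transaction.

    BEGIN;
    -- the lock is held until commit/rollback, so transaction "two" cannot
    -- even obtain number "124" until transaction "one" is finished
    SELECT pg_advisory_xact_lock(42);
    INSERT INTO events (payload) VALUES ('{"k": 1}');  -- id from a sequence
    COMMIT;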

uberduper - a day ago

Has this person actually benchmarked Kafka? The results they get with their 96 vCPU setup could be achieved with Kafka on the 4 vCPU setup. Their results with PG are absurdly slow.

If you don't need what kafka offers, don't use it. But don't pretend you're on to something with your custom 5k msg/s PG setup.

ownagefool - a day ago

The camps are wrong.

There are poles.

1. Folks constantly adopting the new tech, whatever the motivation; and 2. "I learned a thing and shall never learn anything else, ever."

Of course nobody actually sits on either pole, but the closer you are to one, the less pragmatic you are likely to be.

jimbokun - a day ago

For me the killer feature of Kafka was the ability to set the offset independently for each consumer.

In my company most of our topics need to be consumed by more than one application/team, so this feature is a must have. Also, the ability to move the offset backwards or forwards programmatically has been a life saver many times.

Does Postgres support this functionality for its queues?
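
For what it's worth, the hand-rolled version of that feature is mostly one table; independent offsets per consumer and programmatic rewind fall out of it (hypothetical names):

    -- one row per consumer/group: offsets advance independently
    CREATE TABLE consumer_offsets (
        consumer text PRIMARY KEY,
        last_id  bigint NOT NULL DEFAULT 0
    );
    -- "move the offset backwards or forwards" is just an UPDATE:
    UPDATE consumer_offsets SET last_id = 1000000 WHERE consumer = 'analytics';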

misja111 - a day ago

> One camp chases buzzwords .. the other common sense

How is it common sense to try to re-implement Kafka in Postgres? You probably need something similar but simpler. Then implement that! But if you really need something like Kafka, then .. use Kafka!

IMO the author is now making the same mistake as some Kafka evangelists that try to implement a database in Kafka.

BinaryIgor - 4 hours ago

That's golden:

"2. The other camp chases common sense

This camp is far more pragmatic. They strip away unnecessary complexity and steer clear of overengineered solutions. They reason from first principles before making technology choices. They resist marketing hype and approach vendor claims with healthy skepticism."

We should definitely apply Occam's razor as an industry far more often; simple tech stacks are easier to manage and especially to master (which you must do once it's no longer a toy app). Introduce a new component into your system only if it provides functionality you cannot get with reasonable effort using what you already have.

natmaka - 4 hours ago

IMHO the main difference between PostgreSQL and any 'competitor' is that in most cases a software developer will quickly figure out not only how to use it properly for their use case, but also why some approach they adopted isn't right and is causing a non-negligible problem.

There are many reasons for this: most software developers have more than a vague idea about its underlying concepts, most error messages are clear, the documentation is superb, there are many ways to tap into the vast knowledge of a huge and growing community...

this_user - a day ago

The real two camps seem to be:

1) People constantly chasing the latest technology with no regard for whether it's appropriate for the situation.

2) People constantly trying to shoehorn their favourite technology into everything with no regard for whether it's appropriate for the situation.

bmcahren - a day ago

A huge benefit of single-database operations at scale is point-in-time recovery for the entire system thereby not having to coordinate recovery points between data stores. Alternatively, you can treat your queue as volatile depending on the purpose.

losvedir - a day ago

Maybe I missed it in the design here, but this pseudo-Kafka Postgres implementation doesn't really handle consumer groups very well. The great thing about Kafka consumer groups is it makes it easy to spread the load over several instances running your service. They'll all connect using the same group, and different partitions will be assigned to the different instances. As you scale up or down, the partition responsibilities will be updated accordingly.

You need some sort of server-side logic to manage that, plus the consumer heartbeats and generation tracking, to make sure that only the "correct" instances can actually commit new offsets. Distributed systems are hard, and Kafka goes to a lot of trouble to ensure that you don't fail to process a message.
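
A sketch of what that server-side logic might look like in SQL, with hypothetical names: a lease table that workers claim and heartbeat, with expiry standing in for Kafka's session timeout. It is nowhere near a full rebalancer, which is rather the point.

    CREATE TABLE partition_leases (
        topic   text NOT NULL,
        part_no int  NOT NULL,
        owner   text,
        expires timestamptz,
        PRIMARY KEY (topic, part_no)
    );
    -- a worker claims one free or expired partition
    WITH claim AS (
        SELECT topic, part_no FROM partition_leases
         WHERE topic = 'orders' AND (owner IS NULL OR expires < now())
         ORDER BY part_no
         FOR UPDATE SKIP LOCKED
         LIMIT 1
    )
    UPDATE partition_leases pl
       SET owner = 'worker-7', expires = now() + interval '30 seconds'
      FROM claim
     WHERE pl.topic = claim.topic AND pl.part_no = claim.part_no
    RETURNING pl.part_no;
    -- offset commits must re-check ownership (the "generation" fencing):
    --   ... WHERE owner = 'worker-7' AND expires > now()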

johnyzee - a day ago

Seems like you would at the very least need a fairly thick application layer on top of Postgres to make it look and act like a messaging system. At that point, seems like you have just built another messaging system.

Unless you're a five-man shop where everybody just agrees to use that one table, makes sure to manage transactions right, cron-jobs the retention, YOLOs the clustering, etc. etc.

Performance is probably last on the list of reasons to choose Kafka over Postgres.

Nifty3929 - 8 hours ago

I do agree that too often folks are looking for the cool new widget and looking to apply it to every problem, with fancy new "modernized" architectures and such. And Postgres is great for so much.

But I think an important point to those in camp 2 (the good guys in TFA's narrative) is to use tools for problems they were designed to solve. Postgres was not designed to be a pub-sub tool. Kafka was. Don't try to build your own pub-sub solution on top of Postgres, just use one of the products that was built for that job.

Another distressing trend I see is every product trying to be everything to everyone. I do not need that. I just need your product to do its one thing very well, and then I will use a different product for a different thing I need.

GrumpyGoblin - 5 hours ago

There is another aspect that many people aren't discussing: the communication aspect.

For a medium to large organization with independent programs that need to talk to each other, Kafka provides an essential capability that would be much slower and higher risk with Postgres.

Standardizing the flow of information across an organization is difficult, and Kafka is crucial for that. To achieve it in Postgres you'd need either a shared database, which is inherently risky, or a custom API for access, which introduces another layer of performance bottleneck plus build/maintenance cost and decreases development productivity. So you have a double whammy of performance degradation with an API. And for multiple consumers operating against the same events (for example: write to storage, perform an action, send to a data lake), a database needs an order of magnitude more access: N*X, where N is the number of consumers and X the query cost to consume. With three consumers you're tripling your database queries, which adds up fast across topics. Now you need to start fixing indexes, creating views, and doing other work to keep performance optimal. And at some point you're just poorly recreating Kafka in a database.

The common denominator in every "which is better" debate is always use case. This article seems like it would primarily apply to small organizations or limited consumer needs. And yeah, at that point why are you using events in the first place? Use a single API or database and be done with it. This is where the buzzword thing is relevant. If you're using Kafka for your single team, single database, small organization, it's overkill.

Side note: someone mentioned Postgres as an audit log. Oh god. Done it. It was a nightmare. Ended up migrating to pub/sub with long-term storage in Mongo, which solved significant performance issues. An audit log is inherently write-once, read-many. There is no advantage to storing it in a relational database.

brikym - a day ago

If you don't mind Redis, use Redis Streams. It gives you an event log without worrying about Postgres performance issues, and it has consumer groups.

LinXitoW - 10 hours ago

Isn't one gigantic advantage with Postgres the ACID part?

It seems to me that the hardest part of moving to an MQ/distributed log like Kafka is reworking existing code to handle the lack of ACID guarantees. Things that are trivial with Postgres, like exactly-once delivery, are huge undertakings without ACID.

Personally, I don't have much experience with this, so maybe I'm just missing something?
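
(For what it's worth, the "trivial" part as I understand it: the handler's writes and the consumer's cursor advance can share one transaction, so an event's effects happen exactly once or not at all. A sketch with made-up tables:)

    BEGIN;
    -- read the next unprocessed event for this consumer (say it returns id 124)
    SELECT id, payload FROM events
     WHERE id > (SELECT last_id FROM offsets WHERE consumer = 'invoicing')
     ORDER BY id LIMIT 1;
    -- the side effect and the cursor advance commit atomically:
    INSERT INTO invoices (event_id, total) VALUES (124, 99.50);
    UPDATE offsets SET last_id = 124 WHERE consumer = 'invoicing';
    COMMIT;  -- crash anywhere before this line and the event is simply re-read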

spectraldrift - 17 hours ago

> Should You Use Postgres? Most of the time - yes

This made me wonder about a tangential statistic that would, in all likelihood, be impossible to derive:

If we looked at all database systems running at any given time, what proportion does each technology represent (e.g., Postgres vs. MySQL vs. [your favorite DB])? You could try to measure this in a few ways: bytes written/read, total rows, dollars of revenue served, etc.

It would be very challenging to land on a widely agreeable definition. We'd quickly get into the territory of what counts as a "database" and whether to include file systems, blockchains, or even paper. Still, it makes me wonder. I feel like such a question would be immensely interesting to answer.

Because then we might have a better definition of "most of the time."

ryandvm - a day ago

I think my only complaint about Kafka is the widespread misunderstanding that it is a suitable replacement for a work queue. I should not have to explain to an enterprise architect the distinction between a distributed work queue and an event streaming platform.

qsort - a day ago

I feel so seen lol. I work in data engineering and the first paragraph is me all the time. There are a lot of cool technologies (timeseries databases, vector databases, stuff like Synapse on Azure, "lakehouses" etc.) but they are mostly for edge cases.

I'm not saying they're useless, but if I see something like that lying around, it's more likely that someone put it there based on vibes rather than an actual engineering need. Postgres is good enough for OpenAI, chances are it's good enough for you.

sc68cal - a day ago

> Postgres doesn’t seem to have any popular libraries for pub-sub use cases, so I had to write my own.

Ok so instead of running Kafka, we're going to spend development cycles building our own?

dzonga - a day ago

What's not spoken about in the above article?

Ease of use. In Ruby, if I want to use Kafka I can use Karafka, or Redis streams via the redis library. Likewise, if Kafka is too complex to run, there are countless alternatives which work as well; hell, even 0mq with client libraries.

Now with the Postgres version I have to write my own stuff, and I don't know where that's gonna lead me.

Postgres is scalable, no one doubts that. But what people forget to mention is the ecosystem around certain tools.

jjice - a day ago

This is a well-written addition to the list of articles I need to reference on occasion to keep myself from using something new.

Postgres really is a startup's best friend most of the time. I'm building a new product that's going to deal with a good bit of reporting, so I began to look at OLAP DBs, but I hesitated to leave PG. This kind of seals it for me (and of course the reference to the classic "Just Use Postgres for Everything" post helps) that I should Just Use Postgres (R).

On top of being easy to host and already being familiar with it, the resources out there for something like PG are near endless. Plus the team working on it is doing constant good work to make it even more impressive.

redbell - an hour ago

See also: Redis is fast – I'll cache in Postgres: https://news.ycombinator.com/item?id=45380699

honkostani - a day ago

Resume-driven design is running into the desert of Moore's plateau, which punishes the use of ever more useless abstractions. Its practitioners get quieter, because their projects keep dying after the revolutionary tech is introduced and they jump ship.

jeeybee - a day ago

If you like the “use Postgres until it breaks” approach, there’s a middle ground between hand-rolling and running Kafka/Redis/Rabbit: PGQueuer.

PGQueuer is a small Python library that turns Postgres into a durable job queue using the same primitives discussed here — `FOR UPDATE SKIP LOCKED` for safe concurrent dequeue and `LISTEN/NOTIFY` to wake workers without tight polling. It’s for background jobs (not a Kafka replacement), and it shines when your app already depends on Postgres.
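
The shape of that dequeue primitive, sketched against a hypothetical table (not PGQueuer's actual schema): `SKIP LOCKED` lets concurrent workers each grab a different row without blocking on each other's locks.

    WITH next_job AS (
        SELECT id FROM jobs
         WHERE status = 'queued'
         ORDER BY id
         FOR UPDATE SKIP LOCKED
         LIMIT 1
    )
    UPDATE jobs SET status = 'running'
     WHERE id IN (SELECT id FROM next_job)
    RETURNING id, payload;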

Nice-to-haves without extra infra: per-entrypoint concurrency limits, retries/backoff, scheduling (cron-like), graceful shutdown, simple CLI install/migrations. If/when you truly outgrow it, you can move to Kafka with a clearer picture of your needs.

Repo: https://github.com/janbjorge/pgqueuer

Disclosure: I maintain PGQueuer.

loftsy - a day ago

I am about to start a project. I know I want an event sourced architecture. That is, the system is designed around a queue, all actors push/pull into the queue. This article gives me some pause.

Performance isn't a big deal for me. I had assumed that Kafka would give me things like decoupling, retry, dead-lettering, logging, schema validation, schema versioning, exactly once processing.

I like Postgres, and obviously I can write a queue on top of it, but it seems like quite a lot of effort?

jdboyd - a day ago

While I appreciate the Postgres-for-everything point of view, and most of the things I use other tools for could fit in Postgres, there are a few areas that keep me using RabbitMQ, Redis, or something like Elastic.

First, I frequently use Celery and Celery doesn't support using Postgres as a broker. It seems like it should, but I guess no one has stepped up to write that. So, when I use Celery, I end up also using Redis or RabbitMQ.

Second, if I need mqtt clients coming in from the internet at large, I don't feel comfortable exposing Postgres to that. Also, I'd rather use the mqtt ecosystem of libraries rather than having all of those devices talk Postgres directly.

Third, sometimes I want a size-constrained, memory-only database, or a database that automatically expires untouched records, and for either of those I usually use Redis. I imagine it would be worth writing a reusable set of stored procedures to auto-expire unused records, but I haven't implemented it. I have no idea how to make Postgres be memory-only with a constrained memory size.

- a day ago
[deleted]
udave - 16 hours ago

I find the distinction between a queue and a pub-sub system quite weak. A pub-sub system is just a persistent queue at its core; the only distinction is that you have multiple queues, one per subscriber, hence multiple readers. Everything else stays the same. Ordering is expected to be strict in both cases. The durability factor is baked into both systems. On the question of bounded and unbounded queues: don't message queues also spill to disk to prevent OOM scenarios?

dangoodmanUT - a day ago

96 cores to get 240MB/s is terrible. Redpanda can do this with like one or two cores

- a day ago
[deleted]
woile - 16 hours ago

There are a few things missing, I think.

I think Kafka makes it easy to create an event-driven architecture. This is particularly useful when you have many teams; they are properly isolated from each other.

And with many teams another problem comes: there's no guarantee that queries will be properly written, so Postgres performance may suffer.

Given this, I think using Kafka in companies with many teams can be useful, even if the data they move is not insanely big.

nchmy - a day ago

Seems like instead of a hand-rolled, polling pub/sub, you could do CDC with a Go logical-replication/CDC library. There are surely various.
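
The Postgres side of that CDC setup is small; a sketch (publication and slot names made up, and the server must run with wal_level = logical):

    -- publish the table and create a slot; the Go library then streams
    -- decoded changes from the slot instead of anything polling the table
    CREATE PUBLICATION events_pub FOR TABLE events;
    SELECT pg_create_logical_replication_slot('events_slot', 'pgoutput');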

Or just use NATS for queues and pubsub - dead simple, can embed in your Go app and does much more than Kafka

asah - 14 hours ago

"500 KB/s workload should not use Kafka" - yyyy!!! indeed, I'm running 5MBps logging system through a single node RDS instance costing <$1000/mon (plus 2x for failover). There's easily 4-10x headroom for growth by paying AWS more money and 3-5x+ savings by optimizing the data structure.

shikhar - a day ago

Postgres is a way better fit than Kafka if you want a large number of durable streams. But a flexible OLTP database like PG is bound to require more resources, and polling loops (not even long-poll!) are not a great answer for following live updates.

Plug: If you need granular, durable streams in a serverless context, check out s2.dev

nyrikki - a day ago

> The claim isn’t that Postgres is functionally equivalent to any of these specialized systems. The claim is that it handles 80%+ of their use cases with 20% of the development effort. (Pareto Principle)

Lots of us who built systems when SQL was the only option know that doesn't hold over time.

SSTable-backed systems have their applications, and I have never seen dedicated Kafka teams the way we used to have dedicated DBAs.

We have the tools to make decisions based on real tradeoffs.

I highly recommend people dig into the appropriate tools to select vs making pre-selected products fit an unknown problem domain.

Tools are tactics, not strategies, tactics should be changeable with the strategic needs.

dev_l1x_be - 10 hours ago

Apples are sweet, I am going to eat an onion.

I love these articles.

> The other camp chases common sense

It is never too late to inject some tribalism into any discussion.

> Trend 1 - the “Small Data” movement.

404

Just perfect.

0xDEAFBEAD - 13 hours ago

Why does it matter how many distinct tools you use? It seems easiest to just always use the most standard tool in the most standard way, to minimize the amount of custom code you have to write.

phendrenad2 - a day ago

Since everyone is offering what they think the "camps" should be, here's another perspective. There are two camps: (A) Those who look at performance metrics ("96 cores to get 240MB/s is terrible") and assume that performance itself is enough to justify overruling any other concern (B) Those who look at all of the tradeoffs, including budget, maintenance, ease-of-use, etc.

You see this a lot in the tech world. "Why would you use Python, Python is slow" (objectively true, but does it matter for your high-value SaaS that gets 20 logins per day?)

jasonthorsness - a day ago

Using a single DBMS for many purposes because it is so flexible and “already there” from an operations perspective is something I’ve seen over and over again. It usually goes wrong eventually with one workload/use screwing up others but maybe that’s fine and a normal part of scaling?

I think a bigger issue is the DBMS themselves getting feature after feature and becoming bloated and unfocused. Add the thing to Postgres because it is convenient! At least Postgres has a decent plugin approach. But I think more use cases might be served by standalone products than by add-ons.

Copenjin - a day ago

I'm not really convinced by the comment recommending NOTIFY over (in theory inferior) polling. I expect the global queue, if it's really global, to be only a temporary location for collecting notifications before sending them, not a bottleneck. I never benchmarked this on PG or Oracle (which has a similar feature), but I expect that, depending on the polling frequency and the average volume of updates, either solution could be the best depending on the circumstances.
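
For what it's worth, the hybrid many setups land on sidesteps the benchmark question: keep polling as the source of truth and use NOTIFY only as a wake-up hint so idle pollers can back off. A sketch:

    -- consumer connection:
    LISTEN new_events;
    -- producer, in the same transaction as the insert
    -- (the notification is only delivered on commit):
    INSERT INTO events (payload) VALUES ('{"k": 1}');
    SELECT pg_notify('new_events', '');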

suyash - 11 hours ago

Postgres isn't ideal; you need a time-series database for streaming data.

tarun_anand - 20 hours ago

Couldn't agree more. Have built and run an in-house PostgreSQL-based queue for several years. It can handle 5-10k msg/s in our production workloads.

Sparkyte - a day ago

You can also use Redis as a queue if the data isn't in danger of being too important.

8cvor6j844qw_d6 - a day ago

> Should You Use Postgres?

> Most of the time - yes. You should always default to Postgres until the constraints prove you wrong.

Interesting.

I've also been told by my seniors that I should go with PostgreSQL by default unless I have a good justification not to.

heyitsdaad - a day ago

If the only tool you know is a hammer, everything starts looking like a nail.

bleonard - a day ago

I am excited about the Rails defaults where background jobs, cache, and sockets are all database-driven. For normal-sized projects that still need those things, it's a huge win in simplicity.

rudderdev - a day ago

Discussion on the same topic "Postgres over Kafka" - https://news.ycombinator.com/item?id=44445841

guywithahat - a day ago

> One camp chases buzzwords

> ...

> The other camp chases common sense

I don't really like these simplifications. Like, one group obviously isn't just dumb; they're doing things for reasons you maybe don't understand. I don't know enough about data science to make a call, but I'm guessing there were reasons to use Kafka, due to hardware limits or scalability concerns at the time, and while those issues may not be as present today, that doesn't mean they used Kafka just because they heard a new word and wanted to repeat it.

mbo - a day ago

This is an article in desperate need for some data visualizations. I do not think it does an effective job of communicating differences in performance.

wagwang - a day ago

Isn't listen/notify absurdly slow and lock-contentious?

ayongpm - a day ago

Just dropping this here casually:

  sup {
      position: relative;
      top: -0.4em;
      line-height: 0;
      vertical-align: baseline;
  }
lmm - 16 hours ago

If Kafka had come first, no-one would ever pick Postgres. Yes, it offers a lot of fancy functionality. But most of that functionality is overengineered stuff you don't need, and/or causes more problems than it solves (e.g. transactions sound great until you have to deal with the deadlocks and realise they don't actually help you solve any business problems). Meanwhile, with no true master-master HA in the base system, you have to use a single-point-of-failure server or a flaky (and probably expensive) third-party addon.

Just use Kafka. Even if you don't need speed or scalability, it's reliable, resilient, simple and well-factored, and gives you far fewer opportunities to architect your system wrong and paint yourself into a corner than Postgres does.

jackvanlightly - a day ago

> A 500 KB/s workload should not use Kafka

This is a simplistic take. Kafka isn't just about scale; like other messaging systems, it provides queue/streaming semantics for applications. Sure, you can roll your own queue on a database for small use cases, but it adds complexity to the lives of developers. You can offload the burden of running Kafka by choosing a Kafka-as-a-service vendor, but you can't offload the additional developer work that comes from using a database as a queue.

CuriouslyC - a day ago

If you don't need all the bells and whistles of Kafka, NATS Jetstream is usually the way to go.

odie5533 - a day ago

How fast is failover?

sherinjosephroy - 11 hours ago

Good reminder: if your message load is modest, sticking with something you know (like Postgres) might be wiser than going full-Kafka. Complexity adds cost, and you only need big guns when you're really under fire.

psadri - a day ago

A resource that would benefit the entire community is a set of ballpark figures for what kind of performance is "normal" given a particular hardware + data volume. I know this is a hard problem because there is so much variation across workloads, but I think even order of magnitude ballparks would be useful. For example, it could say things like:

task: msg queue

software: kafka

hardware: m7i.xlarge (vCPUs: 4 Memory: 16 GiB)

payload: 2kb / msg

possible performance: ### - #### msgs / second

etc…

So many times I've found myself wondering: is this thing behaving within an order of magnitude of a correctly setup version so that I can decide whether I should leave it alone or spend more time on it.

smoyer - 20 hours ago

Kafka is fast ... And MongoDB is web scale [0]. I completely agree that we shouldn't go chasing each new technical bauble but we are also wasting breath on those that do.

0. https://youtu.be/b2F-DItXtZs?si=vrB-UxCHIgMYGKFt

rjurney - a day ago

One bad message in a Kafka queue and guess what? The entire queue is down, because it kills your workers over and over. To fix it, you have to resize the queue to zero, which means losing requests. This KILLS me. Jay Kreps says there is no reason it can't be fixed, but it never has been, and this infuriates me because it happens so often :)

cpursley - a day ago

Related: https://www.pgflow.dev

It's built on pgmq and not married to Supabase (nearly everything is in the database).

Postgres is enough.

me551ah - a day ago

Imagine if historic humans had decided that only hammers were enough. That there was no need for specialized tools like scissors, chisels, axes, wrenches, shovels, and sickles, and that a hammer and fingers were enough.

Use the tool which is appropriate for the job. It is trivial to write code to use these tools with LLMs these days, this software is mature enough to rarely cause problems, and tools built for a purpose will always be more performant.

aussieguy1234 - 16 hours ago

I've found Kafka to be not particularly great with languages other than Java if the Confluent Schema Registry is involved.

I had fun working with the schema registry from TypeScript.

- a day ago
[deleted]
lisbbb - a day ago

If you are doing high volume, there is no way a SQL DB is going to keep up. I did a lot of work with Kafka, but what we constantly ran into was managing expectations: costs were higher, so the business needed to strongly justify its big data toy, and joins were much harder, as was data validation in real time. It made for a frustrating experience most of the time, not due to the tech so much as dealing with people who don't understand the costs and benefits.

On the major projects I worked on, we were "instructed" to use Kafka for, I guess, internal political reasons. They already had Hadoop solutions that more or less worked, but the code was written by idiots in "Spark/Scala" (their favorite buzzword to act all high and mighty) and that code had zero tests (it was truly a "test in prod" situation there). The Hadoop system was managed by people who would parcel out compute resources politically, as in, their friends got all they wanted while everyone else got basically none. This was a major S&P company, Fortune 10, and the internal politics were abusive to say the least.

oulipo2 - a day ago

I want to rewrite some of my setup, we're doing IoT, and I was planning on

MQTT -> Redpanda (for message logs and replay, etc) -> Postgres/Timescaledb (for data) + S3 (for archive)

(and possibly Flink/RisingWave/Arroyo somewhere in order to do some alerting/incrementally updated materialized views/ etc)

this seems "simple enough" (but I don't have any experience with Redpanda) but is indeed one more moving part compared to MQTT -> Postgres (as a queue) -> Postgres/Timescaledb + S3

Questions:

1. my "fear" would be that if I use the same Postgres for the queue and for my business database, the "message ingestion" part could block the "business" part sometimes (locks, etc)? Also perhaps when I want to update the schema of my database and not "stop" the inflow of messages, not sure if this would be easy?

2. Also, since the queue would write messages and then delete them, there would be a lot of GC/vacuuming to do, compared to my business database, which is mostly append-only?

3. and if I split the "Postgres queue" from "Postgres database" as two different processes, of course I have "one less tech to learn", but I still have to get used to pgmq, integrate it, etc, is that really much easier than adding Redpanda?

4. I guess most Postgres queues are also "simple" and don't provide "fanout" for multiple things (eg I want to take one of my IoT message, clean it up, store it in my timescaledb, and also archive it to S3, and also run an alert detector on it, etc)

What would be the recommendation?

justinhj - a day ago

As engineers we should try to use the right tool for the job, which means thinking about the development team's strengths and weaknesses as well as the differentiating factors your product should focus on. Often we are working in the cloud, and it's much easier to use a queue or a log database service than to manage a bunch of SQL servers and custom logic. It can be more cost-effective too, once you factor in development time and operational costs.

The fact that there is no common library implementing the author's strategy is a good sign that there is not much demand for this.

- a day ago
[deleted]
zer00eyz - a day ago

> Should You Use Postgres? Most of the time - yes. You should always default to Postgres until the constraints prove you wrong.

Kafka, GraphQL... These are the two technologies where my first question is always this: does the person who championed/led this project still work here?

The answer is almost always "no, they got a new job after we launched".

Resume Architecture is a real thing. Meanwhile the people left behind have to deal with a monster...

sneilan1 - a day ago

I'm starting to like MongoDB a lot more given the Python library mongomock. I find it wonderful to create tests that run my queries against Mongo in code before I deploy them. Yes, Mongo has a lot of quirks, and you have to know AWS networking to set it up with your VPC so you don't get nailed with egress costs. And it's not the same query patterns, some queries are harder, and you have to maintain your own schemas. But the ability to test Mongo code with mongomock without having to run your own Mongo server is SO VALUABLE. And yes, there are edge cases mongomock doesn't support, but the library is open source and pretty easy to modify. And it fails loudly, which is super helpful, so if something is not supported you'll know. Maybe you'll find a real nasty feature that's hard to implement, but then just use a repository pattern like you would for testing Postgres code in your application.

https://github.com/mongomock/mongomock

Extrapolating from my personal usage of this library to others, I'm starting to think that MongoDB's 25-billion-dollar valuation is partially based on this open source package :)