Settings

Theme

Code rant: The Database As Queue Anti-Pattern

mikehadlow.blogspot.se

13 points by FrancescoRizzi 14 years ago · 14 comments

Reader

btilly 14 years ago

The database is not the world's most efficient way to do this. However if it isn't a bottleneck, who cares? It is one less dependency to maintain, understand, etc.

We're looking to create working solutions, not works of art. Seeking the perfect component for every piece is often simply not worthwhile.

redcircle 14 years ago

I find it ironic that he writes this as a rant, which is an anti-pattern for influencing people. One of his commenters hints at how it might have been better presented: introduce messaging systems, explain how they solve the problem, and describe their advantages over the database. For further discussion on why a rant is probably an anti-pattern, see Dale Carnegie's "How to Win Friends and Influence People."

tarr11 14 years ago

Using databases as a queue is not an anti-pattern (delayed_job is an example).

A full blown message queue system is often overkill; it adds another system that you have to learn. (And frankly, queues are often backed by databases anyway)

That said, it's best to avoid having application state in the queue, it should be generic. This makes it more scalable and flexible.

  • Retric 14 years ago

    Also, if your using the DB as a queue index on the status element. Inserts are still low overhead and the DB can handle poling that table 10,000 time a second without issue.

    Edit: Just don't pole the 'finished' status as that can have a lot of elements in it.

    • koenigdavidmj 14 years ago

      A lot of databases can also provide partial indices, which are only maintained and searched when a WHERE clause matches (example: in this case, WHERE status != 'filtered' would be a good choice).

gaius 14 years ago

But no-one polls a database anymore; they all have a callback mechanism an app can just subscribe to.

dclaysmith 14 years ago

I use MySql+cron as a simple message queue every now and again--it's great for jobs like sending confirmation emails or registering new users for a mailing list. ie. Tasks that make synchronous calls over the network and can degrade the speed of important processes (registration, etc). You just have to make sure that these tasks are going to be infrequent and not prone to spikes that might overwhelm the server.

jgrahamc 14 years ago

A lot of that post is specific to SQL Server. I used a MySQL table as a queue as part of Signal Spam (https://www.signal-spam.fr/) and it worked flawlessly. Polling was at 1 minute intervals and queue entries that were completed were simply deleted.

  • jpitz 14 years ago

    That particular use-case is such an anti-pattern in SQL Server ( because SQL Server isn't or hasn't until recently been MVCC ) that Microsoft added Service Broker to the product.

    It isn't nearly the anti-pattern, at least for many loads, in an MVCC database engine.

    • endersshadow 14 years ago

      SQL Server has been MVCC for seven years[1]...

      And anyway, I disagree with that it's an anti-pattern. If you want to capture information about each step of a process, SQL Server (or any RDBMS) is great at that. However, if what you want is a workflow that your RDBMS is facilitating, and you really don't care about the intermediate steps, you just want to make sure it gets done, then there are a lot of workflow systems out there that do a great job. I'm also a big fan of combining workflow software with logging the steps to SQL Server for a straight-forward process.

      But again, it depends on what data's being stored and what types of queuing you're doing. One project I've done that comes to mind is to take multiple sources, grab the changes, then store the combined, conformed data in an operational data store, while flagging the rows that were changed. Downstream systems then pick up the changed rows based on their last run and pull those changes down--completely oblivious to the fact that multiple sources are coming together. In this case, you could abstract it away to the fact that it's a queuing system, but there are multiple inputs and multiple endpoints, so a persistent store with that data is paramount.

      [1]: http://en.wikipedia.org/wiki/Multiversion_concurrency_contro...

      • jpitz 14 years ago

        >SQL Server has been MVCC for seven years

        You're technically correct, which is the best kind of correct to be. Still, there are nuances: snapshot isolation is not the default, and not many people are aware of it, much less use it.

        So far as the pattern/antipattern - I agree wholeheartedly with you. I would put it in the category of flagged patterns - the usage characteristics and especially the usage growth need to be carefully understood when doing this.

pbreit 14 years ago

I would be happy to use something else but every time I start to look at ZeroMQ or celery, I lose interest and just go back to the DB which surprisingly, seems a lot easier to use. Is there a delayed_job for Python?

tszming 14 years ago

Let's face it, not all systems in the world need to be "web scaled", as the author also said: "Simple, use the right tool for the job".

But you need to understand not all jobs are created equal..

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection