A few weeks ago I wrote a small app that fetches JSON documents from app.net’s API and draws a word cloud. At first I wasn’t keeping the content around after generating the images. Later I thought of other things I’d like to do with the documents, so I decided to start storing them.
I’d never used MongoDB, and I have little interest in the NoSQL hype (particularly for my own toy projects). However, it seemed like a good fit for what I wanted to do: store and query JSON documents without worrying about schemas. I followed the MongoDB Ruby tutorial, which shows you how simple it is to insert documents into Mongo:
require 'mongo'  # mongo gem, 1.x-era API
coll = Mongo::Connection.new.db("test").collection("testCollection")
doc = {"name" => "MongoDB", "type" => "database", "count" => 1,
       "info" => {"x" => 203, "y" => "102"}}
coll.insert(doc)
So, one gem install and a few lines of code later I was happily inserting documents into a MongoDB server on my puny AWS Micro instance somewhere in Oregon. It worked just fine for all of three weeks.
Yesterday I decided to compute some stats, and I discovered that the most recent document was four days old. Hmmm. I checked my script that fetches and inserts documents; it was running and there were no errors in the log. MongoDB seemed fine too. What could be the problem? Long story short, this blog post from three years ago explains it:
32-bit MongoDB processes are limited to about 2 gb of data. This has come as a surprise to a lot of people who are used to not having to worry about that. The reason for this is that the MongoDB storage engine uses memory-mapped files for performance.
By not supporting more than 2gb on 32-bit, we’ve been able to keep our code much simpler and cleaner. This greatly reduces the number of bugs, and reduces the time that we need to release a 1.0 product. The world is moving toward all 64-bit very quickly. Right now there aren’t too many people for whom 64-bit is a problem, and in the long term, we think this will be a non-issue.
Sure enough, my database had reached 2GB in size and the inserts started failing silently. WTF zomg LOL zombie sandwiches!
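Had I known, I could have asked the driver to complain. Here's a sketch of what that looks like with the 1.x-era mongo gem (the database and collection names are invented, and the whole thing is guarded so it degrades gracefully when no local mongod is running):

```ruby
# Sketch: the same insert, done defensively with the 1.x-era mongo gem.
# "appnet" and "posts" are made-up names for illustration.
doc = {"text" => "word cloud fodder"}

begin
  require 'mongo'
  coll = Mongo::Connection.new("localhost", 27017).db("appnet").collection("posts")

  # Default insert: fire-and-forget. The driver hands back an _id whether
  # or not the server actually stored anything.
  coll.insert(doc)

  # Safe insert: the driver issues getLastError after the write and raises
  # Mongo::OperationFailure if the server reports a problem -- such as a
  # full 32-bit datafile.
  coll.insert(doc, :safe => true)
rescue LoadError
  warn "mongo gem not installed; skipping live demo"
rescue StandardError => e
  warn "insert failed or no server running: #{e.class}: #{e.message}"
end
```

Of course, the tutorial never mentions `:safe`, so you only find out about it after your data has quietly stopped arriving.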
This is a horrendous design flaw for a piece of software that calls itself a database. From the Zen of Python:
Errors should never pass silently.
Unless explicitly silenced.
There is a post on Hacker News by someone who dislikes the Go language because you have to check errors in return values (that's where I got the quote above). MongoDB's behavior is worse because, as I said, MongoDB is a database (or at least it plays one on the web). If you tell a database to store something and it doesn't complain, you should be able to safely assume it was stored. In fact, the Ruby tutorial never tells you to check any error codes. That's what exceptions are for.
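The contrast boils down to a toy example (plain Ruby, all names invented): a store that signals failure with a return value versus one that raises.

```ruby
# QuietStore signals failure via a return value; LoudStore raises.
# Both are hypothetical, just to illustrate the two styles.

class QuietStore
  def initialize(limit)
    @limit = limit
    @docs = []
  end

  # Returns false when full -- trivially easy for a caller to ignore.
  def insert(doc)
    return false if @docs.size >= @limit
    @docs << doc
    true
  end
end

class LoudStore
  class Full < StandardError; end

  def initialize(limit)
    @limit = limit
    @docs = []
  end

  # Raises when full -- the caller cannot silently lose data.
  def insert(doc)
    raise Full, "store is full" if @docs.size >= @limit
    @docs << doc
  end
end

quiet = QuietStore.new(1)
quiet.insert("a")
second_ok = quiet.insert("b")   # false: the document was silently dropped

loud = LoudStore.new(1)
loud.insert("a")
refused = false
begin
  loud.insert("b")
rescue LoudStore::Full
  refused = true                # the failure is impossible to miss
end
```

A driver that returns an id on a write that never happened is firmly in the first camp.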
This gave me a nasty feeling about MongoDB. If something this elementary can go this wrong, what other problems could be lurking in there? I immediately switched to CouchDB (again, mostly because switching was trivial), but if this were a serious project I'd be using Postgres. I'd spend the extra hour figuring out the right schema, or maybe I'd even try the new JSON support in Postgres 9.2.
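If I did go the Postgres route, the sketch would be something like this (the table name and connection details are assumptions, it needs the pg gem and a 9.2 server, so the live part is guarded):

```ruby
require 'json'

doc = {"name" => "MongoDB", "type" => "database"}

begin
  require 'pg'
  conn = PG.connect(dbname: "appnet")

  # A plain table with a json column (the type added in 9.2).
  conn.exec("CREATE TABLE IF NOT EXISTS docs (id serial PRIMARY KEY, body json)")

  # exec_params raises PG::Error on any failure -- no silent loss.
  conn.exec_params("INSERT INTO docs (body) VALUES ($1)", [doc.to_json])
rescue LoadError
  warn "pg gem not installed; skipping live demo"
rescue StandardError => e
  warn "insert failed or no server running: #{e.class}: #{e.message}"
end
```

Not quite as cute as `coll.insert(doc)`, but when the disk fills up, it tells you.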
Wait a second, maybe I should reconsider. After all, relational databases were not designed for the web. And MongoDB is Web Scale.
Slap me silly on Hacker News or maybe Reddit Programming 🙂