Can PostgreSQL pick up where MySQL left off?
When I started with PHP3 development in 1998, PostgreSQL was still called Postgres95 and it had some severe limitations which worked directly against the features needed for web development:
1) Performance was very slow.
2) There was an effective row size limit of 8K over the whole row which worked against people who wrote forum software (big posts) or were (ugh) storing binaries in the database.
3) Even today, PostgreSQL requires a bit more maintenance and configuration until it works right, whereas MySQL runs out of the box. Back then, VACUUM wasn't at all optional and still required an exclusive lock over the tables it was working on.
4) There was a misconception that each Postgres user needed to be a Unix system user (this might or might not be true; probably isn't, but I and others certainly thought so), so it worked against the usual cheap virtual hosting.
5) The MySQL extension for PHP was much more advanced than the Postgres one, leading to people preferring MySQL, which in the end led to the virtual hosts only installing MySQL, at which point Postgres' fate was sealed.
6) MySQL was positioned as a spiritual successor to mSQL (just look at the names), which was very popular at the time as it was the first SQL database for "normal people". You either did flat files or mSQL.
7) MySQL had Windows support from the beginning. Postgres95 still required Cygwin (if you could get it to work). Back then, many people were using Windows as their development environment, and being able to run the whole stack on your development machine does have some advantages.
By now, aside from the maintenance thing, Postgres has lost all these drawbacks, but now it's too late: people are using what they know is working for them, so they just go with MySQL (or SQLite if they need more simplicity, which is interesting in itself, as MySQL for ages provided an embedded, linkable, serverless variant which also never took off).
#2 - even if you don't store binaries, just very long posts, it's a problem.
#4 - whether it's a misconception or not, the majority of tutorials and even the default utilities make that assumption. Working around that is not as widely documented as the defaults.
Also, forum software is a big arena. Not being able to get accurate count() values back hurt(s) postgresql. It's hurting people who now rely on MySQL's InnoDB as well, but you always have the option of MyISAM for certain tables that you want accurate count()s on for pagination. I've heard for years that "it's so easy to do this in PostgreSQL, just write some triggers and stored procedures..." yet... if it's "so easy", why not just bundle that in as default functionality (or perhaps even just a script that would create triggers for you) in postgresql?
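For what it's worth, the trigger approach people keep mentioning looks roughly like this; a minimal sketch, assuming a hypothetical posts table and a row_counts side table (nothing like this ships with PostgreSQL by default):

    -- side table holding one pre-counted total per tracked table
    CREATE TABLE row_counts (table_name text PRIMARY KEY, n bigint NOT NULL);
    INSERT INTO row_counts VALUES ('posts', (SELECT count(*) FROM posts));

    CREATE OR REPLACE FUNCTION bump_row_count() RETURNS trigger AS $$
    BEGIN
        IF TG_OP = 'INSERT' THEN
            UPDATE row_counts SET n = n + 1 WHERE table_name = TG_TABLE_NAME;
        ELSIF TG_OP = 'DELETE' THEN
            UPDATE row_counts SET n = n - 1 WHERE table_name = TG_TABLE_NAME;
        END IF;
        RETURN NULL;  -- AFTER trigger, return value is ignored
    END;
    $$ LANGUAGE plpgsql;

    CREATE TRIGGER posts_count AFTER INSERT OR DELETE ON posts
        FOR EACH ROW EXECUTE PROCEDURE bump_row_count();

    -- pagination then reads the pre-counted value instead of scanning the table
    SELECT n FROM row_counts WHERE table_name = 'posts';

The obvious catch is that every insert and delete now contends on that one counter row, which is presumably part of why it isn't just shipped as a default.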
I don't ever recall hearing about the "embedded linkable and serverless variant" of MySQL, and have been doing PHP since 1996. Perhaps it was a licensing issue? MySQL seemed to take off largely due to the rise of PHP, but SQLite seems to have taken off more because of its embeddability and public domain status, well before I ever saw PHP adoption of SQLite.
> I don't ever recall hearing about the "embedded linkable and serverless variant" of MySQL, and have been doing PHP since 1996. Perhaps it was a licensing issue?
http://dev.mysql.com/doc/refman/5.1/en/libmysqld.html
Pretty much a licensing issue, as all of MySQL is released under the GPL or a proprietary license. This also includes the MySQL client library, by the way, but they made a license exception there that allowed linking against PHP (the mysql and mysqli extensions), and later there was a reimplementation of the on-the-wire protocol inside a PHP extension (mysqlnd), released under the PHP license.
> Not being able to get accurate count() values back hurt(s) postgresql.
count() is totally accurate within the limits of MVCC. If you need it to be 100% accurate across statements, make your transaction SERIALIZABLE.
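Concretely, that looks something like this (a minimal sketch; the posts table is made up):

    -- all statements in this transaction see one consistent snapshot
    BEGIN ISOLATION LEVEL SERIALIZABLE;
    SELECT count(*) FROM posts;
    -- later statements in this transaction see the same rows, regardless of
    -- inserts/deletes committed by other sessions in the meantime
    SELECT count(*) FROM posts WHERE author_id = 42;
    COMMIT;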
There's one thing about count(): count(*) under MySQL, if using MyISAM tables, is optimized so it doesn't have to actually count rows, and is thus much faster than, say, count(row), count(whatever) with a WHERE clause, or count(*) in any other database.
But the moment you use InnoDB or any other database that supports transactions, count(whatever) unfortunately requires counting in all cases.
re: count() - I guess I should have also said "reasonably fast". Waiting 8 seconds for a count() isn't practical for many apps.
> why not just bundle that in as default functionality (or perhaps even just a script that would create triggers for you) in postgresql?
Multi-version concurrency control means that a single universal definition of count() basically doesn't make sense: different transactions can be looking at different sets of visible rows, so there is no one number you could precompute and hand out.
Under what circumstances is count() inaccurate in Postgres?
See earlier comment - I misstated my point - it's not so much that it's inaccurate as slow. Dog slow. InnoDB in MySQL 'fixes' the speed with estimates, but that's generally not very useful either in most cases (sometimes, it is).
I was confusing my earlier InnoDB count() experiences (inaccurate estimates) with my PostgreSQL count() experiences (dog slow).
> 2) There was an effective row size limit of 8K over the whole row which worked against people who wrote forum software and were (ugh) storing binaries in the database.
Could you not just use the dedicated TEXT or BLOB type for that kind of stuff, and store them out-of-row? (yes, would probably have been even slower)
Nope. There was blob-support, but that relied on special library functions to extract the data to a file which you'd then have to manually read.
Of course this also means that you couldn't search in these fields or do anything else you'd do directly on the database.
TOAST tables were added in 7.1 AFAIK, which lifted that limit and allowed for arbitrarily sized rows by moving the big fields away into their own storage.
The old functionality is still there (http://www.postgresql.org/docs/9.0/interactive/largeobjects....) though I totally failed and still fail to see any practical use.
> Nope. There was blob-support, but that relied on special library functions to extract the data to a file which you'd then have to manually read.
OK.
> The old functionality is still there (http://www.postgresql.org/docs/9.0/interactive/largeobjects....) though I totally failed and still fail to see any practical use.
Memory constraints maybe? That's about the only justification I could see for using that over bytea: if you store single binary objects in the GB+ range, you may want to interact with them as streams.
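For reference, a rough sketch of that streaming style using the server-side large object functions (the file path and OID here are made up):

    -- import a big file as a large object; returns its OID (say 16412)
    SELECT lo_import('/tmp/huge_video.bin');

    -- large object descriptors only live inside a transaction
    BEGIN;
    SELECT lo_open(16412, 262144);   -- 262144 = INV_READ; returns descriptor 0
    SELECT loread(0, 8192);          -- first 8K chunk
    SELECT loread(0, 8192);          -- next 8K, and so on; never the whole thing at once
    SELECT lo_close(0);
    COMMIT;

With bytea, the client generally gets the whole value in one go, which is exactly what you don't want for GB-sized objects.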
I wonder why MySQL became super-popular and Postgres didn't. I remember reading for years about how Postgres had a bunch of stuff (e.g. transactions) that MySQL didn't (until recently) and that anybody who knew anything about databases would choose Postgres, yet people still chose MySQL.
What did MySQL have during the last decade that made it better for certain applications than Postgres?
MySQL was simple.
You didn't have to deal with schema/tablespaces, or the chance that someone used stored procs, or security (almost everyone just used the equivalent of a root account).
MySQL just worked.
For the most part, with zero monitoring, it stayed working until the server ran out of disk space or died.
MySQL was always available.
No matter which hosting company you looked at, no matter how little you wanted to spend, MySQL was pre-installed and ready for use.
I never thought it was the better choice, but I understood why it dominated. Once things like WordPress, vBulletin, phpBB and other stuff were written only for MySQL, the choice was made for you.
I've long been a fan of applications that give you a choice of DB, but it's now unrealistic as many applications have designed schemas to cope with the quirks and limitations of MySQL specifically. These would now be harder to migrate than simply changing a database string, and most application devs don't invest in such things when what they have works well enough and is all anyone is asking for.
The second half of your explanation, which boils down to "it's everywhere", makes sense now, but it is only true because it was already in demand.
That it was faster (for certain applications), easier and more reliable does make sense. I understand, for example why PHP got popular even though there were other options available at the time that I believe were better from a software engineering[0] perspective. Things get popular by being better at the things the majority of users care about, even if they're awful at everything else.
[0] Did I really just use that term? I can't think of a better one to describe the issue I'm talking about.
For many people, including hosting admins, MySQL was the path of least resistance. MySQL required less maintenance than Postgres back in the day. It's pretty much exactly the same situation as with PHP. The tool created by mere mortals is usable by mere mortals, so they use it. The tool created by scholars is technically superior at core functions but is slightly more difficult or cumbersome to use so is used by fewer mortals.
One thing people seem to forget is that Postgres has not been around longer than MySQL, both were released to the public in 1995. Postgres skipped a few version numbers so maybe that makes it seem older.
Counting public releases of non-university projects, sure, but at least according to the Wikipedia page it has been under active development since 1986, with a first release in 1989, whereas MySQL started development in 1994.
Actually, phpBB 2.0.0, which was released in 2002, supported IBM DB2, MS Access, MS SQL, MySQL3, MySQL4, Oracle... and Postgres 7.
But you are right, many of the other popular projects did not make this effort.
MySQL is a triumph of branding and fanboy self-delusion.
MySQL made it easier initially by omitting lots of difficult stuff (data integrity, management, transactions), thus making it easier to get something working quickly. It also had a reputation for being faster than PostgreSQL, a reputation spread by fanboys (in truth, people who were bigging up the only database they had a faint comprehension of).
Then it became ubiquitous. Now, like some nasty rash, we can't get rid of it.
> It also had a reputation for being faster than PostgreSQL
It was indeed much faster when reading (probably still is on MyISAM) due to the lack of everything (especially transactions and any concept of data integrity).
> MySQL made it easier initially by omitting lots of difficult stuff (data integrity, management, transactions), thus making it easier to get something working quickly.
Remind anyone of NoSQL? :)
MySQL (MyISAM, really) was faster for loading chunks of text and spitting them into Perl and PHP. Back in the 133MHz days, people cared about that. Given that MyISAM was pretty much flat files in a clown suit, you didn't get much in the way of serious database functionality.
I can say from a web dev's perspective that the ubiquitousness of MySQL is only part of the allure (an important part though). MySQL makes simple stuff very easy to implement, it's a great solution if you just want your database to store your stuff and get out of the way. Sure, like any other DB it requires some expertise if you want performance, but it makes this incredibly simple. It just works and you don't spend a lot of time arguing with the software.
It also has a single feature that I find incredibly cool: it supports automatically generated row IDs. It's been a while since I last tried Oracle and Postgres, but at the time you had to create a second table just for a sequence number and it was extremely awkward to handle.
Could the auto-increment feature in MySQL be better? Sure, for example it would be nice to have an option where the row is identified by a hash value instead of a continuous sequence. But even in its limited form today it beats having to wrestle with the glued-together equivalent as implemented in "real databases".
> It also has a single feature that I find incredibly cool: it supports automatically generated row IDs. It's been a while since I last tried Oracle and Postgres, but at the time you had to create a second table just for a sequence number and it was extremely awkward to handle.
Must have been a while indeed. Postgres's SERIAL datatype has been available at least since 6.4. Postgres 6.4 was released in 1998.
Postgres has had auto-incrementing row ids for a while, see this example: https://gist.github.com/1178237
I may be wrong, but isn't that just a syntactical shortcut for creating a separate sequence table?
Yes (that's noted in the gist). But the end result's the same: instead of writing AUTO_INCREMENT you write SERIAL. That's about it; the only issue may happen if you drop the table (the sequence will not be dropped, not a huge issue in practice).
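Roughly, the shorthand and what it expands to (table and column names made up):

    -- the shorthand
    CREATE TABLE posts (id SERIAL PRIMARY KEY, body text);

    -- is more or less equivalent to
    CREATE SEQUENCE posts_id_seq;
    CREATE TABLE posts (
        id integer NOT NULL DEFAULT nextval('posts_id_seq') PRIMARY KEY,
        body text
    );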
Did you notice any differences in performance?
Edit: I'm not saying there are, but it feels like there might be some repercussions when you take the underlying implementation into account. It would be interesting to know if someone conducted comparative tests about this.
MySQL was popular for the same reason NoSQL is - it lacked features, but it was fast as hell with no tuning. Postgres did more, so it had to be slower, and it was tuned to work on minimal resources (and not steal all your RAM). Like MongoDB today, MySQL was tuned to work fast, at the cost of reliability, and it was a poor citizen on a shared host.
There are other factors - you could limit its use of disk space (so hosts liked it), but mostly it was a good key-value store with some database functionality.
People who wanted a "real" database used Postgres. People who wanted to write a web app used MySQL.
I seem to faintly remember that in the beginning Postgres was Unix only, and MySQL was available for Windows, too.
Well, MySQL has been easier to set up, and that can't be overstated; I've basically saved off a pair of configuration files I've just reused a dozen times to set up PostgreSQL. MySQL also works pretty fast right out of the box: if your data is smaller than physical memory, it might be nearly as fast as you can make it by default. PostgreSQL has fairly conservative performance settings when first installed; you can make it really sing, but it takes some tuning, and in some cases that tuning requires a fairly deep understanding of your data. I think most MySQL users are not DBAs, and all the data-specific settings for a PostgreSQL-like database might frighten them some. It's a really odd juxtaposition when you think about it: stereotypically, geeks are drawn to the knobs and dials you can play with, but I honestly think in this case it turned them away.
Then a little bit of support from ISPs and corporations and it's a no-brainer for most folks. All the OSS CMSes and blogs and everything support MySQL really, really well. All the ISPs pretty much use MySQL as the "database" solution.
The query browser and admin tools were easier to use. They weren't really better; rather, the presentation was simpler. The Postgres tools (pgAdmin) show a little too much detail right away (namespaces, etc.) and look a lot more complicated to use, which might put off people who just need to get it working and build from there.
When I first got into full-time pro web development (about a decade ago), I started learning PHP. Just about every tutorial I read or book I wished I could afford had 'MySQL' next to 'PHP' as though they were one and the same.
From there forward, any time I was looking into a new host, they ALL had both PHP and MySQL. Any job I was bidding on, discussion I was having in IRC or on forums, anywhere - the two came as a matching set. I can't say that's WHY it became more popular, but I've always assumed their popularity was closely aligned.
Right now, the biggest problem is WordPress, which has ZERO Postgres support. A PG-supported version of WordPress would go a long, long way in popularizing Postgres.
It'll never happen. WordPress core might be ported, but there are thousands of plugins and themes that assume MySQL.
MySQL got the wrong results faster.
I've always likened it to riding a bicycle downhill with no brakes. Yes, it's faster, but...
Of course, now they've converged a lot, with PostgreSQL getting significant speed upgrades and MySQL gaining InnoDB, with a few 'minor details' like transactions.
Well, PostgreSQL is licensed under BSD, which makes closed-source, commercial forks possible. That's why it is attractive for large businesses, like Apple.
GPL-licensed MySQL serves the opposite purpose: it is impossible to fork it into a commercial DB product, grow it as time goes by, and threaten Oracle. It is nearly impossible to create a commercial product for this market, because there's a free product there: MySQL. MySQL is like a ditch around the Oracle stronghold, a ditch to keep competitors far away. The GPL is a perfect tool for such tricks.
In contrast to the GPL, which suppresses commercial developments in its market segment, BSD encourages them.
For the same reason, Apple is slowly getting rid of GCC in favour of clang. It is perfectly OK from a legal point of view to include a BSD-licensed compiler in a commercial closed-source IDE, unlike a GPL-licensed compiler.
I actually think the license had very little to do with its popularity. The fact that MySQL was a lot simpler than Pg made it popular.
IMO the short answer is: no.
Most opensource software that uses MySQL is riddled with MySQLisms.
Many forks exist to deal with this situation: http://en.wikipedia.org/wiki/MySQL#Forks
You don't need a fork for that — just enable ANSI mode.
The problem is that MySQL-based software doesn't do that and relies on the default "quirks" mode.
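For reference, flipping that switch looks something like this (the table and query are made up):

    -- per-session; can also be set globally or in my.cnf
    SET sql_mode = 'ANSI';

    -- now double quotes delimit identifiers instead of strings, || concatenates, etc.
    SELECT "title" || ' by ' || "author" FROM "posts";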
In my mind, replication has historically been a pretty big deciding factor in favor of MySQL. PostgreSQL didn't support it natively until version 9.0 (which still isn't part of most Linux distro repositories). You'd have to install Slony or something similar. MySQL made master/slave or master/master replication simple to configure and a breeze to maintain.
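To illustrate how low the bar was on the MySQL side, the classic master/slave setup was roughly this (host names, credentials and log position are made up; server-id and log-bin also have to be set in each server's my.cnf):

    -- on the master: create a replication account
    GRANT REPLICATION SLAVE ON *.* TO 'repl'@'10.0.0.2' IDENTIFIED BY 'secret';

    -- on the slave: point it at the master and start replicating
    CHANGE MASTER TO
        MASTER_HOST = '10.0.0.1',
        MASTER_USER = 'repl',
        MASTER_PASSWORD = 'secret',
        MASTER_LOG_FILE = 'mysql-bin.000001',   -- from SHOW MASTER STATUS on the master
        MASTER_LOG_POS = 4;
    START SLAVE;

Getting the equivalent out of Slony roughly meant installing it separately, defining replication sets and paths, and running its own daemons, which is a very different level of effort.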
Thanks for all these points. I am doing a presentation tomorrow on the same topic and all these points will help!
When you're choosing a database, you're making a long-term decision, because changing your mind later is difficult and expensive. You want to get it right the first time.
Here's a comparison in a tabular format: http://database-management-systems.findthebest.com/compare/1...
That's a good comparison. We use both MySQL and PostGre in our organization, and the deciding factor is mainly cost.
> PostGre
It has never been named "PostGre".
Its original name was Postgres; the current one is PostgreSQL, with Postgres as the accepted nickname.