Settings

Theme

Ask HN: How PostgreSQL source code is 3x shorter than MySQL's one?

119 points by mcsoft 5 years ago · 36 comments · 1 min read


It looks like PostgreSQL is more feature-rich and reliable, though it's written in only 1491985 LOC vs 4466967 LOC for MySQL. How is that?

twotwotwo 5 years ago

So I cloned the mysql-server and postgres repos and ran sloccount. It's not the deepest dive or anything but was interesting.

I saw MySQL had...600k lines of JavaScript?

It turned out that the storage/ndb directory had a Web-based management interface for NDB, which vendors in the Dojo JavaScript framework. It also had ~50k lines of Java for the "ClusterJ" framework, which interfaces with NDB skipping the SQL layer. Overall, sloccount reports about 1.4M lines of code in storage/ndb/.

NDB is a specialized cluster database where all secondary indexes have to fit in the cluster's RAM(!). Work on it started at Ericsson, then it was spun out into a startup which MySQL AB eventually bought. I imagine MySQL management at the time hoped the future of DB clusters might end up looking more like NDB and less like...gestures at today's database landscape.

There are also ~500K lines of Unicode tables in the strings/ directory. I recall Postgres calls out to libc for locale/collation related stuff so probably doesn't need those tables in-tree.

Even accounting for those chunks you still end up with ~1M vs ~2M SLOC as measured by sloccount. (I don't want to pretend the numbers are super precise.) There are probably other differences in what's in scope for the repo or other surprises. [Edit: see johannes1234321's comment which lists some of them.]

Besides those, though, might be truth to others' comments about MySQL spending lots of code supporting drastically different old and new "worlds" in a single binary (non-transactional and transactional storage, originally very-nonstandard vs. currently more-standard SQL, statement-based and row-based replication...). And at a totally non-technical level, as a product MySQL seems to have had more money thrown at it and that tends to mean more code.

This was fun but was an incredibly quick and dirty dive into it, and I'd love to hear more from folks who can look more or just know more.

  • twoodfin 5 years ago

    NDB is a specialized cluster database where all secondary indexes have to fit in the cluster's RAM(!).

    That’s not that wild of a design principle: It’s been longstanding best practice to scale OLTP databases (and limit indexing) as to keep secondary indexes buffered in memory.

    • johannes1234321 5 years ago

      To explain that statement: Historically ndb worked in a way that all data had to be in memory and it wouldn't touch the disk at all. For a while now data can be on disk, but all indexes afaik still have to be fully loaded into memory in startup.

      The use cases are systems where you need "five nines" of uptime and fat responses. Coming out of Ericsson the classic area where it is used is in Telco (for instance "home location registers", the database recoding in which cell a mobile phone currently is, often use ndb) but there are different usages in Web (i.e. Session Store), "real time" information exchange (betting, gaming, ...) and so on.

      It is not as easy to administer as a "normal" MySQL, but when deployed carefully it is powerful, fast and scalable (both locally as well as geo l-level)

  • minusf 5 years ago

    postgresql can use ICU and on some platforms it's the only way to get utf8 collations.

  • mywacaday 5 years ago

    I was watching Sherlock Holmes tonight and for some reason I read that in Benedict Cumberbatches voice, most entertaining, thank you!

johannes1234321 5 years ago

One can argue about the statement that PostgreSQL is more feature rich.

MySQL has more replication features, different storage engines, etc. also in MySQL GIS functionality is included and not an external plugin (like PostGIS)

The source tree you looked at probably also has ndb cluster included; if you cloned from GitHub, you also get the MySQL Router and other side components.

MySQL also bundles most external dependencies (excluding for example boost)

Any serious discussion would require deeper analysis.

P.S. I work on the MySQL team, never looked at pgsql source, so can't really judge.

  • abernard1 5 years ago

    > MySQL also bundles most external dependencies (excluding for example boost)

    This is a great callout. Most people running PG for serious production probably need pgbouncer which looks like it adds 35-50K lines. There's probably a few more 50K-ish plugins that are must haves for serious usage.

    I don't know how many lines NDB is, but the lack of a standard failover mechanism and cluster manager in Postgres is a pain. The number of effective lines that the bespoke systems people build for that purpose is probably quite large.

    • 5Qn8mNbc2FNCiVV 5 years ago

      Can you give some recommendations for must have plugins? I have a few Postgres databases running in production and none is more than the default docker image from Docker Hub

  • ksec 5 years ago

    >One can argue about the statement that PostgreSQL is more feature rich.

    I dont even think it should be an argument. Postgre by default, on itself without extension is anything but feature rich, especially when comparing to MySQL.

    I like Postgre, and I know lots of people on HN do too, and I also know these same people absolutely hate oracle. But come on.

    Sometimes these Hype and cult is just a little too much.

  • redis_mlc 5 years ago

    If you're a developer, a database is just a database.

    If you're a corporate DBA, they are each worlds apart.

    MySQL is a perfect match for most SaaS applications that deal with compliance. Postgres is simply not.

    The main issues with managing Postgres today are grant inheritance, and also the lack of COPY FROM/TO grants. I shouldn't have to update grants for non-admin or readonly users after adding new objects, and there needs to be the equivalent of MySQL's FILE grant for Postgres' COPY FROM/TO.

    OTOH, it is easy to reason about MySQL grants, and almost pain-free to update them in production.

    If anybody can send me a link on how to make Postgres work in a corporate and/or compliance environment, let me know.

    Source: DBA.

    • abernard1 5 years ago

      > MySQL is a perfect match for most SaaS applications that deal with compliance. Postgres is simply not.

      I find this statement amazingly weird. In my experience, MySQL's access grants are extremely impoverished compared to Postgres (even in 8). It doesn't have row level security either.

      > The main issues with managing Postgres today are grant inheritance

      roles + `INHERIT` (which has been available since at least 8.1, 15 years ago)

      > I shouldn't have to update grants for non-admin or readonly users after adding new objects

      You don't. There are default grants that apply on object creation within a schema: `ALTER DEFAULT PRIVILEGES`.

      • redis_mlc 5 years ago

        The reality is that I commonly see MySQL production applications use non-admin accounts, and have never seen that for Postgres. So there is a problem.

        Can you see if there is a grant similar to MySQL's FILE for COPY FROM/TO in Postgres? The last time I checked in early 2019, there wasn't.

    • justinjlynn 5 years ago

      sigh

      Must you really draw conclusions about the suitability of PostgreSQL or MySQL for all users based on your limited perspective/niche needs?

      > If you're a developer, a database is just a database.

      [screams internally]

      • quickthrower2 5 years ago

        Yeah I mean as a developer I’ve never had the luxury to ignore storage concerns.

        At bigger companies they might abstract it out but your gonna get a slow query one day and be tuning sql or something.

    • thinkx 5 years ago

      > MySQL is a perfect match for most SaaS applications that deal with compliance. Postgres is simply not.

      Please elaborate on this “simply not”.. you are making a categorical statement.. does there exist a DISA-certified STIG for MySQL, for DoD and military applications?

    • x87678r 5 years ago

      I used to think this. I was surprised when DHH said hey.com uses MqSQL as back end. https://twitter.com/dhh/status/1275901955995385856

rmrfstar 5 years ago

You could probably turn this into an interesting blog post.

+ What subsystems does each project have? How do LOC stack up on a per-subsystem basis?

+ Do they have different dependencies?

+ Does LOC have the same meaning in the two projects, or are there stylistic differences that meaningfully alter line counts?

+ Are statically linked binaries similarly sized?

So, it looks like you have the right question. You just need to do the digging.

  • eloff 5 years ago

    Also if there are large generated files (the SQL parser is generated from a grammar) or if one project is vendoring dependencies it can artificially skew the line count.

    It's not a meaningful measure in itself without digging deeper.

userbinator 5 years ago

Within a single language there's already considerable stylistic variance. I used to work in CS education and marked assignments, and I would see variances in length of students' (correct) solutions, for relatively trivial problems, that spanned a range of several multiples; e.g. what someone would take 50 lines to solve, someone else might need over 300. Part of it is comments and whitespace, but unnecessary abstractions also contribute to making code more "bloaty" than it could be.

thinkx 5 years ago

Bruce Momjian wrote a few blogs about code quality.. the phrase “brutal replacement of old code” comes to mind

capableweb 5 years ago

When groups of people develop software, you get software that represents the group and it's dynamics (Conway's law). I think it's mostly because of the people who wrote the software, and what priorities they had when working on it.

Scarbutt 5 years ago

MySQL has a storage abstraction and includes multiple storages.

  • 67868018 5 years ago

    Postgres has storage abstraction since early 2019

  • castorp 5 years ago

    I think you will always find something that the other product doesn't have (or doesn't have in the same way)

    Postgres e.g. has the infrastructure for extendable data types and index types and the infrastructure for extensions - none of that can be found in MySQL and that infrastructure needs source code as well.

    I don't think the nearly 3-fold difference can't be attributed to "features", but to other things. Maybe the projects are structured differently or include different additional tools in their code (e.g. the input files to generate the SQL parser, rather than the generated parser's C code).

owowow 5 years ago

Likely different strategies for software development and less need to support legacy clients contributed to the line count being much lower.

aristofun 5 years ago

Why did you assume there is strong correlation between LOC and features/reliability in the first place?

mobilemidget 5 years ago

That the source is much larger I don't mind _so much_. But this recent thing mysql is doing worries me more.

ls -lah /usr/sbin/mysqld -rwxr-xr-x 1 root root 1.1G Mar 26 2020 /usr/sbin/mysqld

1.1G binary.

  • bmn__ 5 years ago

    How did that happen? This system's `mysqld` (from mariadb-10.4.14) is 22 MB. Even the biggest binary is only 100 MB.

        stat -c '%s %n' /usr/{bin,sbin}/*|sort -nr|head -20
        100034576 /usr/bin/blender
        63352576 /usr/bin/godot
        61299448 /usr/bin/scummvm
        61004336 /usr/bin/godot-headless
        55123448 /usr/bin/podman
        42794056 /usr/bin/podman-remote
        30582984 /usr/bin/makemkv
        29998400 /usr/bin/godot-runner
        29340272 /usr/bin/cabal
        28086824 /usr/bin/node14
        27641920 /usr/bin/godot-server
        25842880 /usr/bin/restic
        25345200 /usr/bin/krita
        24520736 /usr/bin/lto-dump-10
        22376736 /usr/bin/mariabackup
        22041432 /usr/sbin/mysqld
        17203016 /usr/bin/clementine
        15435056 /usr/bin/z3
        15360752 /usr/bin/doxygen
        14694288 /usr/bin/php

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection