SQL:2023 has been released

iso.org

270 points by MarkusWinand 3 years ago · 136 comments

MarkusWinandOP 3 years ago

The major news is:

- SQL/PGQ - A Graph Query Language

- JSON improvements (a JSON type, simplified notations)

Peter Eisentraut gives a nice overview here: https://peter.eisentraut.org/blog/2023/04/04/sql-2023-is-fin...

  • dang 3 years ago

    Discussed here:

    SQL:2023 is finished: Here is what's new - https://news.ycombinator.com/item?id=35562430 - April 2023 (153 comments)

  • bokchoi 3 years ago

    Thanks! Lots of little neat improvements in there like accessing JSON values using dots and array syntax:

        SELECT t.j.foo.bar[2], ... FROM tbl t ...
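
    For comparison, roughly the same lookup with a pre-2023 JSON path function (a sketch using SQLite's json_extract; the table contents are invented, and 0-based array indexing per the SQL/JSON path model is assumed):

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE tbl (j TEXT)")  # JSON stored as text
conn.execute(
    "INSERT INTO tbl VALUES (?)",
    (json.dumps({"foo": {"bar": [10, 20, 30]}}),),
)

# SQL:2023 dot/subscript syntax: SELECT t.j.foo.bar[2] FROM tbl t
# Pre-2023 spelling via a JSON path expression:
value = conn.execute(
    "SELECT json_extract(j, '$.foo.bar[2]') FROM tbl"
).fetchone()[0]
print(value)  # → 30
```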
  • sverhagen 3 years ago

    I'm a pretty average SQL user, but I've heard expert consultants say before that they could do many more things with SQL databases that developers like me would have maybe grabbed a different tool for, like a graph database. So this then makes me wonder, once there's even broader adoption through PGQ, is that going to be a killer for niche databases like Neo4j, in favor of, say, Postgres?

    • derefr 3 years ago

      Graph databases are about as different from RDBMSes storage-wise, as column-stores are from row-stores. It comes down to how you plan to shard data and distribute queries when data doesn't fit on a single node.

      Using a graph DB with many underlying KV-store nodes, you can have a single graph spread over many machines representing e.g. Facebook's social graph, and run a query which "chases around" edges between vertices that live on different nodes, to solve that query, while ensuring that as little of that has to happen as possible — both by rebalancing vertices so that data is sharded at low-connection-degree points in the graph; and by consolidating the steps of queries that occur on the same node into single batch queries, such that the whole thing becomes (close to) a single map/reduce step.

      There's nothing in Postgres that knows how to do that; if you had e.g. a graph stored in a distributed Citus table, and did a recursive CTE over it to do graph search, then you'd get pretty dang bad perf.
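
      To make the single-node baseline concrete, here is the kind of recursive-CTE graph search being described, sketched in SQLite with invented edge data; the point above is that no stock SQL engine knows how to distribute this traversal across shards:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE edges (src INTEGER, dst INTEGER);
INSERT INTO edges VALUES (1, 2), (2, 3), (3, 4), (5, 6);
""")

# All vertices reachable from vertex 1 by repeatedly chasing edges.
rows = conn.execute("""
WITH RECURSIVE reach(node) AS (
    SELECT 1
    UNION
    SELECT e.dst FROM edges e JOIN reach r ON e.src = r.node
)
SELECT node FROM reach ORDER BY node
""").fetchall()
print([n for (n,) in rows])  # → [1, 2, 3, 4]; 5 and 6 are unreachable
```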

      • MarkusWinandOP 3 years ago

        > It comes down to how you plan to shard data and distribute queries when data doesn't fit on a single node.

        A problem everybody would love to have but pretty much nobody actually has.

        • jupp0r 3 years ago

          > A problem everybody would love to have

          Except the people who do have it and need to keep their business running off of one postgres instance.

          • throwaway894345 3 years ago

            You can have data that fits on one machine and still run multiple instances of Postgres in a failover configuration, which will probably cover just about everyone (depending on your filesystem, disk for a single instance is essentially infinite, so I'm not actually sure what bottleneck would motivate you to scale beyond this configuration).

            • jupp0r 3 years ago

              > I'm not actually sure what bottleneck would motivate you to scale beyond this configuration

              It's usually not that the data doesn't fit on one machine but that the load on the database exceeds what one machine can serve. A failover configuration might enable you to use the spares for some read operations and take a little load off the primary, but you lose ACID semantics when you do that, and it generally doesn't help you for long.

            • threeseed 3 years ago

              There is a reason the world moved on from failover architectures.

              a) At some point you will have more data or more users than one instance can handle. And instead of simply adding another node you need to throttle usage in order to do a rolling upgrade. Which is far easier said than done and involves impact to the business.

              b) With distributed databases you are constantly testing that everything works in the Dev, Test, etc. environments. With failover you really only test it every now and again, usually before you deploy to Production. And in most companies, which are hopeless at this, the testing is guaranteed to be inadequate.

              c) Vendors lie. They promise that failover will just work but in my experience it very often doesn't. Which is another reason why b) is so important: it validates their claims.

              • throwaway894345 3 years ago

                I mean, I don't like that Postgres is not infinitely scalable, but the whole point is that (a) is not generally true--most companies could probably get by with a single machine's worth of data, or rather if they have more than one machine's worth of data, those systems probably aren't talking to each other such that they need to be on the same box. Regarding (b) and (c), do you not need to test failure conditions for distributed databases (this isn't rhetorical, I've only ever used cloud providers offerings)?

                • pooper 3 years ago

                  > most companies could probably get by with a single machine's worth of data, or rather if they have more than one machine's worth of data those systems probably aren't talking to each other such that they need to be on the same box

                  One of the things I would like to see in my lifetime is for it to be easier to "run the whole enterprise" from one box. Sure, it will probably be seriously underpowered and I can't do all things at once, but for most small to mid-size companies it should be possible to run all our "code" from one machine.

                  I think of this as some kind of development or pre-qa environment. It really shouldn't be that big of an ask...

                  I am thinking most, if not all, companies will be able to fit their entire enterprise on a Supermicro A1+ server with two 96-core processors. Sure, there is no machine in the world that can fit all of YouTube videos but there is no reason why we can't have YouTube, with a limited set of non-production data, running from just one box. Thoughts?

                  • throwaway894345 3 years ago

                    > one of the things I would like to see in my lifetime is somehow it should be easier to "run the whole enterprise" from one box. Sure, it will probably be seriously underpowered and I can't do all things at once but for most small to mid-size companies, it should be possible to run all our "code" from one machine.

                    I think that's the mainframe idea. There's probably some interesting philosophical question in there about whether or not a data center is just a big mainframe. It sort of feels like it verges on semantics.

                    > Sure, there is no machine in the world that can fit all of YouTube videos but there is no reason why we can't have YouTube, with a limited set of non-production data, running from just one box. Thoughts?

                    I'm not sure what you mean exactly, but if you're streaming YouTube's traffic through one box (even if that box isn't directly connected to the disks on which the video is stored) you'll run into I/O bottlenecks--such a machine would need to push terabytes per second, which is probably not trivial. Moreover, having a single machine that can handle YouTube's peak traffic probably means you're underutilizing it most of the time.

                    • pooper 3 years ago

                      Sorry I should have been clearer. I didn't mean to replace production. I meant like somewhere I can change maybe a few lines of code and run the entire application end to end.

                      Basically, somewhere I can run something and feel safe knowing this is production code except this one change I have made.

                  • jupp0r 3 years ago

                    You have a misconception about the workload of your typical database server. It's not about the amount of data it's storing, it's about

                    1. compute and memory bandwidth to serve complicated queries

                    2. IO

                    You can't scale memory bandwidth beyond some pretty low limit on one machine. You can't scale IO bandwidth beyond some limit. To give you an example, I've seen database servers with 20GB of data being so overloaded by compute requirements of complex queries that they needed to be scaled horizontally.

                    • pooper 3 years ago

                      But like this server has two 96 core processors. Moreover, we are talking about development environment for a single developer and maybe one or two users to try out their changes. It should be good enough, no?

                      I mean I expect things to be slower I guess but to test for correctness and spec?

          • justinclift 3 years ago

            Interestingly, when a place does get to the point where the single instance has capacity issues (after upgrading to EPYC and lots of flash drives) then other non-obvious stuff shows up too.

            For example, at one place just over a year ago they were well into this territory. One of the weird problems for them was with pgBadger's memory usage (https://github.com/darold/pgbadger). That's written in Perl, which doesn't seem to do garbage collection well. So even on a reporting node with a few hundred GBs of RAM, it could take more than 24 hours to do a "monthly" reporting run to analyse PG usage for the time period.

            There wasn't a solution in place at the time I left, so they're probably still having the issue... ;)

        • threeseed 3 years ago

          I consult for a lot of companies and I've never heard of or seen a database that wasn't horizontally scaled.

          It's not for scalability reasons it's for high-availability.

          Which as cloud adoption has increased and server uptime has decreased is even more important.

          • Spooky23 3 years ago

            Some of these arguments and “common knowledge” things are getting old. Everybody scaled up twenty years ago - hell, Amazon used to brag that they used an HP Superdome or whatever.

            Anyone with dogmatic opinions about this stuff needs to be taken with a grain of salt. If you scale out PeopleSoft, your accounting system will exceed the value of your company. If you’re worried about webscaling your random app, you're wasting more time navel gazing than accomplishing anything! :)

          • jahewson 3 years ago

            Why shard when you can just replicate?

            • lmm 3 years ago

              Because replica failover is rarely seamless (and often doesn't actually work at all, IME).

              • mikepurvis 3 years ago

                Instinctively that's surprising... replica failover should be far simpler technically, shouldn't it?

                • lmm 3 years ago

                  No? Replication tends to be a bodged-on mess throughout, full of undertested edge cases, of which failover is definitely one. If you build the system so that nodes joining and leaving is a natural and normal part of operation, well, it naturally works a lot better.

      • WorldMaker 3 years ago

        The PGQ part of SQL is designed in such a way that an RDBMS can certainly optimize its storage of graph tables (eventually). ("Graph tables" used by PGQ have DDL setup, per the Eisentraut overview: CREATE PROPERTY GRAPH graphname…) If Postgres embraces PGQ it would be easy to imagine they would also adopt some graph database style storage to back its indexing and query planners.

      • zozbot234 3 years ago

        > There's nothing in Postgres that knows how to do that

        Postgres uses foreign data wrappers and table partitions to achieve sharding of relational databases over many nodes. It's certainly possible to make the FDW layer smarter wrt. being able to distribute certain queries to the shards, including some recursive queries that happen to be of practical interest.

      • dathinab 3 years ago

        If you are Facebook scale, yes. But then you'll be using a dedicated graph database anyway.

        But a lot of graph databases have sizes where they are fully replicated on each node and some small internal optimizations for the graph are good enough. Given the design of the graph query language (and what you can query, hint: not everything arbitrarily) there is no reason postgres can't gain some additional optimizations to do that effectively as long as it's not facebook scale.

        SQL now having syntax for querying graph means Graph databases can support SQL, at least a subset of it, in a standardized way.

    • quantified 3 years ago

      A big problem (IMHO) with graph databases is building "the" graph model, and the fact that it's easy to be faced with problems that don't suit a graph database. Something as simple as returning the distinct set of values for an attribute and the count of vertices containing each value requires going outside the graph model, so it doesn't compose well in a property graph system. (There are other graphs besides property graphs; they will have their time someday.)

      What you really want is to apply graph processing to data as it is. The SQL 2023 additions are a step in the right direction. I need to find a good detailed description of the constraints and semantics to assess how good it is.

    • pphysch 3 years ago

      "Kill" is a strong word, as Postgres's solid JSON support technically obsoleted MongoDB for most use cases, but Mongo is still around for various reasons.

      I suspect if Postgres had a solid implementation of SQL/PGQ it would be a similar story for Neo4j.

    • czx4f4bd 3 years ago

      I wonder if there's been any observable correlation between JSON support in the major SQL databases and the decreased (or increased?) adoption of NoSQL document databases like MongoDB. It would be interesting to do some bulk analysis on GitHub commits to compare their use over time.

      • hn_throwaway_99 3 years ago

        Just one bit of personal experience, but for me it was a significant reason. In most cases you want objects to have highly structured data (e.g. for joins and queries) and in other cases you just want "a bunch of semi-structured stuff". Sure, DBs always had blobs and text, but JSON is really what you want a lot of the time.

        There's also a good article by Martin Fowler about how "NoSQL" was really "NoDBA" for a lot of folks, and I definitely saw that dynamic. JSON fields can also be a good middle ground here, where a DBA can ensure good "structural integrity" of your schema, but you don't need to go through the hassle of adding a new column and a schema update if you're just adding some "trivial" bit of data.

        • tracker1 3 years ago

          The canonical example for me is when you want to store/use additional payment processor details for a transaction... whether it's direct CC, PayPal, Amazon Payments, etc. Relationally you only really care that the amount of the transaction was sent/received/accepted. But you may want to store the additional details without a series of specific tables per payment processor. If you need to see the extra details, that can still be done at runtime.

          Another good example is for generalized classified ads, different categories may have additional details, but you don't necessarily want to create the plethora of tables to store said additional details.
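
          A minimal sketch of that pattern (SQLite, with invented schema and field names): structured columns for what the business always queries, one JSON column for the processor-specific remainder.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""
CREATE TABLE payments (
    id INTEGER PRIMARY KEY,
    amount_cents INTEGER NOT NULL,   -- relational: always queried
    processor TEXT NOT NULL,
    details TEXT                     -- JSON: processor-specific extras
)
""")
conn.execute(
    "INSERT INTO payments (amount_cents, processor, details) VALUES (?, ?, ?)",
    (1999, "paypal", json.dumps({"payer_id": "ABC123", "fee_cents": 58})),
)

# Relational query over the structured columns...
total = conn.execute("SELECT SUM(amount_cents) FROM payments").fetchone()[0]
# ...and the processor-specific extras are still reachable when needed.
fee = conn.execute(
    "SELECT json_extract(details, '$.fee_cents') FROM payments"
).fetchone()[0]
print(total, fee)  # → 1999 58
```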

        • throwaway894345 3 years ago

          Honestly, I pretty much always want structure. The reasons I've opted for NoSQL are almost always that cloud providers offer it for practically free while managed SQL databases are wayyyy more expensive. The nice thing about JSON is that it's a lot more ergonomic, but not because of the lack of typing--I would absolutely use a database that let me write reasonable type constraints for JSON columns. (I realize that you're talking about why most people use NoSQL and I'm remarking about why I use NoSQL).

          Some other controversial thoughts: SQL itself is a really unergonomic query language, and the lack of any decent Rust-like enum typing is really unfortunate. I know lots of people think that databases aren't for typing, but (1) clearly SQL aspires toward that but gives up half way and (2) that's a shame because they have a lot of potential in that capacity. Also, while you can sort of hack together something like sum types / Rust enums, it's a lot of work to do it reasonably well, and even then there are gaps.

          • hn_throwaway_99 3 years ago

            Not sure I understand what you mean - or rather, all of this appears to be available in Postgres.

            pg_jsonschema is a postgres extension that implements schema validation for JSON columns. I'm not particularly familiar with Rust, so not sure exactly what you mean by "Rust-like enum typing", but postgres has enums, composite types, array types, and custom scalars, so not sure what's missing.

            • throwaway894345 3 years ago

              By "Rust-like enums", I mean "sum types" or "algebraic data types". In general, it's a way of saying that a piece of data can have one of several different types/shapes (whereas a Postgres enum is basically just a label backed by an int). But yeah, with jsonschema you can probably express sum types, but jsonschema is disappointing for a bunch of reasons and needing an extension is also not great.

        • munk-a 3 years ago

          Every ecosystem I've ever worked in has had good tooling for managing DB migrations (and in some cases I've been the one to add it). It's trivial to write a migration to ALTER TABLE bar ADD COLUMN foo and I think keeping structure explicit is generally quite beneficial for data safety even if you're not doing fancy things. DBAs are great but most companies simply don't need one - you can just get by with some pretty rudimentary SQL and skill up as needed.

          Assuming you've got good integration test coverage of the database, schema alterations end up taking a minuscule amount of time, and if you lack test coverage then please reconsider and add more tests.

          • hn_throwaway_99 3 years ago

            Completely disagree. The issue is not really about how hard or easy it is to run migrations (every project I've worked on has also used migration files); it's that, depending on the data, it can just be a total waste of time.

            Sibling comment, "is when you want to store/use additional payment processor details for a transaction", is a great example IMO. I've dealt with card processing systems where the card transaction data can be reams of JSON. Now, to be clear, there are a lot of subfields here that are important that I do pull out as columns, but a lot of them are just extra custom metadata specific to the card network. When I'm syncing data from another API, it's awesome that I can just dump the whole JSON blob in a single field, and then pull out the columns that I need. Even more importantly, by sticking the API object blob in a single field, unchanged, it guarantees that I have the full set of data from the API. If I only had individual columns, I'd be losing that audit trail of the API results, and if, for example, the processor added some fields later, I wouldn't be able to store them without updating my DB, too.

            Before JSON columns were really standard, I saw lots of cases where people would pull down external APIs into something like Mongo, then sync that to a relational DB. Tons of overhead for a worse solution, when instead I can just keep the source JSON blob right next to my structured data in Postgres.

          • tracker1 3 years ago

            I think when you really need/want a DBA is when you're at a point where either you need redundancy/scale or have to remain up. Most developers aren't going to become that familiar with the details of maintenance and scale for any number of different database platforms. I think MS-SQL does better than most at enabling the developer+dba role, but even then there's a lot of relatively specialized knowledge. More so with the likes of Oracle or DB2.

            • psidebot 3 years ago

              Of course, if you're using Oracle or DB2 you have other/bigger problems.

      • threeseed 3 years ago

        MongoDB remains the 5th most popular database: https://db-engines.com/en/ranking

        And there are four major reasons still to choose MongoDB over something like PostgreSQL.

        a) PostgreSQL has terrible support for horizontal scalability. Nothing is built-in, proven or supported.

        b) MongoDB has superior ability to manipulate and query the JSON.

        c) MongoDB is significantly faster for document-attribute updates.

        d) MongoDB has better tooling for those of us that prefer to manage our schema in the application layer.

        • paulddraper 3 years ago

          By the time you need to shard PostgreSQL (billions of records?), you have lots and lots of resources to overcome that difficulty, a la Notion.

          • lmm 3 years ago

            If you want to be high-availability then you need sharding or something like it from day 1. There's still no first-class way of running PostgreSQL that doesn't give you at least a noticeable write outage from a single-machine failure.

            • hans_castorp 3 years ago

              > If you want to be high-availability then you need sharding or something like it from day 1

              Sharding has nothing to do with high-availability.

          • threeseed 3 years ago

            You horizontally scale for high availability as well as scalability.

            And primary-secondary failover in my experience is rarely without issues.

            There is a reason almost every new database aims to be distributed from the beginning.

            • dikei 3 years ago

              >> There is a reason almost every new database aims to be distributed from the beginning.

              That's partly because you can't compete with the existing RDBMSs if you're single node: they are good enough already. Nobody will buy your database if you don't introduce something more novel than PostgreSQL, whether that novelty is worth it or not.

            • paulddraper 3 years ago

              Primary-secondary is simple and robust. If I had a dollar for every time I saw split-brain clusters....

              ---

              And to respond sibling comment about "noticeable" downtime....

              Primary-secondary failover in <1m is very feasible. And each minute downtime is a mere 0.002% for the month.

              Primary-secondary isn't what is hurting your availability.

              • threeseed 3 years ago

                The experience for at least some of us is that failover is not robust. At all. And that < 1m is best case scenario that still requires a person to be monitoring the process.

                And the fact that the entire industry has moved to a distributed model despite its complexity gives you a hint as to which way the wind has been blowing for the last decade.

                • ibotty 3 years ago

                  You don't need to be that arrogant. The number-one reason why there are no new (No)SQL databases targeting one node is that the existing databases are great and you can't monetize them.

                  Failover is automatic for PG when using e.g. Patroni. Of course you lose active transactions and that might be a showstopper, but monitoring failover? I am curious when you'll have to do that.

        • throwaway2990 3 years ago

          a) not true b) not true c) not true d) not true e) a lot of people have no idea json support exists in PostgreSQL.

          • olavgg 3 years ago

            Agreed, when you see the index size in Mongo vs PostgreSQL, you will quickly understand that a single PostgreSQL instance can outscale a huge Mongo cluster.

        • pjmlp 3 years ago

          PostgreSQL isn't the only RDBMS to choose from.

      • sverhagen 3 years ago

        You would then have to tell the decreased adoption of NoSQL due to JSON support in major SQL databases apart from the decreased adoption due to the hype being over...

  • pphysch 3 years ago

    See also [1] for how this (might) relate to PostgreSQL

    In particular it is nice to see that a core dev views JSON dot accessing and PGQ as "sensible" future additions to Postgres.

    [1] - https://peter.eisentraut.org/blog/2023/04/18/postgresql-and-...

  • wslh 3 years ago

    Basic question: is it correct to assume that having PGQ involves a big change in the database engine?

    • pphysch 3 years ago

      AFAICT the idea is that you are not directly querying the tables as a graph, but you construct a graph "view" from the tables, and then query that graph using PGQ.

      • zubiaur 3 years ago

        Correct. The one RDBMS I know of that has implemented this "graph view" concept is Oracle's. They did it first with PGQL extensions on top of, say, 19c. Interesting, but the execution seemed a bit unpolished.

        Now with 23c, they are adopting PGQ and one should be able to interface through the regular JDBC connectors. Will see how it shapes up.

    • zozbot234 3 years ago

      It's just a different language and a simple "property" layer over the existing data. No changes to the internals are necessary.

      • WorldMaker 3 years ago

        No changes are required by PGQ but some engines may wind up using PGQ as a signal to bring in new types of indexes or disk storage formats to back it.

        It may be interesting to see if real world usage of PGQ pushes the database engines to do interesting things.

  • thanatos519 3 years ago

    PGQ looks neat - create a "property graph" from a relational model, then query it via Cypher-like expressions. The best or the worst of both worlds, depending on implementation quality.

  • justinclift 3 years ago

    As a thought, it might be better to use the https:// link to Peter's overview. :)

  • mariuz 3 years ago

    And here is the article on the status of SQL:2023 support in PostgreSQL https://peter.eisentraut.org/blog/2023/04/18/postgresql-and-...

  • Zpalmtree 3 years ago

    I like the DISTINCT / NOT DISTINCT unique NULL option; I was wanting this feature just a few weeks ago.
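
    For reference, the behavior that the new UNIQUE NULLS [NOT] DISTINCT clause pins down, shown in SQLite (a sketch with an invented table), where NULLs in a unique column are treated as distinct by default:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE users (email TEXT UNIQUE)")

# SQLite (like most engines) treats NULLs as distinct, so duplicates slip in.
conn.execute("INSERT INTO users VALUES (NULL)")
conn.execute("INSERT INTO users VALUES (NULL)")  # no constraint violation

count = conn.execute("SELECT COUNT(*) FROM users").fetchone()[0]
print(count)  # → 2
# SQL:2023's UNIQUE NULLS NOT DISTINCT would make the second insert fail.
```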

  • bionhoward 3 years ago

    Which SQL DBs support these features now? Who is almost there? I’m definitely excited to try it!

  • ksec 3 years ago

    I wonder when or even if MySQL will adopt any of these.

jchw 3 years ago

One thing that has always agitated me about SQL is that although it's standardized, and the standard seems to encompass a shit-ton, in practice a lot of SQL engines don't really seem to have any meaningful interoperability for practical uses among the world's most popular database engines.

For example, OK, I realize auto-incrementing IDs are not the most important thing in the world, and arguably not even a good approach in many cases. But sometimes you want them, and helpfully almost every database engine I know of has some kind of support for this, even if the semantics may differ. It's a super basic thing to want a unique ID that roughly counts upward on a table. You might have specific needs about re-using numbers and whatnot, but the general idea is very simple.

However: in practice, there is not an excellent way to do it that I can see. The closest thing I could find is `GENERATED BY DEFAULT AS IDENTITY` which, well, works. However, none of SQLite3, MSSQL, or MariaDB support this to my knowledge.

This is relentlessly annoying.

Is it the standard's fault, or the implementations'? I honestly can't say. However, I definitely find this annoying, since I was really hoping that by this time, we'd at least have a nice clean subset of standard SQL you could count on anywhere, for popular database engines. Unfortunately, it's not quite there yet, necessitating ugly hacks to this day.

I assume this new standard doesn't really change anything on this regard, since it's a desync with implementations that is a problem, and it does not seem the standards committee really cares too much about this kind of thing. (I could be wrong, though, as I am saying this based on feel and not evidence.)

  • mdaniel 3 years ago

    > Is it the standard's fault, or the implementations'?

    My mental model is that it's a mixture, but my life experience has been that a "standard" without a test harness or (at bare minimum) a reference implementation is just a bunch of navel gazing. For SQL specifically, that problem is even worse given the number of existing engines that move faster than the specification, so in the absence of leadership they just make stuff up

    Natural language is also a catastrophically horrible specification mechanism, since your black/blue is my white/gold

  • setr 3 years ago

    The way I see it, the fundamental problem is that SQL is not a proper composable language, and the standard never defines such a thing. So every feature added to an RDBMS is done so as an extension to the language, rather than an update to a standard library like any sane modern programming language. SQL as a language still operates with the mentality of COBOL -- if anything reusable is going to be provided, it will be provided by the RDBMS manufacturer. The user is only expected to produce highly business-specific logic for their own needs. (RDBMSs do often offer standard PL language support, but these typically hook into database internals, are intended for highly specific scenarios, cannot be transferred to any other RDBMS, and are generally meant for writing stored procs -- non-reusable functions.)

    As a result, for databases to compete on features, they must arbitrarily extend the SQL language standard; these modifications to the language then get backfilled into the standard, run headfirst into backwards compatibility, and suddenly no one really agrees on what precisely the feature should be, so it becomes an optional part of the standard, which really just means that it isn't standardized.

    In any sane language, you wouldn't need different databases to add specific support for GENERATED BY DEFAULT AS IDENTITY, and especially not 15 different syntaxes used in 15 different databases to specify kinda sorta not really the same thing -- it would simply be a function, one you could write yourself, or provided by the standard library. It wouldn't be up to the RDBMS to offer support beyond actual language features -- it'd just be up to you to update your libraries.

    • galaxyLogic 3 years ago

      > SQL is not a proper composable language,

      Very good point. Non-composable means non-scalable (as a language).

      I wonder if it would be possible to define a very small robust standard subset of SQL with a standard for extending it. But that would take away the customer-lock-in which is what all big DB-vendors desire.

  • dmux 3 years ago

    SQLite does support auto-incrementing: https://www.sqlite.org/autoinc.html. Is your gripe with the lack of "GENERATED BY DEFAULT AS IDENTITY" syntax specifically?

    • jchw 3 years ago

      Yes, sorry. My gripe is that we don't have a single general way to say "I don't really care about the specifics, I just want a number that goes roughly upward" for all database engines. SQLite is interesting in that it supports at least two distinct sets of semantics for auto-incrementing rows, but neither of them are the standard syntax...
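
      For anyone unfamiliar, the two SQLite semantics referred to, sketched side by side (neither uses the standard GENERATED ... AS IDENTITY spelling):

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Semantics 1: INTEGER PRIMARY KEY aliases the rowid; ids may be
# reused after deletes.
conn.execute("CREATE TABLE a (id INTEGER PRIMARY KEY, v TEXT)")

# Semantics 2: AUTOINCREMENT guarantees monotonically increasing ids
# that are never reused, at a small bookkeeping cost.
conn.execute("CREATE TABLE b (id INTEGER PRIMARY KEY AUTOINCREMENT, v TEXT)")

for t in ("a", "b"):
    conn.execute(f"INSERT INTO {t} (v) VALUES ('x')")
    conn.execute(f"INSERT INTO {t} (v) VALUES ('y')")

ids = conn.execute("SELECT id FROM b ORDER BY id").fetchall()
print([i for (i,) in ids])  # → [1, 2]
```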

    • richardwhiuk 3 years ago

      The gripe is that there's no standard way of defining this in SQL.

      SQLite SQL != MySQL SQL != ISO SQL

justinclift 3 years ago

Ugh, it's CHF 208.00 (about US$230.00).

---

As @rgbgraph points out below, the price is actually several times that. There are several parts to the standard, and that US$230 is per part.

  • MarkusWinandOP 3 years ago

    Luckily pretty much nobody needs the standard documents. It's actually my aim at https://modern-sql.com/ to make the relevant information more accessible — in particular including support-matrices ("Can I Use").

    • chrismorgan 3 years ago

      I’ve never had access to the SQL standard, but in things like HTML and CSS I know I reach for the specs extremely regularly, and when working with SQL (mostly PostgreSQL or SQLite, including sometimes having to support both) I get frustrated at the general poor state of SQL documentation (certainly a lot of this is because of engine diversity), and have often wished I had access to the SQL standard (even though I know engine diversity significantly lowers its value for users of particular databases—my ideal would probably be a version of the spec augmented with engine support and links to each engine’s documentation of the matter). Certainly the table of contents for this spec sounds delightful.

      Not all specs are particularly accessible (e.g. ECMAScript is often hard to follow if you haven’t spent a fair bit of time around it—it’s mostly natural-language code that works with a ECMAScript virtual machine), but most of the time, I would much rather have the actual spec over someone’s digested summary that covers what they thought was important, but regularly lacks details important for my situation. Some specs are absolutely magnificent. The HTML Standard is my very favourite as both a piece of spec work and as reference material.

      Seriously, specs are really good stuff and it makes me sad how people often ignore them because they assume they’ll be inscrutable. (Similar remarks apply to legislative texts. They’re normally pretty easy to understand, and you find all kinds of sometimes fun and sometimes useful gems that potted summaries exclude.)

      • MarkusWinandOP 3 years ago

        I've been studying the SQL standard for years now, and compared to other standards that I know (XSLT, a little CSS, and, decades ago, POSIX, C, and C++), the SQL standard is really hard to make sense of. You might overestimate the value of having access to it.

        That said: free would be better.

        • momirlan 3 years ago

          Agreeing here, Markus: for all practical purposes, all one needs to know are the features supported by a particular product, not the possible features. Big fan of your site!

        • nurettin 3 years ago

          I have not read any of 9075, but 14882 is surprisingly comprehensive, and wg21 regularly publishes prerelease versions for free.

      • paulddraper 3 years ago

        The good news (not really) is that the SQL standard isn't really followed.

        So unless you are writing a database yourself, the DMBS documentation is going to be more relevant.

    • patrec 3 years ago

      Just in case anyone who isn't familiar with Markus's work gets a slightly w3schools vibe when clicking the above link: his book SQL Performance explained[1] is probably the single most useful book a backend developer can read.

      [1] https://sql-performance-explained.com/

    • rekabis 3 years ago

      As a web developer: your site is clean and crisp, with an almost brutal simplicity which makes it very attractive. Honestly, I like it a lot. It could use one or two very subtle tweaks in the super-fine details, but for a technical information-dissemination site, it’s bloody awesome.

      My only issue is with the width. While whitespace between the sides and the centre content is very useful, this isn’t the 1990s anymore with its 1024×768 monitors. You _can_ make the centre column responsive to the overall width of the screen. Doing so can also give you a lot more room to do things, and make it easier to read. Your line-height is already great, and is perfect for text blocks a good half again wider if not twice as wide. Even on my vertical monitors, which are only 1500px wide, that centre column is pretty much claustrophobically narrow.

    • fhoffa 3 years ago

      I love this, thanks!

      How could we add Snowflake to the compatibility matrices?

      It's popular enough to deserve it, IMHO.

      https://db-engines.com/en/ranking

    • LispSporks22 3 years ago

      Cool nice site man. I bought the book a while back and really enjoy the articles.

    • sosodev 3 years ago

      I've certainly needed them before :(

  • rgbgraph 3 years ago

      That's for one part. IIRC, there are 11 "new" parts this year.

    Which is even more hilarious, considering the ISO is already being funded by the tax dollars of member countries.

    A bit like having to pay a journal to get access to research papers: ridiculous.

  • nologic01 3 years ago

    they say the Standard contributes to SDG 9...

    "Inclusive and sustainable industrialization, together with innovation and infrastructure, can unleash dynamic and competitive economic forces that generate employment and income"

    It's not clear how an arbitrary levy like this can contribute to an SDG goal. It is the definition of exclusion. A bright newcomer could use that $230 to buy several good database books instead.

    Surely ISO has enough corporate stakeholders that can defray the cost of running the standardization process.

  • munk-a 3 years ago

    I have always despised them for putting a paywall in front of the SQL standard - it's been there since at least 2010 and probably quite a bit earlier. The good news is that nobody actually needs to read it, since you need DB-specific information anyways... but it'd be nice to have access to it (especially when I was working on an RDBMS-neutral dialect to support multiple backends). Still, if you ever have a documentation question, postgres[1] does an absolutely amazing job detailing everything in its dialect, and their dialect tends to be the most "neutral" imo.

    1. https://www.postgresql.org/docs/current/index.html

awestroke 3 years ago

If only they could start allowing queries to begin with "FROM tbl". It would allow for much more helpful autocomplete. Also, DELETE or SELECT should really be on the very last line of the query. Seems like these changes could be done without losing backwards compat.
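For what it's worth, DuckDB already accepts a FROM-first form as a dialect extension (a sketch; `tbl` and the column names are placeholders):

```sql
-- FROM-first: the table comes first, so tooling knows the schema
-- before you type the column list
FROM tbl SELECT col_a, col_b;

-- the SELECT clause can even be omitted entirely (implies SELECT *)
FROM tbl;
```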

  • calvinmorrison 3 years ago

    As in: if only SQL were actually writable or intuitive, you'd want to use it more. Instead, I just reach for wrappers 99% of the time, where I can chain all the operations I want together and let Eloquent figure it out.

aerzen 3 years ago

Where would one find a pirated mirror of this standard? Or the 2019 one?

Asking for a friend, of course.

  • jeppebemad 3 years ago

    Or, said in a more 2023-chatgpt-jailbreaky kind of way: which URLs should one avoid in order to not find pirated mirrors?

cpdean 3 years ago

I like SQL and all but I really don't care to follow ISO releases. They're hundreds of dollars and nobody actually implements the whole thing. I get way more excited about database releases.

Does anyone else find value in what's in an ISO standard?

  • lolinder 3 years ago

    > They're hundreds of dollars

    This isn't SQL-specific, but this is 100% the problem for me. There's such a big culture gap between the way that we do things in most of the tech world and ISO, and one of the biggest clashes is this weird $180 PDF thing.

    If I want to implement a new standards-compliant HTML parser, I can hop right onto whatwg.org and view the complete standard instantly [0]. It's massive and complicated, but it's freely accessible to anyone interested.

    In contrast, if I want to implement an ISO 8601-compliant date parser, ISO wants me to buy their PDF for CHF166 (~$180 USD). This spec is for a standard that is orders of magnitude less complex, and they're charging through the nose for it.

    I'm unclear what makes the difference between a standard that can be maintained by a community for the benefit of everyone and a standard that needs to be locked behind a paywall.

    [0] https://html.spec.whatwg.org/

    • chillfox 3 years ago

      A pay-walled standard is not available, and an unavailable standard is not a standard at all.

      The only real way of fixing it is for enough people to ignore ISO so they become irrelevant.

      If you are building a new DB engine (toy or not), don't use SQL. Either design a new spec or use something that's more openly specified (maybe GraphQL or EdgeQL).

bafe 3 years ago

All great features, but unfortunately most SQL databases are still missing features from SQL:2016 like MATCH_RECOGNIZE. I wonder what the purpose of an ever-growing standard is when most implementations only support a small subset of it, and often with nonstandard syntax and semantics.
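For readers who haven't met it: MATCH_RECOGNIZE does regex-style pattern matching over ordered rows. Roughly, in Oracle/Trino-style syntax (a sketch; the table and columns are hypothetical), finding V-shaped price dips per symbol:

```sql
SELECT *
FROM ticks
MATCH_RECOGNIZE (
    PARTITION BY symbol
    ORDER BY trade_ts
    MEASURES
        strt.price       AS peak_price,     -- price where the dip began
        LAST(down.price) AS bottom_price,   -- lowest point of the V
        LAST(up.price)   AS recovery_price  -- price at end of recovery
    ONE ROW PER MATCH
    PATTERN (strt down+ up+)                -- any row, then falling rows, then rising rows
    DEFINE
        down AS down.price < PREV(down.price),
        up   AS up.price   > PREV(up.price)
)
```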

  • hashhar 3 years ago
    • bafe 3 years ago

      Thanks! If I understand it right, instead of being a new database engine, Trino compiles the statement into the query languages of the different backends and runs these queries in a distributed way?

      • hashhar 3 years ago

        Exactly. It provides an API using which you can build connectors to whatever systems you want. e.g. Here's a connector for GitHub API https://github.com/nineinchnick/trino-rest/tree/master/trino....

        It doesn't have to be SQL based systems on the other end - the most used connector with Trino is to query files on object storage (S3/GCS/Azure Blob).

        Disclaimer: I'm one of the maintainers of the project.

        • bafe 3 years ago

          That's great! How about performance? Say I want to use MATCH_RECOGNIZE on a PostgreSQL backend: is the compiled query going to be significantly less performant compared to the same query on databases (e.g. Oracle) that support that statement natively?

          • hashhar 3 years ago

            For complex things like MATCH_RECOGNIZE (and CASTs), whose syntax and semantics differ across underlying systems, unfortunately the result will be that some data is going to be pulled and processed in Trino - so it'll be slower than native. If you are only dealing with a single data source (and it is an RDBMS, rather than, say, files on S3 or an API), I'd say Trino is not needed and would slow you down.

            The rule of thumb is that Trino aims to provide a uniform layer over whatever sits underneath. Operations which, when "pushed down" to the source, produce the same results as when executed within Trino do get pushed down - i.e. executed on the source. But in cases where the results might differ, or where pushing the operation down is complex, the operation runs within Trino - i.e. Trino pulls the minimum needed data from the source and then performs the operation itself.

            Note that it's not an all-or-nothing case, e.g. a query like:

                SELECT n.name, r.name
                FROM postgresql.tpch.nation n
                LEFT JOIN postgresql.tpch.region r ON r.regionkey = n.regionkey
                WHERE n.name > 'A'
                    AND r.name = 'ASIA'
            
            will result in the following query to Postgres:

                SELECT n.name, r.name FROM tpch.nation n LEFT JOIN tpch.region r ON r.regionkey = n.regionkey WHERE r.name = 'ASIA'
            
            The rest of the query (n.name > 'A') would be applied in Trino to the results fetched from Postgres, because the collation in Postgres would affect the results if we pushed the complete query down, and those might not match the results when the entire query is processed by Trino. With a single data source this is not easily appreciated, but if, say, the second table in the query came from SQL Server, you'd want consistent comparison logic regardless of the source of each table.

            This is a very simplified example though.
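            The collation point can be seen even within a single engine. As a sketch using Python's built-in sqlite3 (table and values made up): the same `>` predicate returns different rows under SQLite's BINARY and NOCASE collations, which is exactly why a cross-engine layer can't blindly push such comparisons down.

            ```python
            import sqlite3

            con = sqlite3.connect(":memory:")
            con.execute("CREATE TABLE nation (name TEXT)")
            con.executemany("INSERT INTO nation VALUES (?)",
                            [("algeria",), ("BRAZIL",)])

            # Under the default BINARY collation, lowercase ASCII letters
            # sort after 'Z' (0x61 > 0x5A), so 'algeria' satisfies > 'Z'.
            gt_binary = con.execute(
                "SELECT name FROM nation WHERE name > 'Z' COLLATE BINARY"
            ).fetchall()

            # Under NOCASE, the comparison is case-insensitive, so neither
            # 'algeria' nor 'BRAZIL' sorts after 'Z'.
            gt_nocase = con.execute(
                "SELECT name FROM nation WHERE name > 'Z' COLLATE NOCASE"
            ).fetchall()
            ```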

minroot 3 years ago

What's the point of a standard if it takes money to read?

  • blacklion 3 years ago

    What do you think about all MPEG series and WiFi? You need to pay to read, you need to pay to implement, even if it is clean-room implementation.

    Ridiculous.

gigatexal 3 years ago

It really is utter bullshit that we have to buy these standards. What are the business models of these standards bodies anyway?

  • skissane 3 years ago

    Other (computing industry) standards bodies make their standards freely available–e.g. ECMA, W3C, OMG, The Open Group, IETF, etc. Given they can't make money from selling the standards documents, they need to find other funding sources. Many of them rely on corporate patrons who fund the standards process, and in exchange often get greater input into moulding the standard to meet their interests. The Open Group makes some of its money by selling trademark licenses and proprietary test suites.

    ISO defends its own model by arguing that it produces greater independence from vendors, being less reliant on them for funding and thus making it easier for them to say "no" to them. One big difference is ISO is not an industry-specific standards body, it has standards for all kinds of things that have nothing to do with computers – screw threads, metallurgy, analytical chemistry, oil and gas pipelines, you name it. Possibly their defence makes more sense for some of those other industries than it does for ours.

    If the major SQL players got together (major proprietary vendors and leading open source projects), they could create their own SQL standards process to supersede the ISO one, and release the standard freely. The ISO standard could still exist, but it could turn into one of these ISO standards where ISO just adopts the text of an existing standards organisation - e.g. the ECMAScript standard is developed by ECMA. Initially ISO republished ECMA's standards under their own number (ISO/IEC 16262), now instead they publish a 3 page standard which just incorporates ECMA's by reference (ISO/IEC 22275). They offer it for free download [0], but are also willing to sell you a copy if you are desperate to give them money. ISO's SQL standard could turn into the same thing – but, that would require the SQL community to decide to push for that, and I'm not sure any of the players feel sufficiently motivated to do it.

    [0] https://standards.iso.org/ittf/PubliclyAvailableStandards/in...

    • galaxyLogic 3 years ago

      As noted in a post above, SQL is not a composable language. That means new features cannot be implemented as (standard and user-extensible) libraries, which makes the standard more bloated with every new feature. The DB vendors benefit from this situation: when the standard is bloated, nobody is likely to be fully compliant with it, which creates customer lock-in.

      Somewhat ironically, the standards body also benefits from such bloat, because the standard is the product they are selling. The bigger the standard, the more money they can ask for it.

  • massysett 3 years ago

    I’m astounded at the outrage at having to buy them. Producing standards takes time and labor. Editing them takes time and labor. So does posting them and maintaining them. So it’s no surprise that one way to pay for that time and labor is to charge money. I doubt anyone here complaining about buying standards works for free. That other standards-producers post their standards for free does not mean that charging for them is automatically an outrageous business model.

    Finally, for the folks who need professional access to this document, such as RDBMS implementors or professional developers using an RDBMS, the price is a pittance. To anyone for whom the price is a problem - perhaps someone writing a free software application - the lack of access to the standard isn’t a problem, because what they really need is documentation on how their implementations work, and two superb implementations - PostgreSQL and SQLite - have voluminous documentation and they’re 100% free of charge.

    • scrollaway 3 years ago

      I don’t think anyone is outraged at the idea that the standard authors should be remunerated.

      The problem is that the standards have to be bought. They should be in the public domain. But someone can still be paying for it (eg a government, or an international organization funded by various public bodies).

      • gigatexal 3 years ago

        This 100% sums up my point. If it's a standard then it should be freely available. Standards bodies or governments or whoever can pay for the work that goes into it but how is anyone supposed to study and learn from a standard and maybe even not come up with a new one when one already exists if everything is behind a paywall?

        edit: was referring to scrollaway's comment

  • nologic01 3 years ago

    There is a serious cottage industry around standards (not just ISO). Selling the standards themselves is part of an overall architecture for selling "certification" of compliance with the standards and ancillary consulting and training services and, in some instances, selling standard-compliant solutions.

    While obviously standards don't appear out of thin air (some costs are involved), it does smack of rent extraction: by definition there is only one standard, so you can't expect competitive pressure to lower the price.

  • jacobsenscott 3 years ago

    Selling the standards documents.

tofflos 3 years ago

Seems you can play with SQL/PGQ at https://blogs.oracle.com/database/post/get-started-with-prop....

  • nologic01 3 years ago

    Is Oracle's PGQL (e.g. 2.0) more or less the same as SQL/PGQ?

    It might be interesting to have a comparison of where major databases stand (or plan to be) with respect to SQL/PGQ

la_fayette 3 years ago

PGQ's MATCH syntax seems interesting and reminds me of writing SPARQL. I wonder if any RDBMS will support this?

gatvol 3 years ago

Not a standard if access requires payment.

  • blacklion 3 years ago

    What do you think about all MPEG series and WiFi? You need to pay to read, you need to pay to implement, even if it is clean-room implementation.

    Ridiculous.

galaxyLogic 3 years ago

The irony of a for-profit (= for-pay) standard is this: if someone provides a product for a price, then others should be able to produce a similar but better product at a similar or better price. In the case of SQL, that would mean there should be alternative standards provided by different vendors. But if there are multiple standards, it can hardly be called a "standard".

I think tax-payer money should pay for standards, because they benefit us all. It is like the highway system, or clean air, and water.

xucheng 3 years ago

A related question: what is the state in term of supporting the SQL standard among the popular RDBMS? It seems that almost all database engines use their own custom syntax.

  • jsmith45 3 years ago

    I can say that none of Oracle, Sybase, or Microsoft SQL Server really aim at conforming to the standard. While they will often try to use standard syntax for new features if such syntax exists, there is tons of old non-conforming syntax that there seems to be no real effort to address, even by adding new options, etc. This means some really common features deviate significantly from what the standard requires.

    PostgreSQL does mostly aim at conforming to the standard. They will invent new syntax when needed, but compared to those previously mentioned, Postgres seems to prefer to stick closer to the standard whenever possible, including adding standard syntax for existing features whenever possible.

    PostgreSQL does have some places of deliberate non-conformance (beyond just incompletely implemented features). They document many deliberate deviations (other than unimplemented or partially implemented features), along with whether they think each can be fixed in the future: https://wiki.postgresql.org/wiki/PostgreSQL_vs_SQL_Standard . Looking at the list, I'd say the ones especially likely to bite a developer are the default escape character for LIKE clauses and the non-standard trailing-space behavior of the character(n) datatype (but who uses fixed-length character datatypes instead of varchar or text?). And obviously not-yet-implemented features could bite people, but many such features are optional to implement anyway, so...
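    The LIKE deviation is easy to miss. Per the standard there is no escape character at all unless an ESCAPE clause names one, whereas PostgreSQL defaults to backslash. A sketch (table and column hypothetical):

    ```sql
    -- Standard SQL: '\' is an ordinary character here, so this pattern
    -- matches the four characters 5, 0, \, followed by anything (%)
    SELECT * FROM t WHERE name LIKE '50\%';

    -- Standard SQL: only with an explicit ESCAPE clause does '\' escape,
    -- so this matches the literal three-character string '50%'
    SELECT * FROM t WHERE name LIKE '50\%' ESCAPE '\';

    -- PostgreSQL: backslash escapes by default, so the first query
    -- already behaves like the second one
    ```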

    I cannot speak about MySQL or MariaDB, due to insufficient familiarity.

  • MarkusWinandOP 3 years ago

    This is one of the questions I try to answer at https://modern-sql.com/

    • bafe 3 years ago

      Your website is great and I regularly check it to see what's new in various implementations. Unfortunately it seems that many databases don't support many modern SQL features yet. Any ideas as to why?

      • mdaniel 3 years ago

        > Unfortunately it seems that many databases don't support many modern SQL features yet. Any ideas as to why?

        I'd guess the incentive structure is the opposite of what you're implying; the same reason why every cordless drill manufacturer has their own battery connector: vendor lock in fuels private planes and shareholder reports, versus being compatible means no one is forced to buy your batteries and thus profits are `$total - $forced_purchases`

        This situation gets even worse in the case of a standard without any objective way of knowing one is in compliance. Having a badge on the mysql.com website saying "now featuring SQL:2023 compliance!11" sells how many more support contracts exactly?

        • bafe 3 years ago

          That's a good point. Additionally, the standard isn't freely available, and I doubt most developers of existing SQL databases participate in drafting new standards. It seems doomed to diverge even further, which raises the question of whether it is even relevant to have an SQL standard at all.

qalmakka 3 years ago

As much as I try to understand it, I honestly don't see the point of this standard. It sounds like some weird fanfiction made by delusional people who think SQL is actually a single language and not a hodgepodge of incompatible dialects.

It makes no sense to have a standard SQL when nonsensical implementations like MSSQL or MySQL exist.

sdflhasjd 3 years ago

Ah, a sequel for SQL
