It's not you, it's SQL

stack.convex.dev

61 points by tristanz 3 years ago · 83 comments

commandlinefan 3 years ago

The article presents the move from the old, doddering SQL to the newer, better NoSQL - but the hierarchical database model actually _predates_ SQL. The first databases, like IBM's IMS, were hierarchical, a lot like Mongo is: Codd was actually trying to address the problems with that model when he created the relational model.

  • jamwt 3 years ago

    Article author here. Good point -- but the article actually proposes document-relational, not hierarchical. Relational is definitely good!

    • dragonwriter 3 years ago

      It vaguely references it, but doesn't explain what the data model it is proposing actually is in concrete terms. It gives some API examples, but nothing that tells me what is different between it and the data model of any of the major (Object) Relational DBMS's with an appropriate host-language client adaptor, except that the client library exists only for TS and the type system is designed to be 1:1 with TypeScript. And the data model arguments (except for the client-language type equivalency for TypeScript) are all exactly the classic arguments for ORDBMS's over classic RDBMS's, which I guess "won" to the extent that many major DBMS's are ORDBMS's, but kind of lost in that almost everyone, even using them from an OO host language with the kind of deeply OO code that was in fashion a decade or two ago but is falling out of fashion these days, uses mostly just the RDBMS features, and it works pretty well because of the universality of the relational model.

      Really, the big thing that I see here is the transaction retry logic, which (if the writes in your transaction depend only on the reads and not on side effects, which you'd need to declare) would be a nice feature for any db engine, and (if the side effects are packaged together in a safe way with the relevant reads and writes, which can be more general) would also be a nice feature for a high-level db client library, independent of backend. But that's a pretty thin reed on which to hang a dependency on a proprietary SaaS solution over, say, Postgres.

      • jamwt 3 years ago

        > doesn’t explain what the data model it is proposing actually is in concrete terms.

        Fair enough. The goal of the post wasn't necessarily to comprehensively describe the product, so the specifics of what we provide are found elsewhere in our docs, etc.

        > Really, the big thing that I see here is the transaction retry logic,

        These transactions run as embedded functions inside Convex in a runtime where it is impossible to have side effects.

        • dragonwriter 3 years ago

          > These transactions run as embedded functions inside Convex in a runtime where it is impossible to have side effects.

          The reason I expressed that the side-effect-free version would be useful in any engine, and the side-effect-encompassing one would be useful in any client library, is that transaction patterns where external interaction is needed within the transaction are not uncommon. A tradeoff of automatic transaction retries for a constraint that you can never have client-side side effects conducted between operations of a database transaction means you are likely to have to write more higher-level business transaction compensation logic around lower-level database transactions. This may, in one respect, have greater conceptual purity, but in practical terms it's a trade-off, not a pure gain.

  • jd_mongodb 3 years ago

    Comparing any modern database (i.e. developed after 1980) with IMS is like comparing a flint knife with a Barrett .50 calibre.

Justsignedup 3 years ago

My biggest issue with SQL has always been, and always will be, the lack of definitions.

There's no way to express data structure knowledge in sql, only relations and keys. Discoverability can be quite lacking. And every time I want to join table A to B I have to re-define everything, because SQL doesn't store that.

ORMs help. They help because they encode relationships in meaningful ways. A has many Bs, so A.B works, and I don't need to repeat this join logic every damn time. But ORMs have downsides too. Sometimes they generate queries that are really sub-optimal.

I think my favorite ORM usage was with Hibernate. I KNOW!!! THAT DEVIL! But honestly we wrote SQL in hibernate, and then invoked it to populate our data. Yes it was a bit more work than RoR's "order.items.where("price > 40")" but when complex things happened, it was always easier.

I've always looked at nosql as "absolutely, sounds great, how do you represent relationships?"

  • cfeduke 3 years ago

    You can sort of fake the joins between two tables and avoid the ceremony with `natural join` assuming you've named the columns the same. (This doesn't work if your team has followed a normal pattern of "table1.id" with a FK in "table2.table1_id".) It's more like a hack and it falls apart as soon as someone starts mucking with column names [what evil person would ever do a thing like that...] or happens to share unrelated columns with the same name across two tables. So yeah, probably don't use it, except maybe to save yourself some time writing OLAP queries but increase your time spent debugging those same queries. :)
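
    A quick sketch of both sides, assuming hypothetical `account` and `login` tables that happen to share an `account_id` column:

      -- works only while both tables spell the column identically
      select *
      from account
      natural join login;

      -- the conventional "account.id" / "login.account_id" naming
      -- forces the explicit form instead
      select *
      from account
      join login on login.account_id = account.id;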

  • dragonwriter 3 years ago

    > There’s no way to express data structure knowledge in sql, only relations and keys.

    Relations, keys, and (though you forgot them) constraints express…quite a lot of data structure knowledge.

    > And every time I want to join table A to B I have to re-define everything, because SQL doesn’t store that.

    SQL stores that if you tell it to; the usual way being CREATE VIEW.
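
    E.g., a sketch with hypothetical `users`/`posts` tables:

      CREATE VIEW user_posts AS
      SELECT u.user_id, u.name, p.post_id, p.title
      FROM users u
      JOIN posts p ON p.user_id = u.user_id;

      -- the join logic now lives in the database:
      SELECT * FROM user_posts WHERE name = 'alice';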

  • richiebful1 3 years ago

    A database system could implement a feature that automatically uses the foreign key to join to a table. Maybe some RDBMS out there does this.

    For example, you have a many to one relationship between posts and users. Instead of this:

      select *
      from user as u
      join post as p
      on p.user_id = u.user_id;
    
    You could do:

      alter table post
      add constraint fk_user_id foreign key (user_id)
      references user (user_id);
    
      select *
      from user as u
      left referent join post as p;
    
    Any good sql auto-completer will also look up the foreign key information and auto-generate the on clause for you as well. Redgate SQL Prompt (mssql only) is one of the best tools out there for this reason.

    • sgk284 3 years ago

      SQL already covers this use case with the `using` keyword. But you need to specify the shared column name. If you didn't need to specify the column then adding a new foreign key between the tables would make existing queries ambiguous and break backwards compatibility. See:

      https://www.postgresql.org/docs/current/queries-table-expres....
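
      E.g., assuming both tables name the column `user_id`:

        SELECT *
        FROM users
        JOIN posts USING (user_id);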

      • dragonwriter 3 years ago

        > SQL already covers this use case with the `using` keyword.

        No, “using” doesn’t automatically join by foreign key; it joins by an explicitly-provided column name, which must be identical between the two tables.

        > If you didn’t need to specify the column then adding a new foreign key between the tables would make existing queries ambiguous and break backwards compatibility.

        Not if you specified the constraint name rather than the column name (which also works for multi-column foreign keys without having to reiterate all the columns, being much more DRY than current SQL USING.)

        Better, with that approach you could also allow fully implicit joins, using the foreign key constraint name as if it were a row-valued field in the referencing table.

        E.g.:

          CREATE TABLE employees (
            id BIGSERIAL PRIMARY KEY,
            name VARCHAR,
            manager_id BIGINT CONSTRAINT manager REFERENCES employees
          );
        
        would let you do:

          SELECT name as employee_name, manager.name as manager_name
          FROM employees
        
        as syntax sugar for:

          SELECT ee.name as employee_name, mgr.name as manager_name
          FROM employees ee 
          LEFT JOIN employees mgr ON (
            ee.manager_id = mgr.id
          )

      • marcosdumay 3 years ago

        Using is a bad standard. It doesn't do the obvious thing (that is, following foreign keys), it requires a heavy reliance on bad naming conventions, it fails for the not-too-rare case where you need more than one FK between the same two tables, and if you go strictly with the standard (good for Postgres that it doesn't), it breaks the table-aware semantics of column names.

        Meanwhile, the obvious natural way to make joins isn't available.

  • civilized 3 years ago

    Hi, just curious - could you or someone else be more specific about the way in which an ORM encodes relationships that SQL doesn't?

    • jrumbut 3 years ago

      An ORM allows you to encode a relationship once then use it for many different queries, I think that's the idea.

      • civilized 3 years ago

        But a view also does that? Like, if you want to assemble information about a user from several different tables, you can have a view that does the join for you.

        • jrumbut 3 years ago

          Sure, but people never use those. Also it gets back to the discoverability issue. The ORM documents relationships with the rest of the model code, if you have a poorly named view in a large, complex schema it may be hard to find. You could reinvent the wheel easily.

          Like everything with SQL, you can solve the problem but sometimes the solution isn't elegant. People want elegance.

          • Justsignedup 3 years ago

            This is an accurate description of how I think of this problem. And it would be tough to describe all this:

            - table a joins b yay a view

            - table b joins c yay a view

            - table c joins d yay a view

            - table a joins to d. well technically it can, but are you really gonna write all the permutations of views for every possible join?

            ORMs encode that nicely so I can easily walk the relationships and get to the query I need.

            • dragonwriter 3 years ago

              Essentially all of the actual information is encoded in the foreign key constraints (in combination with uniqueness constraints.) What ORMs provide that SQL doesn’t isn’t much encoding of information (they usually have facilities to distinguish data tables from pure join tables, and to distinguish 1:1 and 1:M, (M>0) relations from 1:(0-1) and 1:M, (M>=0) ones, so they do encode some additional information), but provide convenient syntactic shorthands for the client to use the encoded relationships.

              This would be easy to add to SQL, as syntax sugar, https://news.ycombinator.com/item?id=34587412

              • Justsignedup 3 years ago

                To be completely honest, I forgot who it was, but about a year or two ago someone posted about a SQL alternative language. The goal was to be adaptable for use in any SQL application, but with a better designed language, so that things like a calculation in a select can be reused in another select column, and better structure to make things easily reusable or composable, etc.

                Just a new language to replace SQL, one that is still SQL but better designed, with the benefit of 30+ years of language design improvements.

                • zX41ZdbW 3 years ago

                  There were countless attempts to extend or replace SQL:

                  OQL: https://en.wikipedia.org/wiki/Object_Query_Language

                  UnQL: https://www.dataversity.net/unql-a-standardized-query-langua...

                  More modern:

                  PRQL: https://prql-lang.org/

                  Malloy: https://news.ycombinator.com/item?id=30053860 (it is so obscure that Google autocorrects it to "Malay language")

                  Another example: ClickHouse supports standard SQL with many features such as window functions and SQL/JSON, and extends it to make it convenient for data analysts by adding: higher-order functions; nested data structures (arrays, tuples, and maps); aggregate function states as first-class citizens; unrestricted usage of aliases in any place of an expression; and more.

                  https://github.com/ClickHouse/ClickHouse/

                  I'm an everyday user of ClickHouse, and I find its SQL implementation the most pleasant to use! Although that's unsurprising, because I'm also one of its authors... I also welcome innovation and improvement of SQL without the introduction of a completely different language.

                  • jrumbut 3 years ago

                    ClickHouse is one of those technologies I've had half an eye on for a while.

                    In your completely unbiased opinion (I kid), do you think it's a good choice for the following problem?

                    I have multiple sensors that read different types of data about the same subject at (annoyingly) slightly different intervals, usually a few dozen times a second. This needs to be combined with other event data that happens on the order of a few times per day.

                    Currently I analyze this data in Python, R, and sometimes SAS (a weird proprietary language). Some coworkers use Matlab.

                    Is that a ClickHouse problem? If I tried it out would the ClickHouse community be interested in hearing how it goes?

                    • zX41ZdbW 3 years ago

                      Looks like a good scenario for ClickHouse.

                      One option is to just record all the measurements with the corresponding time. Something like a table with:

                        sensor_id, time, value
                      
                      To align and correlate the measurements, simply round down the time to some bucket. Do something like

                        SELECT toStartOfMinute(time) AS t, anyIf(value, sensor_id = 'X'), anyIf(value, sensor_id = 'Y')
                        FROM measurements
                        WHERE sensor_id IN ('X', 'Y')
                        GROUP BY t
                        ORDER BY t
                      
                      Yes, I'm interested in how it will go! milovidov@clickhouse.com

                      • zX41ZdbW 3 years ago

                        Another interesting option for correlating measurements at uneven intervals is using ASOF JOIN.
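
                        Roughly (table and column names hypothetical), matching each reading from one sensor with the latest reading from the other at or before it:

                          SELECT x.time, x.value AS x_value, y.value AS y_value
                          FROM sensor_x AS x
                          ASOF LEFT JOIN sensor_y AS y
                          ON x.subject_id = y.subject_id AND y.time <= x.time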

                  • Justsignedup 3 years ago

                    PRQL was what I was thinking of. The problem is integrating PRQL into, say, Rails.

                • ntonozzi 3 years ago

                  Was it PRQL? That’s what all of these comments remind me of.

                  https://prql-lang.org/

          • bambax 3 years ago

            > Sure, but people never use those

            Why not? I use views all the time, and they seem pretty elegant and simple to me.

  • MaxBarraclough 3 years ago

    > Discoverability can be quite lacking.

    A richer type system would help there, but to my knowledge this isn't offered by any major relational DBMS.

    • isoprophlex 3 years ago

      https://www.postgresql.org/docs/current/sql-createtype.html

      Postgres allows you to define custom types at least; making some enums or composites might give you some measure of sanity..?

      • MaxBarraclough 3 years ago

        Thanks.

        The Ada programming language is known for encouraging use of specialised numerical types, where the range can be expressed and automatically checked at runtime, and the compiler can help protect against nonsensical comparisons or arithmetic. At a glance I think Postgres range types offer something similar. Presumably strong typing could also be achieved with Postgres composite types (of just one member).

        It would be useful to have such checks so that a nonsensical comparison like WHERE person1.height = person2.net_worth would result in an error. Presumably it could also be helpful for autocompletion.
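
        A sketch of the single-member composite idea (names hypothetical):

          CREATE TYPE height_cm AS (v numeric);
          CREATE TYPE usd AS (v numeric);

          CREATE TABLE person (
            height height_cm,
            net_worth usd
          );

          -- rejected: no "=" operator exists between height_cm and usd
          SELECT 1 FROM person p1, person p2
          WHERE p1.height = p2.net_worth;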

kneebonian 3 years ago

Holy crap, now I know I've been in this field too long: we are right back to the NoSQL vs SQL debate, for the (checks notes) 3rd time.

Wow.

  • rjbwork 3 years ago

    Not just that, but presenting hierarchical as if it isn't a pre-SQL thing that Codd's model was attempting to fix and displace (and succeeded beyond his wildest dreams at doing so, I'd guess).

schmichael 3 years ago

Wow, this is written by an early mentor of mine who introduced me to CouchDB and MongoDB circa 2008! I went on to take MongoDB to Urban Airship (now Airship), make a complete mess out of things (lots of evidence of that on HN even), and eventually port it all to a Postgres cluster. Meanwhile jamwt went to Dropbox and ended up managing thousands of SQL instances of his own!

And now both of us are back on non-SQL datastores: Convex for him, and Nomad's combination of Raft+MemDB for me. While Convex sounds influenced by transactional memory (define your critical section and let the platform keep trying to apply it until it succeeds), Nomad opted for Raft's serialization of mutations. On the read side we both opted for explicit index use, so it seems fair to assume both of us feel like SQL's decoupling of indexes from queries does more harm than good. Query planners are great until they're not.

I would love a SQL (well SELECT) interface to Nomad's datastore (MemDB). SELECT is like the C ABI for data instead of code: imperfect, but a lowest common denominator that if implemented unlocks a ton of workflows.

I wonder if jamwt feels the same and both projects will eventually grow a SQL adapter (or at least a SQL-like DSL... Nomad is close with bexpr filtering on a number of APIs, but JOINs would be really useful).

  • jamwt 3 years ago

    Hi there Michael! We continue to love nomad.

    SQL is the C ABI of querying for sure. BI tools will never adapt to use Convex directly, nor should they.

    So... yes, Convex actually had a prototype SQL adapter for the read side of things back in the early few months when we were figuring things out. Convex's read semantics are very compatible with SQL.

    We've kept this adapter on ice in part because of point #3 in the article -- we don't want to overpromise things which are a bad idea.

    Meaning, if we exposed SQL on the thing as-is, this would presumably be for more analytical-type queries involving patterns normal Convex queries can't express. Right now that would be a Bad Idea, because your website would slow down, just like every other database system lets happen.

    So the current recommended practice is use our Airbyte Egress connector (https://airbyte.com/connectors/convex) and get yourself into an offline Clickhouse/MySQL/Snowflake whatever and jam SQL over there to your heart's content. That's basically what we do.

    We may one day abstract this away by embedding some sort of mirrored column store sql thing (maybe DuckDB based? who knows) so you can do your analytical work without impact on your OLTP or integrating 3 more systems. But the team hasn't invested in that yet and probably won't for some time.

    • schmichael 3 years ago

      Hello!

      > Convex actually had a prototype SQL adapter

      Ha, this sounds like the heaps of scripts we Nomad developers have lying around for converting ad hoc bits of Nomad state to sqlite for debugging particularly stateful problems.

      > So the current recommended practice is use our Airbyte Egress connector

      Ah this is where we would like to end up with Nomad's event stream API: https://developer.hashicorp.com/nomad/api-docs/events#event-...

      Sadly we've been unable to identify that First Sink to implement and open source. It's a lot of work to build and maintain production quality connectors, so it would be nice to do just one like your Airbyte Egress. This is an area where Nomad's onprem nature makes it hard to pick one solution that will work for a wide number of users. Someday!

  • jamwt 3 years ago

    Also, yes, great pickup re: transactional memory. We talk about this internally all the time; this is the inspiration.

    Haskell: IO -> STM -> (pure)

    Convex: (The browser / Convex Actions / other effectful environments) -> Mutations -> queries

    All the same benefits re: retry and memoization.

    Steal, steal from Haskell; so many great ideas there.

jrochkind1 3 years ago

> In Convex, the application and database types are automatically equivalent because the entire data pipeline from frontend to database uses the same exact types and definitions... There is no adapter code between languages because everything we write is just TypeScript.

It may actually be fine to just require everything to be typescript, but the idea that you'd require your application be written in the same language as your data store, and thus implement a different data store for each language you might want to write an app in (and not share between languages)... would formerly be thought of as pretty ridiculous?

But maybe it's not anymore?

But that seems to be the requirement for "the application and database types are automatically equivalent because the entire data pipeline from frontend to database uses the same exact types and definitions," yes?

In general, most of the purported benefits of convex seem to effectively require the db in the same language as the app, to make it all so seamless.

I guess we'll see if Convex can get popular enough to spawn lots of "what I wish I knew about convex a year ago" and "Goodbye Convex, hello Postgresql" posts...

  • jamwt 3 years ago

    Article author here.

    > It may actually be fine to just require everything to be typescript, but the idea that you'd require your application be written in the same language as your data store, and thus implement a different data store for each language you might want to write an app in (and not share between languages)... would formerly be thought of as pretty ridiculous?

    The vision is definitely aspirational, and reflects the fact that JS/TS is becoming overwhelmingly the most popular language for projects. With things like React Native, for many teams even mobile apps are taken care of in one language.

    There will obviously always be apps written in other languages, and in those cases, a bit more type mapping will be necessary, because (for now) Convex functions are always side-effect free, automatically cached JS/TS running in a very particular runtime. But we'll work on making that translation as smooth as possible in our client libraries.

    We have a Python client library out now ( https://pypi.org/project/convex/ ) and a Rust one coming soon, just to keep us honest in this respect.

    > I guess we'll see if Convex can get popular enough to spawn lots of "what I wish I knew about convex a year ago" and "Goodbye Convex, hello Postgresql" posts...

    Well, that's the dream, isn't it?

    • jrochkind1 3 years ago

      I see. So, when you say "the application and database types are automatically equivalent because the entire data pipeline from frontend to database uses the same exact types and definitions", can that apply somehow to python and rust client libraries too?

      What about "all write operations on the database are done in embedded functions"? For the python and rust client libraries, do you write these embedded functions in python or rust, or typescript/JS? Do you effectively have to write a specialized update procedure in typescript for every update you want to do? This starts seeming harder to spin as an advantage if you aren't typescript in the first place. ("A custom stored procedure required for every update" is actually not something anyone's been looking for!)

      The OP's arguments seem to me to be written assuming you are writing typescript in your app; it's unclear to me how the advantages you claim translate if you are not -- such that I figured the product was actually only for typescript apps! It would be interesting to see a rewrite of it that explicitly targets those writing in, say, python or rust.

      • jamwt 3 years ago

        Folks for whom TypeScript is a big part of their project are going to be the most natural fit for Convex.

        But no, not necessarily exclusively TypeScript. For example, the existence of the Python client library is due to developer demand. Some users had ML jobs that are triggered by Convex applications or that report outcomes into Convex.

        The embedded functions are always TypeScript. The experience is pretty smooth in other languages as well, but the degree of this is largely dependent on how easily the type systems map from e.g. Python to TypeScript and how our client library can infer things in a way that feels native or requires little of your involvement.

    • nativecoinc 3 years ago

      Article ostensibly about doing better than SQL: Ah, interesting

      Underlying vision or assumption that we’ll all be using JavaScript/TypeScript in any case: Freaking really??

DerArzt 3 years ago

Man, what a well-written ad.

  • treis 3 years ago

    My reaction exactly. Specifically when they said what developers want is document DBs. That's not true. We (or a lot of us) want the equivalent of Typescript for SQL. Which people have tried to build but their efforts have ended up like CoffeeScript. Better but ultimately not worth using because they don't have the staying power.

    • cfeduke 3 years ago

      > equivalent of Typescript for SQL

      My limited experience with LINQ to SQL was very positive in these regards - but one is not writing a type safe SQL at all, instead a language that is type safe and is ultimately interpreted as SQL.

      Same experience with some libraries in Scala - where it's possible to get compile-time type safety for SQL - but required onerous setup or synchronization/code generation tools to achieve. (Sort of comes with the territory.)

eatonphil 3 years ago

On the other side of things, it's interesting to see SQL engines innovating. Though the only one I can name for sure is DuckDB: https://duckdb.org/2022/05/04/friendlier-sql.html.

Have other implementations done anything for making queries easier (that don't involve requiring an IDE or anything, so just query language innovations)?

Edit: Not query language innovations but ClickHouse and Snowflake's "standard library" of builtin functions just keep getting better and better. Once I saw `parseDateTimeBestEffort` in ClickHouse I wanted that everywhere...
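
For example:

    SELECT parseDateTimeBestEffort('23/10/2020 12:12:57');
    -- returns 2020-10-23 12:12:57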

  • jamwt 3 years ago

    Article author here -- DuckDB is indeed cool.

    Another interesting bit of work in this space is LINQ on the microsoft side of things. And even list comprehensions end up having a scanning/summing language kind of feel to them that would be interesting to see translated into a new database query language.

  • znaimon 3 years ago

    +1 to ClickHouse builtin functions—they vastly simplify a lot of data analytics workloads

xpil 3 years ago

>> Unlike SELECT, these operations don't feature JOINs or subqueries or any other magic that brings together tables.

This is a false statement. Both INSERT and UPDATE support JOINs and subqueries / CTEs. At least according to the standard - whether every engine implements them is another story.
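
For instance, both of these are unremarkable in Postgres (tables hypothetical):

    -- an INSERT fed by a join
    INSERT INTO archive (user_id, title)
    SELECT u.user_id, p.title
    FROM users u
    JOIN posts p ON p.user_id = u.user_id
    WHERE u.deleted;

    -- an UPDATE with a join via FROM
    UPDATE posts
    SET author_name = u.name
    FROM users u
    WHERE u.user_id = posts.user_id;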

  • wvenable 3 years ago

    I'm forgiving of the author's point here. If you have JOINs and subqueries, you're just doing a SELECT to get data that can only be UPDATEd/INSERTed on a single table. You can't do an INSERT across 5 tables in one statement.

  • eatonphil 3 years ago

    > not every engine implementing them is another story

    Which don't? I'd have assumed anything inside of `SELECT`'s `FROM` would be allowed inside of `INSERT` and `UPDATE`.

    Or maybe you're not saying you know there are implementations that have these restrictions just that any random implementation might not be there (yet).

    • xpil 3 years ago

      > Which don't?

      Redshift (a PostgreSQL derivative, more or less) can run an INSERT with a join, but cannot do an UPDATE with a join.

cube2222 3 years ago

If I understand correctly this is basically trying to solve a very similar set of issues as something like entgo[0] but in a very TypeScript-native way?

Also, regarding the transaction functions, are those run locally, or are they serialized and run remotely on the database server? Both have their caveats.

Number 1 (and that approach to retries) already works just fine with Postgres, you just need a proper library (which wraps your function and retries on transaction isolation errors). But you also have to keep in mind that if you interface with 3rd party services during your transactions, those calls will need to be idempotent as well. This is actually the proper way to do transactions in application code with a SQL database, esp. if you're running with serializable transaction isolation.

Number 2 is very limiting, as you can't have all those third party services and libraries used between different operations in a transaction, which is often (I'd even argue - usually) very useful.

Since you're citing stored procedures as a viable alternative to convex's take, while not listing the above number 1, it sounds like you're doing 2.

Anyhow, good luck!

[0]: https://entgo.io

JohnDeHope 3 years ago

"It can be done. But can doesn't mean should." I'm getting worn out on this turn of phrase. Yes, can doesn't mean should. But also, couldn't doesn't mean shouldn't. You have to justify "shouldn't". TFA does go on to justify it, which is appreciated.

richbell 3 years ago

> Tab! Tab! Tab! PostgreSQL demonstrates its field autocomplete feature.

SELECT preceding FROM is such a thorn in the side. :(

  • paulfitz 3 years ago

    Agree, I started paying attention to duckdb when it started allowing FROM before SELECT https://github.com/duckdb/duckdb/pull/5076
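
    If I recall the feature right, this now works:

      -- implicit SELECT *
      FROM users;

      FROM users SELECT name;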

    • mjevans 3 years ago

      I appreciate the command first syntax, but maybe something like

      ; SELECT FROM sources ... WHERE / ORDER / ETC ... DATA fields ...

      What if DATA was a reserved keyword for column identifiers and other data fields that would normally be immediately after SELECT so they could appear anywhere in the query syntax?

  • BitwiseFool 3 years ago

    I wish SQL did not require a comma between items after the SELECT and before the FROM. We don't need commas in between joins. I feel like someone could write a way to parse queries so that it isn't needed.

    Can you imagine how much time and effort that would save people?

    Edit: You do need commas in ORDER BYs, that slipped my mind when typing out this pet peeve of mine.

    • jemmyw 3 years ago

      I'd be happy if it just accepted a dangling comma so you can swap your columns around freely without causing a syntax error

      • cfeduke 3 years ago

        When you see queries written like:

          select
          1
          ,field1
          ,field2
          ,field3
          from ...
        
        you've hit peak SQL engineer comma frustration workaround.

    • irrational 3 years ago

      But what would the token be? WHERE clauses are separated by AND and OR. ORDER BY items are separated by commas. Joins are separated by JOIN, INNER JOIN, AND, etc. The parser knows where the WHERE clauses are because it sees the WHERE token.

    • dragonwriter 3 years ago

      > I wish SQL did not require a comma between items after the SELECT and before the FROM.

      > Can you imagine how much time and effort that would save people?

      I can imagine how much extra time I’d spend typing “AS” in queries, which would be necessary for column aliases to be distinct from new select items. And how much more time I’d spend reading unreadable queries in contexts like logs where they aren’t pretty-printed. Net, it seems to be a big loss.

      > We don’t need commas in between joins.

      That’s because each join is introduced by a string of one or more reserved words (including JOIN).

      If each select field was prefaced with FIELD you wouldn’t need commas, but...

    • evanelias 3 years ago

      That wouldn't be possible to parse, because of how column aliasing works: the "AS" keyword is optional. In other words, "SELECT foo something_else" is equivalent to "SELECT foo AS something_else".

      Also, I seem to recall that very old versions of Oracle did require commas between joins. (The join conditions had to be stated in the WHERE clause back then, instead of supporting the ON clause, iirc).

    • marcosdumay 3 years ago

      You can use commas between tables perfectly well. The `join` token just has the same role as the comma, making it redundant.

WkndTriathlete 3 years ago

Tell us you don't understand the theory or use cases for SQL databases without telling us you don't understand the theory or use cases for SQL databases.

There are indeed times when a simple key-value store is appropriate, but the encoding of hierarchy and the relational calculus are useful for so many more use cases in business applications. SQL as a language is certainly flawed - it would be nice to be able to algebraically/programmatically join select clauses and conditions, for instance, or to perform multi-inserts - but the semantics of the storage and engine are based on some pretty mathematically sound foundations.

You may find the seminal paper by Codd as illuminating as I did if you read it.

vlunkr 3 years ago

I agree with lots of the points here. However, one of the great strengths of the SQL family is its FOSS history. I doubt something proprietary is ever going to win everyone over. At least I hope not.

  • schmichael 3 years ago

    SQL took a long winding road to reach the FOSS friendliness it has today. Its history is littered with patent, copyright, and trademark disputes. Not to mention widely varying implementations and the continuing lack of a crossplatform wire protocol or standard interface. Proprietary incumbents still dominate huge sectors of the SQL landscape and pose compatibility and intellectual property issues for FOSS implementations.

    I think any SaaS APIs that reach any degree of widespread adoption eventually get OSS implementations (eg S3's API is widely supported by FOSS and proprietary implementations). This evolutionary path is awfully similar to SQL's for better and/or worse.

  • jamwt 3 years ago

    Article author here.

    Agree this is a necessary part of the change. Convex is working on our OSS strategy this year. Thanks for the feedback!

    • vlunkr 3 years ago

      Awesome! This is a great article. I've been thinking lately about how ORMs are basically a band-aid meant to fix the weaknesses of SQL, so it's great to see people trying to solve the problem from the bottom up.

cgh 3 years ago

Optimistic concurrency control is mentioned as a feature of Convex, the advertised product. But OCC is a common concern and has been for ages. In Java, the JPA standard specifies the @Version annotation which enables OCC on entity objects and it's implemented by the usual suspects like Hibernate. It's been around for at least a decade so I'm struggling to understand why this is being presented as some big innovative win, unless I'm completely missing something here.
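
(For anyone unfamiliar, the JPA-style pattern is roughly a version column checked at write time; a sketch in plain SQL with a hypothetical table:)

    UPDATE account
    SET balance = 70, version = version + 1
    WHERE id = 42 AND version = 3;
    -- zero rows updated means a concurrent writer won: re-read and retry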

  • jamwt 3 years ago

    Article author here.

    It's not new "in the world", but often new to our users, who understand it less intuitively than pessimistic locking.

    Unless they're using Haskell or probably other very exotic environments with strict understanding of mutability and so on, they know Mutexes but not STM or other OCC patterns. So we lean heavily on explaining it. Not because it is original, but because it is unfamiliar.

015a 3 years ago

I mean... yeah, MongoDB got a lot of hate, but I think the broader point is that it was one of the first technologies to popularize the domain of NoSQL. No one knew how to use it properly, so we adopted SQL-like schema design, and when it became obvious that didn't work well, the hate started spilling over to the first technology to arrive at the party.

The elephant in the room is, I suppose, that the modern internet literally would not be possible without NoSQL. It may be possible without SQL; that seems likely to me. Part of that is because NoSQL is a big umbrella, and covers extremely critical databases like Redis or even databases like Cloudflare's proprietary edge cache. But even document stores are extremely critical to enterprise scaling; during Prime Day 2022, DynamoDB peaked at over 105M rps [1]. There's no SQL setup on the planet that could handle volume like that while still maintaining all the things that Make It SQL; you could start throwing in read replicas and removing joins and cross-table lookups and sharding data and ope, you just re-invented a document store.

Here's the couple conclusions I have started operating by:

1. Document stores are, today, a strong choice at both the low and high ends of the systems-scaling spectrum. They're great at low scales because you can avoid thinking about them too much. They're great at high scales because once you have the space to think about it you can attain substantially higher efficiency (performance+cost).

2. Making a document store operate more like SQL, for the things SQL is good at (joins, validation, etc) is a lot easier than making a SQL database operate like a document store for the things document stores are good at (Planetscale? there's a couple players in this game).

3. SQL-the-language sucks. There, I said it; I'll die on that hill. The language was invented within +/- 2 years of Scheme, ML, Prolog, and Smalltalk. Our industry has rejected all of those. We haven't rejected SQL (yet). SQL is demonstrably, as a syntax, just as bad as those languages, evidenced by all the ORMs, injection attacks, etc. Databases tend to have a longer shelf life than programming languages, but SQL-the-language will die.

4. But, duh, SQL is fine. Go ahead and use it. In some situations it makes sense. In others it doesn't. It's a tool; one that has had 60 years to form around solving as many problems as possible.

[1] https://aws.amazon.com/blogs/aws/amazon-prime-day-2022-aws-f...

  • brightball 3 years ago

    > 2. Making a document store operate more like SQL, for the things SQL is good at (joins, validation, etc) is a lot easier than making a SQL database operate like a document store for the things document stores are good at (Planetscale? there's a couple players in this game).

    For a lot of NoSQL document stores, the simple act of having multiple indexes was a bridge too far. Couchbase comes to mind here.

    I can't think of many use cases for a document store over PostgreSQL. Maybe as a cache layer... but I'd just use a materialized view. Where some of the data structure is unknown or user-defined, such as in a system monitoring tool? Server Density was one of the first big MongoDB adopters for this reason. Now we can store that in a JSONB column though.
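
    E.g., a sketch of that pattern:

      CREATE TABLE metrics (
        id bigserial PRIMARY KEY,
        payload jsonb NOT NULL
      );

      CREATE INDEX ON metrics USING gin (payload);

      -- query the user-defined structure directly
      SELECT * FROM metrics WHERE payload @> '{"host": "web-1"}';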

    > 3. SQL-the-language sucks.

    You're entitled to your opinion. I've been doing this 20 years and basic SQL knowledge has been by far the most valuable and portable skill of my career. I'm by no means an expert, but ask developers who have been in the field for 5+ years what a HAVING statement does and they have no idea. There's a huge gap in basic SQL knowledge, stuff that can be learned in < 1 week. I taught somebody who came out of a code boot camp basic SQL for a couple of weeks during an internship. Her first job made her acting DBA because she was the only person who knew SQL.
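
    (For the record, HAVING filters groups after aggregation, where a WHERE clause can't reference the aggregate. E.g., with hypothetical tables:

      SELECT user_id, count(*) AS post_count
      FROM posts
      GROUP BY user_id
      HAVING count(*) > 10;

    returns only users with more than ten posts.)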

    The hill I'll die on is that ORMs and frameworks are creating an astounding lack of basic SQL knowledge in today's developers, which leads to a lot of completely unwarranted griping about SQL. (Not directed at you, just in general on this topic.)

    • 015a 3 years ago

      But you just asserted the same point I was making: no one knows it anymore! That makes the skill valuable on the short-term, but on the long-term you've got the next generation of programmers entering jobs, being given the opportunity to build and change systems, and what are they going to pick? Maybe not SQL.

      It's the same argument, just at a grander scale, for the generation before us and the COBOL mainframe programmers of old; database technologies just tend to move slower because data is very sticky.

      You might not see the use-cases; but Apple does. Their usage of Cassandra sustains millions of QPS across over 300,000 instances [1]. That's a scale SQL can't even begin to dream of. Discord does; they moved to Scylla [2]. Amazon does [3]. Blackrock Aladdin depends on Cassandra, and manages over $21T in assets [4].

      Which kind of circles around to the fact that, while you say "a lack of SQL knowledge is what causes unwarranted griping about SQL", sure, I can accept that, but I'd conversely argue that a lack of Solving Real Problems is what causes unwarranted griping about NoSQL. Schematization isn't a real problem; we have a dozen ways of solving that at different layers. Imperfect ACID compliance isn't a real problem; it's a pseudo-academic expression of the guarantees that SQL offers, which then gets unfairly transplanted onto NoSQL databases as if those issues can't or wouldn't be addressed elsewhere.

      Even original article statements like data being inherently relational; data isn't anything, it's just data. It only appears relational because that's how we've been trained to think about data after 60 years of SQL-monoculture. If you start thinking about data end-to-end, how it's displayed to users, on pages, components, screens, feeds, it really doesn't look relational at all; life isn't a graph. The graph is an expression of human pattern-matching on things that could reasonably be related, but in reality aren't; life is a series of snapshots. This moment, this page, this feed, this component, in this moment, for this request, for this session; that is the end-to-end view of data; documents, not a web.

      But SQL is fine at that! In fact, it's the best option at medium scales; the middle of the bell curve where 80% of people live. It may not always be, but it's there today. It's that bottom 10% and top 10% where the solutions that work for most don't work as well; and I think NoSQL gets a lot of Majority Hate despite being an extremely strong solution at those scales. Know your tools, and be willing to replace them when the circumstances change.

      [1] https://twitter.com/erickramirezau/status/157806381149547724...

      [2] https://www.scylladb.com/press-release/discord-chooses-scyll...

      [3] https://aws.amazon.com/blogs/aws/amazon-prime-day-2022-aws-f...

      [4] https://www.youtube.com/watch?v=322GytEo_fE

      • brightball 3 years ago

        Developers not knowing something they need to know isn't a case for getting rid of the thing, it's a case for insisting they learn it. That it's been abstracted away by ORMs is a testament to those ORM layers and the consistency of the standard...not an indictment of SQL itself.

        Let's talk real use cases for a second.

        All of the performance benefits of NoSQL come from key-value retrieval and the ability to shard data due to the built-in lack of joins.

        There's nothing stopping anyone from doing that in a relational database. It's a common pattern. CitusDB even made a PostgreSQL extension that makes this model even easier to use, with joins and all the other relational goodies. If you want to set up PostgreSQL with an id field and a JSONB field, you've got the entire NoSQL scaling use case covered without all of the limitations of using a NoSQL-only database.

        The schema-at-different-layers thing doesn't _really_ work either. If you put the schema in your application code, then you mandate an API layer in addition to your database layer. With schema enforced in the database, multiple code bases can connect to the same database. With the schema in the application layer, now everything has to connect to that application. So not only are you working with the NoSQL database AND defining the schema in the application, but now you also have to define an API if you have a reason to split out some portion of application logic to a more specialized language.

        The layers of problems introduced because NoSQL is used on a project are excessive. There's no question that storing document data makes a lot of sense in some circumstances, but you're always better off just storing the portion you need in an appropriate column type (JSONB) rather than forcing your entire system into the issues that come with NoSQL only.

        Given the JSON capabilities of modern SQL databases there's almost no reason to start any project with a NoSQL only option.

        In the Prime Day article, for example, they're singing the praises of their own SQL offerings too.

        > Amazon Aurora – On Prime Day, 5,326 database instances running the PostgreSQL-compatible and MySQL-compatible editions of Amazon Aurora processed 288 billion transactions, stored 1,849 terabytes of data, and transferred 749 terabytes of data.

        > Amazon DynamoDB – DynamoDB powers multiple high-traffic Amazon properties and systems including Alexa, the Amazon.com sites, and all Amazon fulfillment centers. Over the course of Prime Day, these sources made trillions of calls to the DynamoDB API. DynamoDB maintained high availability while delivering single-digit millisecond responses and peaking at 105.2 million requests per second.

        They don't give an apples-to-apples comparison here. We don't have a cost comparison of infrastructure for DynamoDB vs Aurora. We don't get to see whether the "288 billion transactions" are database writes, while for DynamoDB we only see requests. We don't get to see whether there have to be so many more requests to Dynamo because you can't get the data as easily in a single query. We don't get to see if it's being used as a cache layer or for session management, which are both great use cases for it.

        I mean, Instagram scaled just fine with only PostgreSQL. SQL scales just fine. Some of the bevy of options and tools that you get to work with the data don't scale as well as others, but that's not a case for getting rid of those tools...it's just a case for caching when needed.

xupybd 3 years ago

One of the best things I've seen that integrates SQL with your code is F#'s type providers. http://fsprojects.github.io/SQLProvider/

It's amazing how it all works. But only when it works. I've found it so flakey I don't use it for production, but if the bugs were ironed out it would be amazing.

legerdemain 3 years ago

  > We paused just long enough to take a sip of our Spicy Maya Mocha from Coupa Cafe.

To be honest, I think almost all drinks at Coupa Cafe are pretty bad.

AnEro 3 years ago

Graph databases solve most of his issues, outside the complexity of queries.

If we are begging DB engineers for things, can we get a graph layer for accessing SQL tables, please?

bambax 3 years ago

This example is really curious:

    BEGIN;
    SELECT post_count, ... from users where ... FOR UPDATE;
    INSERT INTO posts VALUES(...);
    UPDATE users SET post_count = new_post_count WHERE ...;
    COMMIT;

For one, it's unlikely the number of posts per user is so important and so often requested that it needs to be cached in the database itself.

Secondly, why would that value need to be stored with each new insert? Simply insert posts as they come, and calculate metadata about posts at a later time as a batch process, or when some part of the application actually requests them.
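
E.g., compute it when asked:

    SELECT count(*) AS post_count
    FROM posts
    WHERE user_id = 123;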
