Relational is more than SQL

fauna.com

136 points by tysont 2 years ago · 176 comments

herodoturtle 2 years ago

This is a very interesting way to promote a product, credit to the author (who is an industry veteran it seems).

I had no idea what Fauna was. I just clicked the link here because the title caught my eye (I work with databases quite a bit).

The opening paragraph immediately grabbed my attention - "My first deep dive into SQL was in 1987, just before I became the first technical person at Microsoft to work on SQL Server." - woah!

So I read this entire article, which is very well written and easy to read but mostly affirms what I already know.

And then I get to the final section where they promote Fauna - and so now I know about Fauna too.

Kudos to these folks, in my humble opinion, this is marketing done right.

  • probablypower 2 years ago

    This is interesting, because I have the exact opposite response to these sorts of articles.

    I think any bias or personal interest should be declared upfront in media (articles, videos, podcasts, ...) rather than appear as a 'common consumer' talking about a pain point in a relatable way. It really rubs me the wrong way when an article ends with a bait-and-switch, where you realise the entire article was manufactured to make you relate to their product's business case.

    Obviously this method must resonate with people, like yourself, otherwise it wouldn't become so common. I guess I'm just the 'B' in the A/B testing that results in this type of marketing.

    • xwowsersx 2 years ago

      I don't think there's any bait-n-switch unless you're unaware you're on fauna.com. Content marketing is well understood and at least in this case it isn't trash content someone spit out in 2 mins in order to lure you to their site. I don't see anything wrong with a company/product delivering value to you in the form of quality content and in exchange they are afforded an opportunity to run your eyes over their product offering.

    • naasking 2 years ago

      > It really rubs me the wrong way when an article ends with a bait-and-switch, where you realise the entire article was manufactured to make you relate to their product's business case.

      I'm not sure you have to take an adversarial interpretation of that tactic. If you don't find yourself agreeing with the setup, then you aren't a prospective customer, but if the article was informative up to that point, at least you now know about a domain you weren't previously aware of. Isn't that the point of reading technical articles? In the future, you might find yourself in that position after all.

    • zzzeek 2 years ago

      agree, I have to look at the domain name, the title/sidebar, etc. to see up front, "OK, this is yet another 'we think we have a better SQL' startup", and then I skip the whole thing.

      I'd rather a site just say "here's our product and why you might like it!" without working in some "SQL, well you know, it has shortcomings" framing, which is just unnecessary.

    • austhrow743 2 years ago

      Why would your original assumption not be that the Fauna website exists to promote Fauna?

snthpy 2 years ago

Disclaimer: I'm a core contributor to PRQL [1] and post about it a lot on HN. Apologies for jumping in on other people's threads, but for people interested in the headline, PRQL might be of interest.

At PRQL[1] we believe that SQL is a combination of two things:

1. Relational Algebra, which is eternal because it's just maths, and 2. A language designed in the 70s that looks like COBOL.

When people say that SQL will never die, they are usually thinking about Relational Algebra because SQL has been used interchangeably with that. With PRQL we agree that Relational Algebra is fundamental to thinking about data and we intend to keep that. However we've learned a lot about programming languages in the last 50 years and so PRQL is a revamp of SQL that brings the composability of functional languages and modern ergonomics to data transformations in order to improve the DX and UX of data scientists, data analysts and analytics engineers.

PRQL is simply a compiler that produces SQL so you can use it with whatever database you are currently using. It's completely open source with zero commercial associations and is deeply committed to staying that way forever.

1: https://prql-lang.org/

  • ttfkam 2 years ago

    The syntax comparison section will likely sour a lot of viewers who already know SQL. You try too hard to highlight how easy and terse PRQL is: the PRQL keeps its comma-separated items on a single line, while the SQL puts every item on its own line.

    It may be typical of many SQL users and formatters, but it leaves a bad taste in the mouth, suggesting you aren't interested in an actual comparison but in marketing.

    For those who already know SQL, the real question is: will it make my queries faster? Putting the FROM first isn't sufficiently compelling on its own. Having a processing pipeline, though marginally more elegant to look at, doesn't actually improve upon CTEs.
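
    For what it's worth, the pipeline style is already expressible by chaining CTEs; a rough sketch against a hypothetical invoices table:

        -- each step gets a name, and later steps read from earlier ones
        WITH recent AS (
          SELECT customer_id, total
          FROM invoices
          WHERE invoice_date >= DATE '2023-01-01'
        ), spend AS (
          SELECT customer_id, SUM(total) AS total_spend
          FROM recent
          GROUP BY customer_id
        )
        SELECT customer_id, total_spend
        FROM spend
        ORDER BY total_spend DESC
        LIMIT 10;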

    When you say you can use it with any database, how do you handle functions, stored procedures, jsonpath, and the massive differences in functionality between Oracle, MS SQL Server, Postgres, DB2, MySQL, MariaDB, H2, SQLite, etc.? Lowest common denominator?

    After 49 years of SQL, more than syntax has to change; you need an engine that supports this natively and can actually improve planner behavior over existing engines.

    I will grant that if you are limiting your target audience to primarily analytics, it's probably sufficient. The marketing of PRQL doesn't always appear to do this, however.

    • wackget 2 years ago

      Yeah the syntax comparison is deliberately misleading.

      They style it as "4 lines vs 10 lines!" when it's actually 4 lines vs 4 lines.

         # PRQL
         from employees
         select {id, first_name, age}
         sort age
         take 10
      
         # Misleading SQL
         SELECT
           id,
           first_name,
           age
         FROM
           employees
         ORDER BY
           age
         LIMIT
           10
      
         # Actual SQL
         SELECT id, first_name, age
         FROM employees
         ORDER BY age
         LIMIT 10
      
      The join example is similarly deceptive:

         # PRQL
         from employees
         join b=benefits (==employee_id)
         join side:left p=positions (p.id==employees.employee_id)
         select {employees.employee_id, p.role, b.vision_coverage}
      
         # Misleading SQL
         SELECT
           employees.employee_id,
           p.role,
           b.vision_coverage
         FROM
           employees
           JOIN benefits AS b ON employees.employee_id = b.employee_id
           LEFT JOIN positions AS p ON p.id = employees.employee_id
      
         # Actual SQL
         SELECT employees.employee_id, p.role, b.vision_coverage
         FROM employees
         JOIN benefits b USING (employee_id)
         LEFT JOIN positions p USING (employee_id)
      
      Nonsense.
      • sebastianz 2 years ago

        It is also (for me) quite a bit noisier and less readable than SQL. The join example has multiple different syntactic symbols, for no clear intuitive reason (= vs ==, {} vs (), side:left vs p=positions).

        • biglyburrito 2 years ago

          I actually like it; it makes sense at a glance, without having read a single line of the official documentation. But I think people should be free to use either syntax -- method or query -- as they see fit, without judgment of which one is better.

      • maximilianroos 2 years ago

        [PRQL dev here]

        I strongly think we should have the best examples of SQL to compare against. I've ironically made this complaint for other libraries, so I'm alarmed that folks think we might have done the same.

        We would take a PR for the first example if folks think that's better spacing. (I think the spacing is the only difference?)

        For the second — `USING` isn't fully equivalent to `ON`. There are discussions on GH (https://github.com/PRQL/prql/issues/723) as to which we should compile to. In the meantime, we'd take a PR on anything that is equivalent.

        • dragonwriter 2 years ago

          > For the second — `USING` isn’t fully equivalent to `ON`.

          In terms of portability, because it's not supported by, e.g., MSSQL, sure.

          In terms of its semantics, though:

             t1 [LEFT/RIGHT/INNER] JOIN t2 USING (col)
          
          Is fully equivalent to:

             t1 [LEFT/RIGHT/INNER] JOIN t2 ON (t1.col = t2.col)
          
          So for a comparison to SQL as used by most RDBMSs (rather than MSSQL specifically), rather than “what should PRQL compile to”, USING is quite appropriate.

          It may be that the intent of the homepage pairing is to highlight the actual compilation result and not provide a comparison to SQL-as-it-would-manually-be-written, but the presentation doesn’t make it clear that that’s the purpose.

          • maximilianroos 2 years ago

            I posted a link to the issue discussing this on the repo. That gives a good overview to those interested in this issue.

            To take one point from there:

            > Is fully equivalent to:

            They're not fully equivalent — `USING` combines the two columns into a single column, `ON` doesn't.

            • ttfkam 2 years ago

              USING just presumes the columns have the same name on both ends. There is no logical "combining", only a lexical one. It's why many DB admins/engineers use "foo_id" instead of "id" in their schemas as a rule. (Notably, also one of the reasons why many DB admins/engineers hate ORMs and other table generators that name id columns generically as "id".)

              The person you are responding to is likely in this cohort. You're correct that they aren't exactly equivalent, but that's an artifact of your table definition, not the language. It likely seemed perfectly reasonable to the commenter that this naming was a trivial detail in the scope of the greater conversation.

              • dllthomas 2 years ago

                    WITH t AS (SELECT 1 x)
                    SELECT x FROM t JOIN t u ...;
                
                That breaks if you fill in the ... with "ON t.x = u.x" because there are two columns called "x", but works with "USING (x)" because they get collapsed into a single column. I think it makes sense to say those aren't fully equivalent.
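
                For concreteness, this is how that plays out (Postgres behaviour, for example):

                    WITH t AS (SELECT 1 AS x)
                    SELECT x FROM t JOIN t AS u ON t.x = u.x;   -- ERROR: column reference "x" is ambiguous

                    WITH t AS (SELECT 1 AS x)
                    SELECT x FROM t JOIN t AS u USING (x);      -- fine: USING exposes a single merged x
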
              • maximilianroos 2 years ago

                OK, so we agree they're not fully equivalent

        • lolinder 2 years ago

          > We would take a PR for the first example if folks think that's better spacing. ... we'd take a PR on anything that is equivalent.

          "We'd take a PR" is a great line for someone who's already sold on the concept, but it's not super helpful for flaws in your marketing. Reading OP, it doesn't sound like they trust you enough to find it to be worth their time fixing your examples for you.

          • maximilianroos 2 years ago

            Yes, this is fair.

            Though I'm not sure how to find consensus on what is the best representation of SQL without asking folks to make a reviewable suggestion.

            I had thought the examples are currently in their clearest state — I definitely don't think it's obvious that removing line-breaks makes it look clearer (which is the full extent of the feedback IIUC...)

      • OJFord 2 years ago

        Also there's no need to capitalise, so the 4 lines vs. 4 lines is actually just 'order by & limit' vs. 'sort & take'. NBD really, but it's probably nicer in more complex examples.

        But I ranted about this the last time PRQL was submitted in its own right - https://news.ycombinator.com/item?id=36869376 - so I'll leave it there!

      • Scarbutt 2 years ago

        Looks like they are showcasing the output of the compiler, not the hand written SQL.

      • j-a-a-p 2 years ago

        PRQL is a solution in search of a problem. Lacking such, the problem had to be fabricated.

        • naasking 2 years ago

          SQL has many problems that PRQL ostensibly addresses. Query reuse, composability, abstraction and more.

          • ako 2 years ago

            Query reuse, composability and abstraction are all sufficiently solved with views in my experience.
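
            For example (hypothetical tables), a view defined once can be reused and built upon elsewhere:

                CREATE VIEW paid_invoices AS
                  SELECT * FROM invoices WHERE status = 'paid';

                CREATE VIEW revenue_by_customer AS
                  SELECT customer_id, SUM(total) AS revenue
                  FROM paid_invoices
                  GROUP BY customer_id;

                SELECT * FROM revenue_by_customer ORDER BY revenue DESC LIMIT 10;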

            • naasking 2 years ago

              Then your experience doesn't extend to dynamic queries composed of query fragments, which is almost universal to all e-commerce sites.

              Views are a degenerate, second-class form of query abstraction, as are common table expressions and the numerous other features added to SQL over the years to paper over the lack of proper first-class query abstractions.

              It's like saying a programming language doesn't need first-class functions because we have disjoint unions and we can perform defunctionalization by hand. Strictly true, but really missing the point.

              • ttfkam 2 years ago

                When viewed as a DSL for set theory, views, CTEs, set-returning functions, et al are indeed proper first-class query abstractions.

                When viewed through the lens of general purpose imperative or functional programming languages, it's easy to see how it can be seen as falling short.

                I'll admit much of the tooling and driver APIs leave a lot to be desired.

                Some tools do make good efforts though such as nested fragments in this driver.

                https://github.com/porsager/postgres#building-queries

                • naasking 2 years ago

                  > When viewed as a DSL for set theory, views, CTEs, set-returning functions, et al are indeed proper first-class query abstractions.

                  It's still lacking even then. You can't assign a set to variable for instance, and then use that variable in other definitions or queries.

                  • ttfkam 2 years ago

                    You can in Postgres. It's not often a good idea since it would constitute an optimization fence and potentially buffer large values in memory, but you can absolutely do this in a function variable.

                        record[]
                        some_table_name[]
                        jsonb containing an array of records
                    
                    And of course the most obvious shared set variable of all, supported by every popular RDBMS in existence: the temporary table. Something scoped to the current session, of essentially unlimited width and length, and accessible to other definitions and queries.
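
                    A rough sketch of that last option, assuming Postgres syntax (other engines are similar):

                        -- a session-scoped "set variable"
                        CREATE TEMPORARY TABLE big_spenders AS
                          SELECT customer_id
                          FROM invoices
                          GROUP BY customer_id
                          HAVING SUM(total) > 1000;

                        -- reusable by any later statement in the same session
                        SELECT c.*
                        FROM customers c
                        JOIN big_spenders USING (customer_id);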

                    We get it. You want something closer to a general purpose programming language. But don't confuse your preferences with actual missing features.

                    • naasking 2 years ago

                      Temp tables are another bandaid. Like views, they are another second-class abstraction intended to address the lack of generality of relations. As you hint, adding second-class features is intended to handle pain points while keeping optimization simple, but it's the wrong way to do it IMO. No fixed number of second class features can make up for the lack of relations as first-class values, so this is not just a matter of preference.

                      I actually don't want a general purpose programming language as that's not suitable for data access, I just want a query language that doesn't impose artificial limitations on relations.

                      Edit: although I will admit that programming language theory is a hobby, so that's why I understand the source of the pain every time I have to use SQL. LINQ almost gets this right, but is still a huge improvement on SQL.

    • iamsam123 2 years ago

      Does it SELECT * by default if I never define a SELECT below my FROM? ... Continuing to encourage folks by making it easier to SELECT * would not be fun for me... I could be wrong?

      Agreed, just parsing out the formatting so it's "fewer lines" than traditional SQL soured me.

      The expressions example is ridiculous; in Redshift I can do this all day?? SELECT 1 + 2 AS num1, num1 * 2 AS num2 -- Literally no difference

      Just learn SQL...

      • Zak 2 years ago

        > Just learn SQL...

        I know SQL, and I imagine the authors of PRQL know it better than I do.

        Doesn't it seem weird that dozens of application languages have become popular since the 1970s, but we're still using dialects of the same old database query language? If it had a really elegant syntax, perhaps it wouldn't, but SQL's syntax is anything but. Some of the semantics can be awkward as well.

        I, for one, welcome attempts to move things forward (which is different from saying I'm going to run out and use PRQL in production tomorrow).

        • ttfkam 2 years ago

          > Doesn't it seem weird that dozens of application languages have become popular since the 1970s, but we're still using dialects of the same old database query language?

          Indeed. Do you honestly believe that a half-century of data storage professionals and vendors are blindly moving forward with a hobbled tool?

          Or maybe there are aspects of SQL as a set-oriented 4th generation programming language that aren't apparent to folks who are intimately tied to an imperative or functional programming paradigm as opposed to a declarative DSL for set theory.

          • Zak 2 years ago

            Within popular application languages, for any given paradigm, there are almost always several languages that aren't merely dialects of each other. C# isn't a dialect of Java. Ruby isn't a dialect of Python. Rust isn't a dialect of C++.

            PRQL demonstrates that a set-oriented declarative language need not be a dialect of SQL and isn't the first language to do so (QUEL appeared in the 1970s). It seems odd to me these alternatives haven't gained much popularity.

    • Izkata 2 years ago

      > Putting the FROM first isn't sufficiently compelling on its own.

      Personally I see that as not even neutral, it's a downside. Optimizing for autocomplete is an antipattern, code is read far more often than it's written and the SELECT clause is the interface to the following code. It should be easy to find when skimming, not buried in the query.

      The SELECT clause is also akin to an assignment and it's extremely rare I see anyone advocating flipping the order of those to match what they say they want in SQL.

      Edit: Since I'm sure someone is going to jump on it, yes, I'm conflicted about the WITH clause: It's extremely useful and I like what it does, so I do use it, but I don't like where it's positioned. I've been toying with indentation to work around it so SELECT is still just as visible as otherwise.

    • anon84873628 2 years ago

      You might find Malloy interesting as it makes a greater departure from SQL syntax. Queries are first class objects which can be nested within each other in order to do trellising. It still compiles to SQL because that is the only language accepted by DBMSs today; however it will automatically write symmetric aggregate calculations and do those nestings that are hard for a human to write.

      https://www.malloydata.dev/

      • ako 2 years ago

        How are views not first class query objects?

        • ttfkam 2 years ago

          Yup.

          "SQL doesn't have first-class query objects and isn't composable."

          Uh… views, set-returning functions, temporary tables, CTEs…

          "No, not like that! Something not declarative and set-oriented!"

          SQL is declarative and set-oriented. 3rd generation language models for data were tried and discarded for good reason decades ago.

          "But… but… I don't like set-oriented models! They make me feel dumb. I'm not dumb, so SQL must be dumb. Not meeeeeeeee!!!"

    • maximilianroos 2 years ago

      [PRQL dev here]

      I strongly think we should have the best examples of SQL to compare against. I've ironically made this complaint for other libraries, so I'm alarmed that folks think we might have done the same.

      We would take PRs for any improvements to the SQL that make it a better comparison.

      • chrisjc 2 years ago

        Not a criticism, more of an observation.

        The point of being able to write "FROM" before "SELECT" has become moot to me since many DBs support that SQL "convention" already.

        The newline/indentation is something I would do for readability in the PRQL too.

        At the end of the day, I'm becoming less concerned about these various query language syntaxes and more concerned about the logical and physical plans generated by the respective query engines.

        In my experience, what ends up most problematic is that each query-engine/optimizer sometimes/usually requires SQL tweaks/nuances to perform as efficiently as possible. That's where reading/writing/maintaining queries can get really confusing/difficult. An abstraction such as PRQL, dataframes, etc might add value here, but only if the abstraction (or runtime/env built around it) can tune the output SQL to each query engine as it is needed.

        There are a lot of projects using frameworks/libraries such as Substrait and SQLGlot to accomplish this.

        https://substrait.io/

        https://github.com/tobymao/sqlglot

    • remram 2 years ago

      It's just syntax, it compiles to SQL and runs on today's DBMS. It has no difference in speed or functionality.

      • lolinder 2 years ago

        It does make a difference if the resulting SQL is unperformant.

        Someone who is good at SQL can look at a query and see where the query planner might go wrong, then make subtle tweaks to get better performance. Optimizing queries in a compile-to-SQL language basically has to be left to the compiler, which may not have the context needed to write performant SQL.

        Compiling a query is different than compiling a whole program to assembly, because the query makes assumptions about schemas and indexes that cannot be encapsulated in what you're feeding to the compiler at that moment.

        • ttfkam 2 years ago

          Agreed.

          Also, someone who ISN'T good at SQL can look at EXPLAIN output and see where the query planner HAS gone wrong.
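
          For example, in Postgres the usual first step is just (table names hypothetical):

              EXPLAIN (ANALYZE, BUFFERS)
              SELECT customer_id, SUM(total)
              FROM invoices
              WHERE invoice_date >= DATE '2023-01-01'
              GROUP BY customer_id;
              -- a Seq Scan on a big table where you expected an Index Scan,
              -- or row estimates wildly off from "actual rows", is usually
              -- where the plan went wrong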

          Adding PRQL to the mix unambiguously makes that analysis and optimization step harder.

        • maximilianroos 2 years ago

          So I agree that unperformant SQL is unperformant, but PRQL can _reduce_ the chance of making mistakes there.

          Here's an example [1] of someone reporting that a query engine was far more performant with one SQL construction, and then PRQL changing the SQL we output to use that construction.

          GCC & Clang are much better at compiling to assembly than any person! PRQL isn't there yet, but each improvement scales to everyone who uses it.

          [1]: https://github.com/PRQL/prql/issues/2182

          [Disclaimer: PRQL dev]

          • ttfkam 2 years ago

            Be honest. C to object code to linker to final binary are a MUCH bigger leap than PRQL to SQL.

            You make it sound like SQL is some insurmountable hurdle while PRQL is a bunny slope. You're not getting anywhere with that nonsense.

            [Disclaimer: not a PRQL dev]

            • maximilianroos 2 years ago

              I don't know where this aggression is coming from — we're an open-source project, only volunteers, trying to do something that makes it easier for folks to work with data.

              Critical & constructive feedback would be really appreciated, but that's not this — why all the snark?

              • ttfkam 2 years ago

                Agreed. Got off on the wrong foot. I and others have noted what appeared to be overzealous marketing, and I previously asked questions regarding DB engine-specific features like jsonpath, unnest for arrays, etc.

                The responses from the PRQL devs appeared evasive to me, asking for PRs to address obvious syntax mismatches, and never answering (what I believed to be) substantive questions regarding extensibility and lowest common SQL denominators. "We don't support that yet," is a perfectly reasonable response. But what I'm seeing now is just youthful exuberance untempered by an abundance of experience.

                Almost anyone working intimately with DBs beyond just MySQL 5.x will know that each DB engine has features that can reduce the amount of SQL required, improve data integrity, substantially speed up operations, etc. Not allowing for escape hatches to native functionality is often a deal breaker. Does PRQL have "escape hatches" where you can use the vendor-specific keywords, operators, and types of the underlying engine? (Not just calling single-return-value function, but actual substantive syntactic differences.)

                The issue of NULL is a tricky one, and I totally see why folks would want to == and != it away, but unfortunately in the SQL world, it's not just a matter of IS NULL vs IS NOT NULL (leaving aside IS [NOT] DISTINCT FROM). NULL has profound effects in the model stemming from its VERY different meaning from most programming languages. NULL means "unknown" in SQL, not just "no value" as it is in most general purpose languages. It's subtle but absolutely can't be hand-waved away. How does PRQL handle "IS DISTINCT FROM"? What about set-returning functions that were passed a NULL parameter?
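
                A small illustration of the three-valued logic involved (Postgres-flavoured; the employees table is hypothetical):

                    SELECT NULL = NULL;                       -- NULL (unknown), not true
                    SELECT NULL IS NOT DISTINCT FROM NULL;    -- true
                    SELECT count(*) FROM employees
                    WHERE manager_id <> 42;                   -- silently drops rows where manager_id IS NULL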

                Which brings me to why I think I was harsher than intended: I don't think any of you actually like SQL, and I think that's a dangerous place to start from. It leaves "weird" NULL behavior cascading through your query while the user has used == and != just as they always have with any other value. Using EXPLAIN becomes more difficult due to indirection, and it's not clear to me how you'd alter the query without hacking the PRQL engine in a one-off. I fear that it is indeed "lowest common denominator" (which these days typically means MySQL 5.x compatibility), which is (in my personal opinion) wholly unacceptable for a non-trivial data access conduit.

                I'd likely have been much less snarky and supportive of PRQL if it was presented as an idea factory examining the shortcomings of SQL—of which there are many—and trying out solutions on an experimental basis.

                But that isn't the tone of the project at all. Its messaging (intended or not) goes at SQL with both barrels while touting itself as a production-ready alternative standard to SQL. It isn't about "trying it out for analytics to see if it's right for your team," which I could totally get behind.

                Instead, your site has statements like these:

                • PRQL is a stable foundation to build on; we’re open-source and will never have a commercial product

                • PRQL is a single secular standard which tools can target

                A lack of commercial support isn't in any way synonymous with a stable foundation. Stability is like trust: it is earned over time through action, not just asserted to be true with good intentions. I'm not saying you can't get there eventually, but PRQL is just a year old! The project and team simply haven't earned the right to assert that yet!

                https://xkcd.com/927/

                All implementations are standards until they diverge. Calling PRQL a standard at this point is like the first baseball World Series where only the US played (and just white men from the US at that, but I digress). I'm not saying it can't ever be a standard, but you're skipping several steps in that process and already declaring, "Done!"

                To my knowledge (and quick internet searches) there is no major standards organization behind it. Is there even more than one implementation of this standard? (Usable in multiple programming languages is not the same thing.)

                I appreciate that you're volunteers and an open source project. I really do. I also appreciate that your hearts are in the right place with regard to making data accessible. But this isn't about snark. You're trying to bite off a huge chunk of a 49-year-old leviathan without recognizing how it got to be almost half a century old (in the computer industry!!!) and without noticing that massive leviathan is covered in a long lifetime of battle scars. Those are the waters you're trying to swim in (or at least seeming to from your public website messaging and source repository).

                    "Those who don't understand UNIX are doomed to reinvent it, poorly." – Henry Spencer (1987)
                
                This is not just an idle piece of snark from Unix folks to any upstart that crossed their paths. It was already a large, battle-scarred leviathan as well by that point, and is far more so today!

                The same totally applies to SQL. Show some respect to that, keep up the good fight, but don't let hubris cloud your judgement about how far you and your team really are in this journey.

                Other commenters in this discussion have noted inconsistencies in PRQL syntax. Those conversations won't get any easier by stating you'll never address them because it's "a stable foundation", or by making breaking changes to make things better, thereby breaking that promise. Your project is experimental. Own it. Run with it. Improve on it. But if it's presented as more than this—as it is now—expect vehement pushback from both haters and others who honestly hold you no ill will but recognize several missing fundamentals, as I have.

                • maximilianroos 2 years ago

                  I appreciate the lengthy response.

                  I won't address every point here, but a couple of thoughts:

                  - To close the syntax point — I opened https://github.com/PRQL/prql/issues/3518. We're currently using the output of Rust's canonical SQL formatter (so any claim that we're obfuscating SQL's syntax is hard to credit — do you agree?). I'm open to hand-written examples if SQL doesn't have sufficient auto-formatting tools available.

                  - "never answering (what I believed to be) substantive questions regarding extensibility and lowest common SQL denominators" — some of the features are implemented, some aren't, there are issues on GH for by-and-large all of them. Feel free to open other issues. I don't think there's a duty to respond to every question in every comment on HN, and I certainly don't think losing your composure is an appropriate response to others not answering every question.

                  - Many of the answers to your questions are in the docs — for example escape hatches. You don't have to read them to engage, but claims about a project's "youthful exuberance" are less credible coming from those who haven't done so.

                  - Stepping back, are these informed & constructive criticisms, or are they grasps for straw-men as part of a reactionary response? For example, the response takes two phrases from the website out of context to load a whole argument onto the words "stable" & "standard". But stability & standards can each mean multiple things, and taking the least generous interpretation of a word doesn't make for a reasoned critique. Does a point such as:

                      touting itself as a production-ready alternative standard to SQL
                  
                  ...have any grounding in fact? Or does it come through this aggrieved reactionary lens? For context, PRQL's Readme specifically states:

                      PRQL still has some bugs and some missing features, and is probably only ready to be rolled out to non-technical teams for fairly simple queries.
                  • ttfkam 2 years ago

                    > For context, PRQL's Readme specifically states

                    Be honest. If a different project touted itself as a standard and a firm foundation on its public website but had this disclaimer on the source repo's README file, what would your thoughts be?

                    While you may categorize my responses as reactionary—and they very well may be—can you really claim unbiased objectivity on your part?

                    For example, JS and C# have ECMA. SQL, C, and C++ have ISO. POSIX has IEEE. All have multiple implementations by different organizations. The term "standard" has a clear, well-defined meaning in computing with a long history. Your response was to handwave it away as "meaning multiple things." Whether de facto or de jure, the appellation in PRQL's case simply does not apply.

                    I appreciate that you are not obligated to read and respond to every internet comment. When you do respond however, bear in mind that dismissing clearly defined definitions of industry terms like "stable" and "standard" is not an unbiased position.

                    I am not aggrieved. You asked for critical and constructive feedback. I laid out plainly where I believed the messaging was incongruous from the point of view of someone not intimately involved with PRQL and how I felt it should be changed to better fit its role in the data ecosystem. I didn't call for the project's elimination or even call into question the need for improvement over the existing state of SQL. Take of that what you will.

      • ttfkam 2 years ago

        (I couldn't help but notice you didn't comment on the difference in formatting in the examples.)

        Do you have examples of PRQL working with jsonpath? Generating JSON? Unnesting arrays? Returning ids from an INSERT or UPDATE without making a separate read query?

        Not trying to be argumentative. Honest question.

        • remram 2 years ago

          I am not affiliated in any way with the PRQL project. Those are great questions though, I hope we get an answer.

        • snthpy 2 years ago

          Thanks, that's a great question. You're right that so far we haven't highlighted working with JSON, and I hadn't actually tried it until this point. IMHO the true power of PRQL comes from the fact that it allows you to define functions, and with that you get composability, which is at the heart of almost every programming language (and which is for the most part completely lacking in SQL).

          So with that said, I tried the following POC (remember that PRQL is just a SQL generator so the JSON capabilities depend on your underlying RDBMS):

              ```sh
              > prqlc compile <<EOF
              let get = path obj -> s"""{obj} -> {path}"""
              let getstr = path obj -> s"""{obj} ->> {path}"""
              let extract = obj path -> s"""json_extract({obj}, {path})"""
              
              from [{data='{"duck": [1, 2, 3]}'}]
              select { data | get '$.duck[0]', data | getstr '$.duck[1]', extract data '$.duck[2]' }
              EOF
              WITH table_0 AS (
                SELECT
                  '{"duck": [1, 2, 3]}' AS data
              )
              SELECT
                data -> '$.duck[0]',
                data ->> '$.duck[1]',
                json_extract(data, '$.duck[2]')
              FROM
                table_0
              
              -- Generated by PRQL compiler version:0.9.4 (https://prql-lang.org)
              ```
          
          What's going on here is that I used [s-strings](https://prql-lang.org/book/reference/syntax/s-strings.html) to define custom PRQL functions `get`, `getstr` and `extract` which translate into the underlying `->`, `->>` and `json_extract` SQL constructs.

          You could then for example pipe that query to DuckDB (the example is taken from the following DuckDB blogpost [Shredding Deeply Nested JSON, One Vector at a Time](https://duckdb.org/2023/03/03/json.html)):

              ```sh
              > prqlc compile <<EOF - | duckdb
              let get = path obj -> s"""{obj} -> {path}"""
              let getstr = path obj -> s"""{obj} ->> {path}"""
              let extract = obj path -> s"""json_extract({obj}, {path})"""
              
              from [{data='{"duck": [1, 2, 3]}'}]
              select { data | get '$.duck[0]', data | getstr '$.duck[1]', extract data '$.duck[2]'}
              EOF
              ┌───────────────────────┬──────────────────────────┬───────────────────────────────────┐
              │ "data" -> '$.duck[0]' │ ("data" ->> '$.duck[1]') │ json_extract("data", '$.duck[2]') │
              │         json          │         varchar          │               json                │
              ├───────────────────────┼──────────────────────────┼───────────────────────────────────┤
              │ 1                     │ 2                        │ 3                                 │
              └───────────────────────┴──────────────────────────┴───────────────────────────────────┘
              ```
          
          HTH
  • snthpy 2 years ago

    For those interested who want to learn more, we have a number of presentations coming up at conferences on three continents:

    - [QCon SF, October 2nd, San Francisco, USA](https://qconsf.com/presentation/oct2023/prql-simple-powerful...)

    - [PyconZA, October 5th, Durban, South Africa](https://za.pycon.org/)

    - [Community over Code (ApacheCon), October 9th, Halifax, Canada](https://communityovercode.org/schedule-list/#FT005)

    - [data2day, October 12th, Karlsruhe, Germany](https://www.data2day.de/veranstaltung-21353-0-prql-a-modern-...)

  • lolinder 2 years ago

    SQL will never die for the same reason that JavaScript will never die: because it's built in to all major database engines.

    In both cases, any other language will be starting as a second class citizen that has to compile to SQL/JS. During this phase of a new language's lifetime, it is either a surface-level syntactic change (a la Coffeescript) that provides no objective improvement, or it has to compile its simple semantic structures into opaque SQL/JS structures that will be off the beaten path and therefore not highly optimized by the runtime. Neither will reach sufficient adoption to become a first-class citizen in a major existing platform.

    TypeScript succeeded where others failed because it provided much-needed static analysis while keeping the changes minimal enough that it's completely obvious what the runtime code will look like, so there are no unexpected performance gotchas. SQL, on the other hand, doesn't really need a TypeScript because SQL is highly statically analyzable by nature.

    It's not that I don't believe we could do with an improvement on SQL, but I really don't see a realistic path forward for a replacement.

    • totalhack 2 years ago

      IMO a semantic layer is a nice UX/DX improvement over plain SQL in a business/analytics setting. I use a semantic layer* for >95% of use cases and fall back to SQL when needed. This balance will be different for each business of course.

      * https://github.com/totalhack/zillion

    • paulddraper 2 years ago

      While I mostly agree, there is a bit of Stockholm syndrome.

      A lot of people don't know what they even could be missing.

      For example, there is no succinct way of writing an antijoin in SQL (a sketch follows below).

      The MERGE command has only been implemented by some engines due to (IIRC) concurrency concerns/ambiguities.

      ANSI SQL JSON operations have improved but are still clunky.

      Boolean NULL and IN is a clusterf of footguns.

      Etc.
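
      To make the antijoin and NULL/IN points concrete (hypothetical customers/invoices tables):

          -- "customers with no invoices": there is no antijoin keyword, so it's spelled either
          SELECT c.customer_id
          FROM customers c
          WHERE NOT EXISTS (SELECT 1 FROM invoices i WHERE i.customer_id = c.customer_id);

          -- or
          SELECT c.customer_id
          FROM customers c
          LEFT JOIN invoices i ON i.customer_id = c.customer_id
          WHERE i.customer_id IS NULL;

          -- the NOT IN spelling is the footgun: if any invoice has a NULL customer_id,
          -- this returns no rows at all
          SELECT c.customer_id
          FROM customers c
          WHERE c.customer_id NOT IN (SELECT customer_id FROM invoices);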

      • lolinder 2 years ago

        Oh, I agree! SQL is far from optimal, as is JS. My thoughts are more about the feasibility of a replacement than the need for one.

        US electrical outlets are also highly flawed, but we're never going to replace them either. Standards, once entrenched, are nearly impossible to uproot.

        • Zak 2 years ago

          I like US electrical outlets. They make for small, foldable plugs.

          Yes, it's easier to shock one's self than with most other outlet designs, but the consequences of that are usually mild and help to instill a healthy respect for electricity.

    • maximilianroos 2 years ago

      [PRQL dev here]

      I agree with the sentiments, even if not the conclusion. SQL is omnipresent and is "fine" in a lot of cases.

      TypeScript is indeed a great example of the case; Kotlin too. I'd also add that databases are already adding PRQL support — ClickHouse has native support, there's a DuckDB extension, and folks are working on a Postgres extension.

      One thing I'll respectfully disagree with — "SQL is highly statically analyzable by nature":

      As a really basic example: `SELECT <expr> FROM tbl` — can we tell what shape the result is? In SQL, shapes / types require a lot of context — the result could be a single row in the case of `SUM(foo)`, or it could be every row in the case of `foo, bar`. More in https://prql-lang.org/faq/...

      • dragonwriter 2 years ago

        > As a really basic example: `SELECT <expr> FROM tbl` — can we tell what shape the result is?

        Like most statically analyzable code, you have problems if you try to statically analyze part of the code without the relevant definitions.

        But, yes, if you have the relevant code (e.g., the DDL for the table), you can. (Without it, you can in the sense that you can statically determine it as a function of the table definition, which can be sufficient in some cases.)

        > in SQL, shapes / types require a lot of context — the result could be a single row in the case of `SUM(foo)`, or it could be every row in the case of `foo, bar`.

        Oh, you mean, can we statically determine the shape of an expression’s results without knowing the expression? Well, no, and that’s true in most statically-analyzable languages.

      • lolinder 2 years ago

        Can you tell the shape of the result of `<expr>()` in Java by just reading it? No. Does that mean that Java can't be statically analyzed? Of course not!

        A static analysis system is not restricted to weird abstract constructs like `select <expr>`, it sees the complete picture and can come to conclusions based on the concrete code construct it's given. There's absolutely nothing stopping a SQL static analysis from recognizing that `SELECT sum(foo)` will always return one row with one column that is an integer type, while `SELECT foo, bar` returns some number of rows that have foo and bar columns whose types can be inferred from the CREATE TABLE statements.

        • maximilianroos 2 years ago

          Again — we agree — with omniscience, it's possible to statically analyze everything. My claim is some languages can be statically analyzed with much less context than others.

          Check out "Expression substitution" at https://www.scattered-thoughts.net/writing/against-sql/ if you're interested in more here.

        • Too 2 years ago

          Having some static constraints around <expr> makes it a lot easier to analyze though. Both for humans and compilers.

          Otherwise you end up with situations like C++ templates, that can expand to anything. With long confusing errors, unable to distinguish if it’s an error in the usage, the definition or the call-site.

      • chrisjc 2 years ago

        > One thing I'll respectfully disagree with — "SQL is highly statically analyzable by nature"

        Are you suggesting that PRQL is capable of this? Or at least easier to do in PRQL?

        • maximilianroos 2 years ago

          Yes, much easier.

          Check out "What’s this aggregate function?" at https://prql-lang.org/faq/. Without much context, we can understand the shape of a result.

          And because queries can be longer without becoming unreadable, the lineage information is richer.

      • edgyquant 2 years ago

        If you know <expr> then yes you know the shape of the data. You may not know the number of rows, but I fail to see how that makes it not analyzable.

        • maximilianroos 2 years ago

          Quoting from https://www.scattered-thoughts.net/writing/against-sql/:

          ---

          There are many cases where a small change to a computation requires totally changing the structure of the query, but subqueries are my favourite because they're the most obvious way to express many queries and yet also provide so many cliffs to fall off.

              -- for each manager, find their employee with the highest salary
              > select
              >   manager.name,
              >   (select employee.name
              >    from employee
              >    where employee.manager = manager.name
              >    order by employee.salary desc
              >    limit 1)
              > from manager;
               name  | name
              -------+------
               alice | bob
          
          ---

          If the inner query has more than one row, the query will raise an error. That's difficult to know from the SQL alone.

          • lolinder 2 years ago

            Difficult for the author, but not for static analysis. A static analysis system can pretty easily enforce that `limit 1` be set when a query is used in that position.

    • anon84873628 2 years ago

      Well, Malloy is developed within Google by the founder of Looker, so there is a chance it could be natively integrated into BigQuery. At that point you have a next gen SQL replacement available on one of the most widely used analytics and transformation engines.

    • Scarbutt 2 years ago

      It's hard to say. If Microsoft or Google were behind PRQL and promoting it, it might well become a TypeScript. There's a reason many developers use query builders: while not exactly the same thing, they want the features and familiarity of modern programming languages. Not saying PRQL is the correct approach here, since I don't know it.

  • frogulis 2 years ago

    Pretty cool, your description got my click. I particularly enjoy that a filter is a filter before and after grouping.

    One thing, the "showcase" section is not usable for me on mobile. The code box does not fit on the screen horizontally and I can't scroll right to see the remainder of it.

    • snthpy 2 years ago

      Thank you for the feedback. I'll let the team know.

      We definitely want people on all devices to be able to learn about the project.

  • ledauphin 2 years ago

    I've been excited in the abstract about PRQL for quite a while. But something FQL seems to have a much better handle on is the value of document-orientedness, or what you might alternatively call "gradual schematization".

    This problem has been solved (if not beautifully, at least acceptably) by modern SQL databases that support a JSON storage format and associated "secondary query language".

    I know PRQL has had an open issue on this subject for a while. I just want to note that I think this is truly one of the critical "missing pieces" to PRQL, without which it may never be able to break out into common usage.

    • anon84873628 2 years ago

      FQL is interesting because it focuses on transactional systems and eliminating the need for an ORM in applications. I feel many of the SQL replacement projects like PRQL and Malloy instead come from the analytics side of the house, which doesn't really help application developers at all. (But does raise the question, how do I do analytics in Fauna? Do I ETL to a traditional warehouse system?)

      • panda888888 2 years ago

        That's a good question. If you need to ETL to a traditional data warehouse in order to do analytics, wouldn't you need an ORM, which is exactly what they're trying to avoid? (Also note that it's usually ELT instead of ETL these days, like if you're using Snowflake)

        Or is Fauna a hybrid transactional/analytical database so you can do analytics in it using FQL? (Maybe the long-awaited possibility of true hybrid databases is real?!) But then you'd need to train all of your business intelligence analysts on FQL, which would be a drawback.

  • roenxi 2 years ago

    Great project, wish you all the best. Anything to try and unseat SQL from common use (we can all wish for the day we run PostgrespostSQL in production). At the moment the project is probably going to lose people because it isn't obvious how to get started - many SQL beginners don't know what a compiler is and will get confused by the docs.

    For the sake of their sanity, it'd be worth considering putting an example of using the compiler on a local text file somewhere prominent on that site. That way beginners can go in, write some PRQL, compile it and use it against real SQL databases.

    Or if not the compiler, make it clear how beginners are supposed to engage with this. There is a big need out there for something dplyr-like that works. There are a dizzying array of options and that isn't going to help some good people who need a bit of handholding.

    • snthpy 2 years ago

      Thank you for your feedback. That's really valuable!

      We have the [PRQL Playground](https://prql-lang.org/playground/) exactly for that purpose.

      We'll try and make it more prominent on the front page. I've also felt that we should have a "Getting started" page and will push that as a priority.

      • roenxi 2 years ago

        Ah, I see. Maybe you're just losing beginners who NoScript then. I hadn't figured out what the Playground was supposed to do but it turned out there was an extra disabled script.

        Nevermind, thanks.

  • psacawa 2 years ago

    Is there any intention of eventually supporting DML or DDL statements? That's when the COBOL-like nature of SQL syntax is most frustrating. For example, in order to run "ALTER COLUMN ..." I have to parse a ridiculous BNF like this[0] almost every time. I'll never remember it.

    Usually, the error is a gotcha built into the language syntax (e.g. forgetting the keyword "TO").

    [0] https://www.postgresql.org/docs/13/sql-altertable.html
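
    For example, the common forms look like this (Postgres; table and column names hypothetical):

        ALTER TABLE accounts RENAME COLUMN created TO created_at;   -- forget TO and it's a syntax error
        ALTER TABLE accounts ALTER COLUMN balance TYPE numeric(12,2);
        ALTER TABLE accounts ALTER COLUMN balance SET DEFAULT 0;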

  • seanhunter 2 years ago

    SQL is based on relational calculus rather than relational algebra, which is why it's declarative. Relational algebra is built on fundamental relational operators (select, project, filter, product etc) which are imperatively applied. You can find out more about it here https://techdifferences.com/difference-between-relational-al...

  • jug 2 years ago

    At least superficially this looks a lot like C# LINQ to me in terms of structure and database independence (as for EF Core + LINQ). It’s in my top 3 features of that language.

    https://www.tutorialsteacher.com/linq/sample-linq-queries

    Edit: Shortened to link due to formatting issues

  • danielvaughn 2 years ago

    That first PRQL code sample is wonderfully readable.

    • Timon3 2 years ago

      It is! One suggestion to make it even more convincing: I'd love to see the SQL statement it compiles to.

      • leeuw01 2 years ago

        This +1. I've been burned before by ORMs which translate simple-looking queries into terribly inefficient SQL statements under the hood.

        • danielvaughn 2 years ago

          Yeah I’m a web dev, and recently I found out the most popular JS ORM doesn’t produce joins. It’ll just execute multiple queries in sequence. I don’t know how common that is in the ORM landscape but for me that’s a deal breaker.

          • setr 2 years ago

            That’s just a direct result of lazy loading — if you don’t grab the related objects in the initial query, then there’s no joins to be had.

            Most ORMs do lazy loading by default, but also have a way of doing eager loading — either requiring the nested object to always be loaded as well, or dropping down to some pseudo-sql.

            In c#/EFCore, I always prefer to avoid lazy loading and just write LINQ, and just use the ORM to map the resultset back to objects

          • smaddox 2 years ago

            Common enough to have a name: the N+1 query problem.
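
            In SQL terms (hypothetical blog schema), the lazy-loading pattern looks like:

                -- 1 query for the parents...
                SELECT id, title FROM posts WHERE author_id = 7;
                -- ...then one query per post
                SELECT * FROM comments WHERE post_id = 101;
                SELECT * FROM comments WHERE post_id = 102;
                -- versus a single round trip with a join
                SELECT p.id, p.title, c.*
                FROM posts p
                LEFT JOIN comments c ON c.post_id = p.id
                WHERE p.author_id = 7;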

            • Izkata 2 years ago

              1+N is clearer I think, and what I remember seeing in the past - it matches what's actually happening with this problem. First time I saw N+1 sometime around a year ago, I had no idea it referred to the same thing and thought it was something different.

              N+1 looks like an issue with aggregation after a parallel run, something I've encountered with celery tasks before.

      • Timon3 2 years ago

        For reference, this is the output:

            WITH table_1 AS (
              SELECT
                customer_id,
                total,
                total - 0.8 AS _expr_0
              FROM
                invoices
              WHERE
                invoice_date >= DATE '1970-01-16'
            ),
            table_0 AS (
              SELECT
                COALESCE(SUM(_expr_0), 0) AS sum_income,
                customer_id
              FROM
                table_1
              WHERE
                _expr_0 > 1
              GROUP BY
                customer_id
              ORDER BY
                sum_income DESC
              LIMIT
                10
            )
            SELECT
              c.customer_id,
              CONCAT(c.last_name, ', ', c.first_name) AS name,
              table_0.sum_income,
              version() AS db_version
            FROM
              table_0
              JOIN customers AS c ON table_0.customer_id = c.customer_id
            ORDER BY
              table_0.sum_income DESC
    • snthpy 2 years ago

      Awesome! That's what we're hoping for. Great to hear that you find it wonderfully readable!

  • raverbashing 2 years ago

    > we believe that SQL is a combination of two things:

    > 1. Relational Algebra, which is eternal because it's just maths, and 2. A language designed in the 70s that looks like COBOL.

    Your belief is as real as my belief that it rains too much in London ;) (that is, it is correct)

    But why people hold on so tightly to such a quirky syntax beats me.

  • iamcreasy 2 years ago

    How would you compare prql with dbt?

    • snthpy 2 years ago

      dbt integration was one of our major goals early on but we found that the interaction wasn't as straightforward as we had hoped.

      There is an open PR in the dbt repo: https://github.com/dbt-labs/dbt-core/pull/5982#issuecomment-...

      I have some ideas about future directions in this space where I believe PRQL could really shine. I will only be able to write those down in a couple of hours. I think this could be a really exciting direction for the project to grow into if anyone would like to collaborate and contribute!

    • anon84873628 2 years ago

      dbt is just an orchestration tool. It uses SQL because that's what you need to pass to the target database. There is a python plugin if you prefer to use that for your models instead. Theoretically dbt could wrap any language your target system accepts. The actual configuration of the dbt runtime itself is done with yaml files.
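
      For example, a dbt model is just a SELECT in a .sql file, with Jinja's ref() wiring up the dependency graph (the model and table names here are hypothetical):

          -- models/top_customers.sql
          {{ config(materialized='table') }}

          SELECT customer_id, SUM(total) AS total_spend
          FROM {{ ref('stg_invoices') }}
          GROUP BY customer_id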

  • deburo 2 years ago

    That looks awesome. Does it support directly querying against databases (PostgreSQL, SQL Server, ...)? ie. is there a "Run" command in vscode that takes care of compiling & running the compiled sql?

  • hiAndrewQuinn 2 years ago

    This is tremendous. I'm curious to know if a CLI `prqlite3` exists which wraps around the `sqlite3` CLI many of us know and love.

    • snthpy 2 years ago

      Thank you.

      The CLI usability was one of the aims behind [prql-query (pq)](https://github.com/prql/prql-query/). sqlite integration was on the roadmap but unfortunately that project has been largely unmaintained by me for the past 6 months. (This is just referring to prql-query and not PRQL which is under very active development.)

      I'm working on a new project which will do exactly this (and a lot more!) which I hope to release next week. I'll drop the link here when that's ready.

  • spion 2 years ago

    How is the language server support?

  • continuational 2 years ago

    In that first example, is the last line superfluous? It doesn't seem to be used.

  • laerus 2 years ago

    hey, is compile time verification of queries supported for PRQL in Rust?

    • snthpy 2 years ago

      Not yet, but looking at what sqlx does, I think we should be able to do something similar.

      It's been a small team of core contributors so far but in the last three months we've seen more people making their first PR and then going on to contribute more over time so the momentum is growing.

      We'd definitely be open to contributions in this space.

dagss 2 years ago

Nitpick, but relational does not mean joins; it means tables/rows of tuples. A "relational document database", which seems to be Fauna's slogan, is a contradiction in terms.

  • contrast 2 years ago

    That’s technically correct, and I think the author would say he’s aware of that definition.

    The article as I read it is trying to make a broader point, that there are underlying mathematical principles that inspired Codd’s relational model.

    I’ve never had cause to explore it, but my understanding is that there’s nothing in those principles that require tables/rows of tuples.

    One goal of the article seems to be to inspire a curiosity in knowledgeable readers: what happens if you build a document database that also supports the same mathematical principles that inspired the relational model?

    • gregjor 2 years ago

      > there’s nothing in [Codd’s] principles that require tables/rows of tuples.

      Have you read Codd’s Rules #1 and #2? Pretty clear on this point.

      https://en.wikipedia.org/wiki/Codd%27s_12_rules

      Technically the relational model uses the term relation to refer to an unordered set of tuples, where every tuple has a key (one or more elements) to uniquely identify it, and every tuple has the same number of items, of the same type. Tables are relations. So are the results of a query, which can include joins.

      • ghusbands 2 years ago

        > > there’s nothing in [Codd’s] principles that require tables/rows of tuples.

        You're misquoting; is that intentional? The post you're replying to says the principles that inspired/predate Codd don't require tables/tuples. Thus, the details of Codd's relational model are irrelevant.

    • dragonwriter 2 years ago

      > The article as I read it is trying to make a broader point, that there are underlying mathematical principles that inspired Codd’s relational model.

      The relational model is a direct product of a set of mathematical principles Codd put together called relational algebra, which deals with sets of tuples called relations.

      Nothing in the article addresses any of the mathematical underpinnings of the relational model. It's blowing smoke at an audience that it expects to know next to nothing about the topic.

      > One goal of the article seems to be to inspire a curiosity in knowledgeable readers: what happens if you build a document database that also supports the same mathematical principles that inspired the relational model

      The features of RDBMSs that they seem to be suggesting FQL supports are ACID transactions. While that's an important feature of RDBMSs, it isn't the same thing as the mathematical principles addressed by the relational model, whether relational algebra or the more general set theory that inspires it. The article isn't directed at knowledgeable readers.

    • bazoom42 2 years ago

      A relation is by definition a set of tuples (informally called a table where the tuples are the rows).

      Codd's relational database model adds the further constraint that nested tables are not allowed (first normal form), instead representing relationships through foreign keys.

      Codd's motivation for disallowing nested tables is that it makes query languages much simpler. He developed relational algebra, which is the foundation behind SQL, which is why SQL does not allow nested tables.

      Document databases do not follow first normal form and allow nested structures, so they cannot be queried with relational algebra, since it doesn't have a way to “drill down” into nested structures.

      It is unclear to me what “mathematical principles” remain if you remove the notion of relations from the relational model.

      • _a_a_a_ 2 years ago

        Out of my depth here (no experience) but "Codd's relational database model adds the further constraint that nested tables are not allowed" may be wrong. He allowed nested stuff, it's just that SQL didn't support it.

        Can anyone elucidate? Please don't shout that I'm wrong because there was something there in his first paper.

        • bazoom42 2 years ago

          No, he explicitly disallows nested relations. This is the definition of first normal form.

          Hierarchical databases (which predate relational) can be understood as nested relations, and Codd's first example of normalization is how to extract the nested relations in such a database into separate tables and instead express the relationships through foreign keys.
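
          To make that normalization step concrete, a toy SQL sketch (table and column names invented): an "order with nested line items" is pulled apart into two flat tables related by a foreign key.

          -- the nested relation becomes its own table...
          CREATE TABLE orders (
            order_id  bigint PRIMARY KEY,
            placed_at timestamp NOT NULL
          );

          -- ...and the parent/child relationship is expressed via a foreign key
          CREATE TABLE order_lines (
            order_id bigint NOT NULL REFERENCES orders (order_id),
            line_no  int    NOT NULL,
            sku      text   NOT NULL,
            qty      int    NOT NULL,
            PRIMARY KEY (order_id, line_no)
          );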

          • jimwhite42 2 years ago

            Date and Darwen (spiritual successors to Codd?) have a relational theory which allows nested relation values, and I think it doesn't contradict the motivation behind first normal form, and it isn't hierarchical in the pre-relational database sense. I think they cleaned up Codd's ideas very effectively.

            • bazoom42 2 years ago

              The motivation behind first normal form is to keep the query language simple and powerful at the same time. Allowing nested tables would require extensions to the query language but wouldn’t give any additional expressive power since you can already express the same relationships using foreign keys.

              Also, I don't see how it isn't hierarchical? Nested tables create a hierarchical structure just like nested records in a hierarchical database. What is the difference?

              I consider Date and Darwen more like fan-fiction writers than spiritual successors. It is fine that they propose a modified version of the relational model, but they are doing it in a weird way where they try to redefine Codd's words to mean the opposite of what he is saying, instead of just noting how they disagree. As if Codd were a prophet whose words cannot be directly contradicted.

              • jimwhite42 2 years ago

                > Also, I don't see how it isn't hierarchical? Nested tables create a hierarchical structure just like nested records in a hierarchical database. What is the difference?

                I haven't seen any reasonable uses of nested relations in base relvars; I think they are slightly different from nested records if you can query them declaratively instead of via imperative-looking nested-loop code.

                But nested tables can be used in queries - the group and ungroup operators are pretty useful. It's a straightforward way to define (and even implement) group by aggregates for instance.

                > Allowing nested tables would require extensions to the query language but wouldn’t give any additional expressive power since you can already express the same relationships using foreign keys.

                Everything is a tradeoff. There are lots of queries that you can write using correlated subqueries in SQL, but I think would be much simpler if you could use group and ungroup.
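
                For a rough SQL analogue (a Postgres-flavoured sketch, assuming a hypothetical order_lines(order_id, sku) table): array_agg behaves a bit like GROUP and unnest like UNGROUP.

                -- "group": nest the skus for each order into one value per row
                SELECT order_id, array_agg(sku) AS skus
                FROM order_lines
                GROUP BY order_id;

                -- "ungroup": flatten the nested values back into plain rows
                SELECT order_id, unnest(skus) AS sku
                FROM (
                  SELECT order_id, array_agg(sku) AS skus
                  FROM order_lines
                  GROUP BY order_id
                ) AS grouped;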

                I don't really follow your last paragraph at all. I would much rather use Date and Darwen's relational theory than Codd's. I think they have the right amount of respect for Codd, and definitely aren't shy to highlight where they differ from him, or to contradict him.

                Do you have an example of them trying to redefine something in the way you say?

                • bazoom42 2 years ago

                  For example, Date redefines first normal form to allow nested tables, while in Codd's paper it is defined as eliminating nested tables. Why don't they just give their concept a different name then? It just creates confusion.

                  • jimwhite42 2 years ago

                    Can you say where he does this? I only remember him referring to first normal form as something historical, that he doesn't use that particular normal form directly at all.

              • ttfkam 2 years ago

                It's not 1992 anymore. https://youtu.be/8Fb5Qgpr03g?si=Jc7dpuVgws3POiXA

                It may upset someone's sense of "purity", but the SQL standard and thoughts about the relational model have long since moved on from this dogmatic view of atoms, and no, foreign keys can't perfectly model what folks need. Eventually the rubber hits the road, theory and practice diverge, and different approaches are clearly needed.

                There is a place for a document atom in a database (JSON, XML, etc. that don't map well or efficiently to a traditional relational model). For arrays. (Putting in a separate table with its own PK to join is somehow more flexible? Just use unnest(…) for that.) For label trees. For geometries (GIS). For references to tables on other servers (SQL/MED). For tuples. For UUIDs. For ranges. For intervals.

                You may call Date and Darwen's work fan fiction, but I have seen tremendous efficiency gains over the last few decades from engines drawing upon their insights without compromising data integrity. In fact, exclusion constraints with timestamp ranges are one area where the old strict relational model would be wholly ineffective for data integrity and for avoiding race conditions cascading out to the application level.

                https://www.cybertec-postgresql.com/en/postgresql-exclusion-...
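
                A minimal Postgres-flavoured sketch of that range-exclusion pattern (table and column names invented); the constraint rejects any two rows for the same room whose time ranges overlap, without any application-level locking:

                CREATE EXTENSION IF NOT EXISTS btree_gist;  -- lets the GiST index mix = with &&

                CREATE TABLE room_booking (
                  room_id int       NOT NULL,
                  booked  tstzrange NOT NULL,
                  EXCLUDE USING gist (room_id WITH =, booked WITH &&)
                );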

                • bazoom42 2 years ago

                  I believe in using the right tool for the job, and this might sometimes be document databases, key-value stores, XML or JSON values, or whatever is needed to solve the problem.

                  But let me point out that hierarchical databases (which document databases, XML, etc. are variants of) predate the relational model (and certainly predate Date's work), so it is worth being aware of the challenges and limitations which caused the development of the relational model as an alternative.

                  • ttfkam 2 years ago

                    > I believe in using the right tool for the job

                    And more of these tools (RDBMSs) are able to perform more jobs. There are definitely use cases for correlating parts of documents with traditional relational data. Quite often there's no need to choose between a relational database engine and a document database engine when the engines support both and more.

                    Hybrids tend to blur the lines for definitions.

          • _a_a_a_ 2 years ago

            Thanks for a polite disagreement, but I believe you are wrong (not saying you are!). IIRC Codd defined relation valued attributes and also associated operators Group and Ungroup. https://www.oreilly.com/library/view/sql-and-relational/9781...

            also https://shark.armchair.mb.ca/~erwin/RA_Intro.htm

            "

            Relations are, themselves, values too, and relation attributes can therefore be declared to be of another relation type. Such attributes are called 'Relation-valued attributes' (RVA's for short).

            In the RA, two operators are available that allow us to manipulate relations in connection with RVA's : GROUP and UNGROUP

            "

            Like I said, I'm a bit out of my depth here so take the above as evidence rather than proof that such things existed, but I'm pretty sure I saw this, hand-drawn, in one of Codd's original papers.

            .

            Edit: you are right

            "Codd proposed a normal form thathe called first normal form (1NF), and he included a requirement for 1NF in his definitions for 2NF,3NF, and subsequently BCNF. Under 1NF as he defined it, relation-valued attributes were “outlawed”;that is to say, a relvar having such an attribute was not in 1NF."

            https://fliphtml5.com/qprz/cxon/basic/201-235

            • ogogmad 2 years ago

              No, it doesn't mean he's right. The "normal forms" could merely be suggestions for a database designer, not a technical limitation enforced by the software itself.

              No one has provided convincing evidence that Codd intended to exclude nested tables entirely. People seem to be conflating (i) good database design, as suggested by Codd, with (ii) the feature set of a DBMS, also as suggested by Codd.

              • jimwhite42 2 years ago

                > The "normal forms" could merely be suggestions for a database designer, not a technical limitation enforced by the software itself.

                I think most of the motivation for normal forms is to avoid 'update anomalies', which is essentially: don't represent the same information in two places in your base relation variables (aka tables in SQL). So you can have repeated values or nested relations in queries, and you can have them in base tables which are morally normalized, as long as there's no possibility that these lead to the same information being recorded in two distinct places.

                When people talk about 'denormalizing' and it's justified, I think it's breaking this rule about representing information in two or more places in exchange for performance. If you do this, the application programmer has to be careful to keep these multiple locations in sync - a kind of consistency you don't have to think about in a clean database design. I think that database management software in general cannot enforce normalisation - it can only make it easier or more difficult to use it with normalized databases.

                In theory, the DBMS itself could directly support 'physical denormalization' and make this performance optimisation easier to implement and transparent to the application code. I think some SQL DBMSs have attempted to do things like this.

                • 6345dhjdsf 2 years ago

                  (Posted under a different account because I'm being slow-posted again by HN)

                  > In theory, the DBMS itself could directly support 'physical denormalization' and make this performance optimisation easier to implement and transparent to the application code. I think some SQL DBMSs have attempted to do things like this.

                  Automatically managed, application-transparent, physical denormalisation entirely managed by the database is something I am very, very interested in. Unfortunately I've been able to find next to nothing describing what it would look like and how it would be done. If you can provide any links that would be so incredibly helpful!

                  It gets mentioned in the Date/Darwen books as being the right way to do things, but no actual information seems to be given.

                  • jimwhite42 2 years ago

                    I'm a bit fuzzy, but I think Vertica allows duplicating tables stored in multiple orders - then I think the appropriate version is picked automatically by the query optimiser. So this works not that differently from an index (which is also DBMS-managed performance denormalization).

                    There are also materialized views - if you have automatic, incrementally updated materialized views which are transparently substituted into queries, that's along these lines. I think there's a lot of progress being made here, and plenty of compromises used in the field that have been in production for a long time.

                    I think there's some ambitious work on materialized views being done in postgres.
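
                    The manual version of this in Postgres looks roughly like the sketch below (names invented); the automatic, incremental, transparent part is exactly what is still missing.

                    CREATE MATERIALIZED VIEW order_totals AS
                    SELECT order_id, sum(qty) AS total_qty
                    FROM order_lines
                    GROUP BY order_id;

                    -- needed so the view can be refreshed without locking out readers
                    CREATE UNIQUE INDEX ON order_totals (order_id);

                    -- has to be triggered explicitly; recomputes rather than updating incrementally
                    REFRESH MATERIALIZED VIEW CONCURRENTLY order_totals;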

                    > It gets mentioned in the Date/Darwen books as being the right way to do things, but no actual information seems to be given.

                    I don't think they ever convincingly got into the details on it.

                    • 6345dhjdsf 2 years ago

                      > So this works not that differently to an index

                      Yes, it's pretty much the same as how a covering index is used

                      > There's also materialized views

                      Ah yes, that's pretty much the answer (if incrementally updated). Thanks.

                  • smaddox 2 years ago

                    > Automatically managed, application-transparent, physical denormalisation entirely managed by the database is something I am very, very interested in.

                    Sounds a bit like Noria: https://github.com/mit-pdos/noria

                • bazoom42 2 years ago

                  > I think most of the motivation for normal forms is to avoid 'update anomalies', which is essentially, don't represent the same information in two places

                  This is true for the second and higher normal forms, but not for first normal form. First normal form is about eliminating nested tables, not about eliminating redundant data.

              • bazoom42 2 years ago

                > No one has provided convincing evidence that Codd intended to exclude nested tables entirely.

                See Codd's original paper (linked in a sibling comment), section 1.4.

                Note that the relational algebra developed by Codd does not support querying nested tables, which would make them practically useless, even if allowed.

              • _a_a_a_ 2 years ago

                > No one has provided convincing evidence that Codd intended to exclude nested tables entirely

                Erm, my last para strongly suggests that he did?

                "Under 1NF as [Codd] defined it, relation-valued attributes were “outlawed”;that is to say, a relvar having such an attribute was not in 1NF."

                (but see @jimwhite42's comment)

            • bazoom42 2 years ago

              When in doubt, check the primary source: https://www.seas.upenn.edu/~zives/03f/cis550/codd.pdf

              See section 1.4 about eliminating “non-simple domains” (which means nested tables) through a process of normalization.

  • marcosdumay 2 years ago

    Whatever you want to point to from theory, the one single distinctive feature of the relational model is the "mostly free" interdependency between the relations. AKA, the FKs and joins.

crabbone 2 years ago

Worthless article. Zero useful description of what it's trying to sell. A bunch of disjoint historical facts about relational databases that have nothing to do with the product being sold take about 2/3 of the article.

Also the author seems to be very proud of associating themselves with Microsoft's products (w/o even a hint of doubt that that may not show them in a favorable light)...

Also, marketing-inspired use of pseudo-programming terminology (eg. "dynamic languages"). Ewww.

ako 2 years ago

Seems like a lot of what Fauna does by storing documents isn't really new; Oracle, Postgres and others have provided this for a long time. I was really surprised by the performance of JSON queries [1], which opens the door to using Postgres as a client API cache: storing the payload in a table and doing deserialization using (materialized) views.
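
A rough sketch of that cache pattern (hypothetical names): the raw payload goes into a jsonb column and a view deserializes the fields callers actually need.

CREATE TABLE api_cache (
  cache_key  text PRIMARY KEY,
  payload    jsonb NOT NULL,
  fetched_at timestamp NOT NULL DEFAULT now()
);

-- deserialization lives in the view, so callers see ordinary typed columns
CREATE VIEW cached_orders AS
SELECT
  cache_key,
  (payload ->> 'order_id')::bigint AS order_id,
  payload -> 'items'               AS items
FROM api_cache;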

The difference seems to be the approach of minimizing the number of calls from your application - get all required session data in one call, similar to what GraphQL does for API calls. They're also using HTTP as the protocol for database connectivity.

[1]: https://ako.github.io/blog/2023/08/25/json-transformations.h...

  • default-kramer 2 years ago

    Using FQL instead of SQL seems to be a pretty big difference too.

    • ako 2 years ago

      Postgres has procedural languages and lets you return complex JSON structures combining relational data and JSON documents, both with its procedural languages and with regular SQL. Sure, the syntax is different, but I'm not sure the difference makes a big impact.

  • robertlagrant 2 years ago

    Yes - the difference you mention seems to be the main difference.

gigatexal 2 years ago

Re PRQL … I see it like my text editor. I’ll stick with vi because it has solved text editing. It’s done. Same with SQL. I’ve not seen anything yet ready to replace it. It’s not perfect. But for what I need from it it’s perfectly serviceable.

robertlagrant 2 years ago

On strong schemas and flexibility:

1. You still have a schema in your code. With weak schemas it's now just harder to know if every record in your database conforms to it.

2. An ORM is a great tool for prototyping. E.g. have SQLAlchemy objects in code, run a command to generate a database migration, run the migration, and you have all your data guaranteed to be compatible with your latest code - and you didn't write any SQL.

  • m_mueller 2 years ago

    If you program defensively you can save on certain common schema updates in, e.g., a document-based data model (e.g. adding more fields). But in my experience strong schemas definitely make sense when you're dealing with relational data. Earlier in my career I built a relational model on top of CouchDB (due to its strong replication capabilities, including on mobile devices), but it was definitely painful (and less performant) compared to building it in a relational DB.

    • roenxi 2 years ago

      > If you program defensively you can save on certain common Schema updates in e.g. a document based data model (e.g. adding more fields)

      ALTER TABLE whatever ADD COLUMN new_field type DEFAULT NULL;

      I've seen a lot of people claim that they don't want to waste time clarifying their schema and I'm sure there are edge cases where that is clever. But, in the majority of cases, they are literally risking data integrity for a saving smaller than the time it takes to write a HN comment.

      Making the schema implicit doesn't "save" anything. The schema is still there; now only insiders who are completely familiar with the code know what it is. And they're going to have a few extra bugs, because they'll forget too.

ghusbands 2 years ago

> Most importantly, SQL databases made supporting highly consistent ACID transactions easy.

The default transaction isolation level for every major database is not ACID. Enabling the required serializability tends to make performance terrible, and so most don't.
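
For reference, the syntax to opt in is simple in Postgres, per transaction or as a database default ('mydb' below is a placeholder); the hard part is the performance hit plus the serialization failures your application then has to detect and retry.

-- per transaction
BEGIN ISOLATION LEVEL SERIALIZABLE;
-- ... reads and writes ...
COMMIT;

-- or as the default for a whole database
ALTER DATABASE mydb SET default_transaction_isolation = 'serializable';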

  • _a_a_a_ 2 years ago

    > Enabling the required serializability...

    is trivial, no?

    > ...tends to make performance terrible

    I've heard this a lot but never seen any figures - anyone have any numbers/experience?

    (edit: and most apps I've worked with didn't need serialisability, either because they were working with a snapshot of data or absolutely precise answers weren't needed)

    • ghusbands 2 years ago

      I've heard multiple accounts of people being taken by surprise by this and by how transaction isolation actually works in databases and not finding it at all easy to correct it. A famous one is https://blog.codinghorror.com/deadlocked/

      (On your edit: The problem is not knowing when you're being hit by it. Even just maintaining a limit on total size of uploaded files or such, for example, is nontrivial under default isolation levels.)

      • _a_a_a_ 2 years ago

        On your first para I'm not remotely sympathetic. They are using a complex, sophisticated tool without understanding it. They got what they deserved. Particularly telling is this ridiculous quote "I'm a little disappointed that SQL Server treats our silly little web app like it's a banking application". Edit: so, they wanted even more lax integrity constraints by default? FFS

        And, BTW, deadlocks can most definitely happen using the default Read Committed isolation level.

        Per your 2nd para, I simply don't understand, can you clarify?

        • ghusbands 2 years ago

          You're expecting a very high level of expertise for a tool that so many use incidentally. Most don't actually know how transaction isolation works, especially as the fine detail varies between databases. It's incredibly common to believe that databases actually are ACID by default and that is repeated far and wide.

          For the last, I was pointing out that people don't necessarily know when they hit race conditions or other transaction isolation faults; they're one of the hardest things to debug or notice. It would be easy to believe that a transaction which first confirms a file update would not take a user over their quota and then applies the update would, per ACID, actually ensure the user doesn't exceed their quota. But the default transaction isolation levels don't provide that.
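
          A sketch of that quota example against a hypothetical files table: under the default READ COMMITTED level, two concurrent uploads can both read the same total and both commit, blowing past the limit; under SERIALIZABLE, one of the two conflicting transactions fails with a serialization error and must be retried.

          BEGIN;  -- default READ COMMITTED
          SELECT coalesce(sum(size_bytes), 0) FROM files WHERE user_id = 42;
          -- application compares the sum plus the new file's size against the quota here
          INSERT INTO files (user_id, size_bytes) VALUES (42, 1048576);
          COMMIT;

          -- the same statements run inside BEGIN ISOLATION LEVEL SERIALIZABLE
          -- make one of two overlapping uploads abort instead of silently overshooting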

          • 6345dhjdsf 2 years ago

            You're actually right because, casting my mind back, I can remember exactly the same kind of thing happening in places I worked. I specifically remember discovering a race condition because they didn't understand locking/transaction isolation levels.

            Nonetheless if you're going to base your company's product around a database or anything else for that matter, you really had better get yourself sorted out WRT your tools, or it can become horribly expensive to fix or even fatal to your company.

  • iudqnolq 2 years ago

    that's presumably why the author said "made supporting... easy", not "is"?

    • ghusbands 2 years ago

      Most people believe that databases are ACID by default, so it's worth bringing up when an article even subtly implies otherwise.

      • bhattid 2 years ago

        Damn, this is news to me. Seems I have some reading to do, thanks for pointing that out.

xwowsersx 2 years ago

I have to say this was written extremely well. Quite cogent and I feel I learned a little something. Bookmarking this as a pretty decent intro to this area that I can refer people to.

slotrans 2 years ago

Fixed schemas are good. Document stores are bad. SQL is good.

Stop doing this nonsense. It's a step backwards. As the intro points out, hierarchical and graph DBs came first, and relational was built in part to solve their problems. Document DBs just bring those problems back.

  • bob1029 2 years ago

    > Fixed schemas are good.

    I recall getting into an argument recently (perhaps on HN) wherein the central thesis for why SQL is bad was that the schema is "difficult" to change relative to a document store or other NoSQL abstraction.

    If you don't have a clear idea of what the representative SQL schema might be for your problem or business (say, within ~80%+ certainty), one may argue you should not be writing any software until you've further clarified things with business stakeholders.

    I strongly believe that virtually all evil which emerges from practical software engineering comes out of this "flexible schema" bullshit. If the business is certain of the shape of their problem, there is almost certainly a fixed schema that can accommodate it. There are very few problem domains which cannot be coaxed into a strict SQL schema.

    • totalhack 2 years ago

      There are also ways to add some flexibility into a "fixed" schema when you need it. Entity-attribute-value tables, views, JSON columns (as a last resort), or a semantic layer like https://github.com/totalhack/zillion

      • bob1029 2 years ago

        > JSON columns (as a last resort)

        This is the design-time escape hatch for me. The remaining 20% space of unknowns can usually be dealt with here. As long as the most critical tables, relations & properties have been set in stone, I would find it reasonable to allow a "ExtendedPropertiesJson" column in limited areas of the schema.

        Over time, these JSON blobs should be refactored into the schema as features become more stable.

        The key is to have a solid relational model as the foundation. You cannot really do it the other way around.
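
        A small Postgres-flavoured sketch of that escape hatch (hypothetical names): the core columns stay strictly typed while one jsonb column holds the not-yet-stable properties.

        CREATE TABLE orders (
          order_id            bigint PRIMARY KEY,
          customer_id         bigint NOT NULL,
          placed_at           timestamp NOT NULL,
          extended_properties jsonb NOT NULL DEFAULT '{}'  -- the ~20% escape hatch
        );

        -- reach into the blob until a property is stable enough to earn a real column
        SELECT order_id, extended_properties ->> 'gift_message' AS gift_message
        FROM orders
        WHERE extended_properties ? 'gift_message';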

    • Scarbutt 2 years ago

      Business requirements change over time, especially at the beginning; you may have 80% certainty of the schema today but not in four months.

  • xtracto 2 years ago

    I am a tech advisor to a bunch of startups. One of them, doing stock buy/sell, came to me with their MongoDB-based system. The first thing I told them is that using a document-based DB for OLTP for their use case was going to give them problems.

    I saw it first hand 10 years ago, and had to do a migration.

    Their justification for using Mongo was that their system is very dynamic, so their data changes a lot and SQL-based DBs don't allow that. I told them about DB migrations and whatnot, but I just haven't been able to convince them.

    It's sad seeing them dig themselves into the same hole I had to dig myself out of a decade ago.

  • lcnPylGDnU4H9OF 2 years ago

    At this point, one chooses the solution for their problem. The reason the fads occur is that a person who vaguely understands both the problem and the solution will write a blog post which happens to go viral talking about how the solution will solve all problems.

    NoSQL databases aren’t unilaterally worse than relational ones. They just solve different problems.

    • pphysch 2 years ago

      > NoSQL databases aren’t unilaterally worse than relational ones. They just solve different problems.

      I can't prove this, but I assert that a relational database that has solid JSON+text support (e.g. Postgres) is on much better footing than a NoSQL DB that attempts to implement a true relational model.

      One is adding a special new datatype; the other is trying to add an entire paradigm.

      Just use Postgres. If you do need to migrate to Mongo for some reason, dumping your tables into JSON isn't the end of the world.

      • lcnPylGDnU4H9OF 2 years ago

        > a relational database that has solid JSON+text support (e.g. Postgres) [vs] a NoSQL DB that attempts to implement a true relational model

        I think both cases are similar to using a flat-head screwdriver for a Phillips screw. It will technically work, but you might run into otherwise avoidable problems. Use the right tool for the job. Though generally a relational database with JSON support will better serve most web applications, a NoSQL DB might better serve niche internal processing.

  • scott_meyer 2 years ago

    What is a document? How is an ORANUM or a bignum not a document?

    One motivation for creating documents is that modeling document contents as relations requires the creation of a bunch of primary keys which have no natural definition. A simple document might be an ordered collection of paragraphs: [p23, p57, ...]

    Modifying such things is difficult. In fact, the most effective way of structuring modification seems to be OTs (operational transforms) based on document offsets - which is what Google Docs does.
