Settings

Theme

How I format SQL code

bytepawn.com

60 points by szaboat 6 years ago · 43 comments

Reader

lancefisher 6 years ago

I appreciate write-ups like this, but I really disagree with what seems to be the majority that SQL keywords should be uppercase. It’s one of the last uppercase holdovers from the old days. HTML used to be uppercase as well. Lowercase is objectively more readable, easier to type, and editors colorize keywords so they stand out. Uppercase is really not necessary in the 2020s. Check out Matt Mazur’s styleguide (linked in the post) for an alternative that endorses lowercase. He also has a contrasting style on where Boolean operators should go. https://github.com/mattm/sql-style-guide/blob/master/README....

  • jolmg 6 years ago

    > editors colorize keywords so they stand out

    Not when it's embedded as a string in another language, like when the query you want is not supported by the ORM.

    > Lowercase is objectively more readable

    No, and definitely not objectively. I generally don't capitalize my SQL, but I can't argue that using lowercase exclusively makes the SQL more readable. It definitely does help readability to differentiate SQL keywords from table and column names. Compare:

      select
          region_fleet,
          case when status = 'delivered' then 'delivered' else 'not delivered' end as status,
          date_trunc('week', day) as week,
          count(distinct row(day, so_number)) as num_orders,
          count(distinct case when scheduled_accuracy_meters <= 500 then row(day, so_number) else null end) as num_accurate,
          avg(scheduled_accuracy_meters) as scheduled_accuracy_meters
      from
          deliveries
      where
          ...
      group by
          1, 2, 3
    
    with

      SELECT
          region_fleet,
          CASE WHEN status = 'Delivered' THEN 'Delivered' ELSE 'Not Delivered' END AS status,
          DATE_TRUNC('week', day) AS week,
          COUNT(DISTINCT ROW(day, so_number)) AS num_orders,
          COUNT(DISTINCT CASE WHEN scheduled_accuracy_meters <= 500 THEN ROW(day, so_number) ELSE NULL END) AS num_accurate,
          AVG(scheduled_accuracy_meters) AS scheduled_accuracy_meters
      FROM
          deliveries
      WHERE
          ...
      GROUP BY
          1, 2, 3
    
    It makes the column names stand out when you lack color hints. You can quickly skim to see what data is involved in a query without visually parsing the expressions.
    • oarabbus_ 6 years ago

      The first is actually much easier to read, and lower case is far superior.

      Lowercase letters are read and comprehended faster: https://ux.stackexchange.com/questions/72622/how-easy-to-rea...

      Additionally using casing when it has no meaning is an anti-pattern.

      • jolmg 6 years ago

        That link talks in general. In general, I agree that lowercase is more readable.

        > Additionally using casing when it has no meaning is an anti-pattern.

        Why do you say that it has no meaning? This is about differentiating SQL keywords from table and column identifiers. That's the meaning.

        > The first is actually much easier to read, and lower case is far superior.

        Reading the query whole, sure, but are you seriously suggesting that you can skim for the identifiers faster in the all-lowercase one when there are no color hints?

        • oarabbus_ 6 years ago

          >Why do you say that it has no meaning? This is about differentiating SQL keywords from table and column identifiers.

          One could also type like `CoUnT(dIsTiNcT CaSe WhEn ... EnD)` to make the keywords stand out. Casing has absolutely no meaning in SQL outside of single quotes.

          On the other hand, (in python for example) SOME_FUNCTION() and some_function() both have meaning. They do not refer to the same thing.

          >but are you seriously suggesting that you can skim for the identifiers faster in the all-lowercase one when there are no color hints?

          Yes. Especially since my eyes don't get stuck on the massive blobs of ALL CAPS YELLING TEXT in the select.

          • jolmg 6 years ago

            > One could also type like `CoUnT(dIsTiNcT CaSe WhEn ... EnD)` to make the keywords stand out. Casing has absolutely no meaning in SQL outside of single quotes.

            That the language doesn't enforce a meaning doesn't mean that we can't add meaning to the casing. This is like how in multiple languages it's convention to write constant variables in all-caps even though few languages enforce it.

            • oarabbus_ 6 years ago

              That's a fantastic example because an all-caps constant actually does have a different meaning than the same word spelled in lowercase, and all the more reason to not capitalize keywords in SQL (Structured Query Language, being an acronym, is proper to capitalize)

              • jolmg 6 years ago

                Could you fill out the following sentence, please?

                When an identifier is written in all-caps, it generally _____ that it's a constant.

                I feel like you're just, in bad faith, refusing to acknowledge that the word "meaning" doesn't have to be in any way related to language enforcement. Are you trying to argue for the sake of arguing?

    • arethuza 6 years ago

      For what it is worth, I find the first of those (i.e. the lowercase one) much easier to read.

      • amp108 6 years ago

        Yeah, but how about this:

        select region_fleet, case when status = 'delivered' then 'delivered' else 'not delivered' end as status, date_trunc('week', day) as week, count(distinct row(day, so_number)) as num_orders, count(distinct case when scheduled_accuracy_meters <= 500 then row(day, so_number) else null end) as num_accurate, avg(scheduled_accuracy_meters) as scheduled_accuracy_meters from deliveries where ... group by 1, 2, 3

        vs.:

        SELECT region_fleet, CASE WHEN status = 'Delivered' THEN 'Delivered' ELSE 'Not Delivered' END AS status, DATE_TRUNC('week', day) AS week, COUNT(DISTINCT ROW(day, so_number)) AS num_orders, COUNT(DISTINCT CASE WHEN scheduled_accuracy_meters <= 500 THEN ROW(day, so_number) ELSE NULL END) AS num_accurate, AVG(scheduled_accuracy_meters) AS scheduled_accuracy_meters FROM deliveries WHERE ... GROUP BY 1, 2, 3

        ... because a lot of time, when these nicely-formatted statements get parsed, the whitespace gets condensed, and when it gets spit out in an error message, I for one would like the all-caps keywords to be landmarks to direct my eye.

        • roelschroeven 6 years ago

          I still find the first one more readable.

          Maybe it's just me, or maybe it's a matter of habit, but for me the changes from lowercase to uppercase and back are a kind of hurdle that make my brain pause a moment. The second example is much more cumbersome to me to read and parse.

          • eukaryote 6 years ago

            Same for me - I find the use of upper case keywords like this jarring in SQL. It seems quite old-school, like reading COBOL!

            • arethuza 6 years ago

              It's probably my repressed memories of FORTRAN that cause me to react so strongly to UPPERCASE.

    • lancefisher 6 years ago

      Some editors do syntax highlight "language injections" or SQL embedded as a string. e.g. Rider by JetBrains: https://blog.jetbrains.com/dotnet/2018/10/29/sql-inside-c-st...

    • available 6 years ago

      What do you think about only capitalizing the first letter like this:

        Select
            region_fleet,
            Case When status = 'Delivered' Then 'Delivered' Else 'Not Delivered' End As status,
            Date_Trunc('week', day) As week,
            Count(Distinct Row(day, so_number)) As num_orders,
            Count(Distinct Case When scheduled_accuracy_meters <= 500 Then Row(day, so_number) Else Null End) As num_accurate,
            Avg(scheduled_accuracy_meters) As scheduled_accuracy_meters
        From
            deliveries
        Where
            ...
        Group By
            1, 2, 3
    • mmgutz 6 years ago

      The 2nd shouts at me from all angles, maybe a side effect from being hounded by netiquette police in forums.

  • lazzlazzlazz 6 years ago

    Having written many many thousands of SQL queries, and having begun thinking that lower-casing keywords was better, I can tell you this is completely false.

    You absolutely cannot rely on using your application of choice for editing queries: formatting them aggressively and allowing transmittable demarcations of query structure (like case) makes for the most portable, universally readable queries.

    Given how many different systems we use these days to develop a single application, upper-casing is more necessary than ever in the 2020s.

  • silveroriole 6 years ago

    “It's just as readable as uppercase SQL and you won't have to constantly be holding down a shift key.” If only there was some way not to hold shift, some kind of key that locks your case...!

  • oarabbus_ 6 years ago

    I lowercase everything in my SQL.

    Lowercase + indentation is the way to go. The only SQL that should be upper case is text in single quotes.

  • strbean 6 years ago

    Everything about SQL is a holdover from ancient programming languages. From crappy 'natural sounding' syntax constructs ("SELECT ... FROM ... WHERE ...") to outright inconsistencies ("INSERT INTO table (column1, column2) VALUES (1, 2)" vs "UPDATE table SET column1=1, column2=2").

    It's a damn shame that the NoSQL movement turned out to just be NoRelational. RDBMS's could use a non-terrible query language.

jolmg 6 years ago

> God is merciful because AND_ is 4 characters, a good tab width, so WHERE conditions are to be lined up like (same for JOIN conditions)

  WHERE
          country = 'UAE'
      AND day >= DATE('2019-07-01')
      AND DAY_OF_WEEK(day) != 5
      AND scheduled_accuracy_meters <= 10*1000
It looks better when you use a tab-width of 2:

  WHERE country = 'UAE'
    AND day >= DATE('2019-07-01')
    AND DAY_OF_WEEK(day) != 5
    AND scheduled_accuracy_meters <= 10*1000
dchess 6 years ago

I don't see the benefit of putting table names on a different line than the keyword.

How is this:

  FROM
    tablename t
  INNER JOIN
    other_table ot
  ON
    t.id = ot.id
More readable than:

  FROM tablename t
  INNER JOIN other_table ot
    ON t.id = ot.id
I agree with a lot of these recommendations, but this one irks me. Also I'd love if someone could create a nice code-formatter for SQL like Python's Black.
  • Macha 6 years ago

    In the join case, it makes your diffs nicer when joining multiple tables

        FROM foo
        INNER JOIN
            other_table using (other_table_id)
    
    to:

        FROM foo
        INNER JOIN
        +  foo_bars using (foo_id),
           other_table using (other_table_id)
monkeycantype 6 years ago

I also use his multi line format for boolean logic:

    select 'biscuit'
    where 
    (    
        (
            @alpha 
            < 
            pow(
                sin(   
                    radians(
                        @scheduled_lat - @actual_lat
                    ) 
                    / 2 
                )
                , 2
            )
        )
        and   
        @alpha > 0
    )
flatfilefan 6 years ago

Great style guide in my opinion. It is actually rather helpful to have those SQLs formatted neatly. As an analyst you have to write quite a few of them. So copy pasting and reusing is most helpful and boosts productivity. To make sure that you don’t make errors a clean layout for eyeballing is necessary. The same for bug fixing, should you have one planted still.

truculent 6 years ago

If it doesn’t come with an auto formatter it doesn’t matter. Making developers manually style their code is barbarism

whynotmaybe 6 years ago

Personaly, I put the comma before the column name :

  SELECT
     col1
     ,col2
     ,col3
It's easier for me to add a column or move it like this. Otherwise I have to search the comma when my query has only one column and I add one or when I add a column at the end
  • BossingAround 6 years ago

    I know this is a question of style, but wow that looks ugly.

    The point about ease of adding a new column is absolutely valid.

    The best answer to it, subjectively and IMHO, is on the language level, e.g. making it legal to end the statement with a comma:

    SELECT col1, col2, col3, from ...

    • whynotmaybe 6 years ago

      That would definitely be a game changer... but I'm not sure I might be ready for that !

merusame 6 years ago

I struggle to find a beautifier doing something similar to this with indentation. I use quite a bit of plpgsql which makes it even more challenging. I have tried a few found in the www however none of them cut it. Any recommendations?

arh68 6 years ago

This is my favorite guide yet!

My syntax, like others, is a little different (lowercase, 2 spaces, commas-first, bracket quotes, ons right under joins w/ joined table on LHS, left joins left-aligned): (this query isn't supposed to make sense)

    select
      u.id                   [user]
      , u.email              [email]
      , o.name               [office]
      , sum(t.id)            [# things]
    from main_tblusers_db u
    inner join tbloffices_db o
            on o.id = u.office_id
    inner join things_tbl t
            on t.user_id = u.id
    left  join example e
            on e.user_id = u.id
    where
      u.deleted is null
      and (
        u.active is not null
        or u.special = 1
      )
    group by
      u.id          -- the 1, 2 syntax is new to me!
      , u.email
      , o.name
  • lazzlazzlazz 6 years ago

    For the record, this syntax is horrific and almost unreadable to my eyes.

    Multiple spaces after `LEFT` in `LEFT JOIN`? Just to stick with "river"-style alignment, yet your outer-level keywords (`SELECT`, `FROM`, etc.) aren't aligned?

    It's difficult to understand why one would pick this format.

    • arh68 6 years ago

      Well how far do you go with the river? Aligning with select means group by sticks out. Aligning with group by means left join sticks out. Aligning with left join means inner join sticks out.

      EDIT: feel free to show me something better..

  • gwillz 6 years ago

    Well this might be an interesting discussion to read for you. Take what you will from it.

    https://gist.github.com/isaacs/357981

leblancfg 6 years ago

The rest of this man's blog is also worth a visit. Great work, Marton!

P.S. Can I suggest you put your name somewhere in your header?

P.P.S. I see you, too, use 'self' when taking notes. Would you also be a Pythonista? :)

wodenokoto 6 years ago

Can anyone explain the logic / benefit of the group by recommendation?

  • oarabbus_ 6 years ago

    It's useful to put the grouping columns so you can say `group by 1,2,3,4,5` instead of `group by 1,2,6,7,9`.

    Implementing production queries, you tend to write out the full column names, but for 95% of your SQL this is a boon to the analyst or data scientist.

ninju 6 years ago

It's a case of yet another standard (https://xkcd.com/927/)

The author recommends using upper-case for all keywords while Matt Mazur's SQL style guide, that is linked at the bottom of the article, recommends using lowercase for keywords :-)

vladsanchez 6 years ago

I've done it that way for the last 20 years, but I've never blogged/wrote about it. That's the difference.

  • boublepop 6 years ago

    There is absolutely no value for anyone in you sharing the fact that you don’t share your opinions on SQL style. Yet there is a lot of value in OP sharing his thoughts on style. That’s the difference.

Keyboard Shortcuts

j
Next item
k
Previous item
o / Enter
Open selected item
?
Show this help
Esc
Close modal / clear selection