The single Postgres setting that almost took down our database
medium.com> Using SQL comments for tracing served as a “cache buster” — apparently the generic plan cache is keyed on the raw text of the query!
Keying user-provided data without canonicalizing it first, a classic oopsie that's bitten plenty in the arse.
That's all well and good, but it's not an error here. There's really no such thing as a canonical form of SQL. Keying the query cache and statistics on the raw query bytes is normal on every implementation I've ever had to get that deep into.
Furthermore in many databases, comments aren't always "just comments." For instance MySQL can specify nonstandard or version-specific syntax inside of comments and it will be executed. The 'STRAIGHT JOIN' below is an example from their docs and here is not a comment, but a mysql-specific part of the statement:
SELECT /*! STRAIGHT_JOIN */ col1 FROM table1,table2 WHERE ...
I dare say the worst offense here is enabling SQLAlchemy's Instrumentation in production without understanding how it might impact database performance. Even though some of these behaviors ended up causing unexpected problems, overall the instrumentation was clearly preventing some common queries from being potentially optimized by the query planner, and it may have been interfering with the buffer cache, query statistics, or other optimization mechanics as well.
> There's really no such thing as a canonical form of SQL. There's really no such thing as a canonical form of SQL. Keying the query cache and statistics on the raw query bytes is normal on every implementation I've ever had to get that deep into.
There doesn't have to be. Postgres internally converts the SQL query into a parse tree for further processing, and while the exact structure of that parse tree is an internal implementation detail, so is the generic plan cache itself. A more reliable cache design would be to use a serialized form of the parse tree as a cache key, as the parsing stage would discard unnecessary information, such as comments and redundant whitespace.
> Furthermore in many databases, comments aren't always "just comments." For instance MySQL can specify nonstandard or version-specific syntax inside of comments and it will be executed. The 'STRAIGHT JOIN' below is an example from their docs and here is not a comment, but a mysql-specific part of the statement:
> SELECT /! STRAIGHT_JOIN / col1 FROM table1,table2 WHERE ...
This is unfortunately true, and was a very boneheaded design decision, but it's not a showstopper. The query parser still has to distinguish between "harmless" comments and query hints at some point in the execution of the query. All that needs to be done to account for this is to move that distinction to the parsing phase so that the AST will have separate nodes for such hints that won't be ignored/discarded when serializing for use as a cache key.
> I dare say the worst offense here is enabling SQLAlchemy's Instrumentation in production without understanding how it might impact database performance. Even though some of these behaviors ended up causing unexpected problems, overall the instrumentation was clearly preventing some common queries from being potentially optimized by the query planner, and it may have been interfering with the buffer cache, query statistics, or other optimization mechanics as well.
I would say the worst offense is an RDBMS deciding that comments are an appropriate place to interpret query planning directives.
Being able to trace postgres queries is nice. Is there a way to do that while not breaking the plan caching?
The link in the article is to Notion. https://www.notion.so/324f561a7fd1484a952e0a59ca482013 specifically. Did the author forget to replace their link when they posted it to Medium?
D'oh! I drafted this up in Notion; just fixed, thank you for letting me know!