Ask HN: Why aren't SQL queries compiled?
If the database knew what queries were running ahead of time and have them compiled you could easily defend against SQL injection because the SQL isn't being interpreted as it's run, and it would be possible to run analysis over the query to see if there are any indexes or implementations missing that could speed it up. Something like a stored procedure's variable substitution but not interpreted when called. The most difficult aspect of this is that SQL is typically called from some other language (Javascript, C++, whatever), which gets data from somewhere (a web form, a previous database query, a file) and needs to plug it into a new SQL query, which generally needs to be sent to an SQL interpreter ("SELECT $DATA1 FROM $DATA2" etc), not to an inflexible pre-made compiled SQL command. And that obviously is susceptible to injection attacks. Really the only general way around this would be if all of those other languages had an API to communicate with a relational backend which did not interpret SQL at all, but rather only offered things like select_statement(fieldname, tablename, ...) Such interfaces exist, but I'm not really familiar with them, nor why they're not universally used -- except to note that it's a lot of work to create an SQL library. Look at the SQL language; it's a lot to support, particularly if you need to avoid 100 arguments passed to a function, etc. (Just saying one shouldn't leap to trivializing such projects.) Queries are compiled, i.e., a query plan is created and then executed. Depending on various things that plan may get re-used the next time the query is run. So what you are talking about exists. Databases can examine the plans, database statistics, and the resulting metrics from executing the plan and make recommendations about indexes. A SQL engine can use any method it wants to execute a query. For instance if you wanted to make a SQL engine that lets you write a SELECT on tables that are csv files that writes a C program and then compiles it with a C compiler that's up to you. In particular a SQL engine can compile stored procedures ahead of time and back in the day I worked at a few places that put "business logic" into SQL procedures, that is, 100% of the SQL access is done through stored procedures, the same way that many web applications define a front end in Java or Node that exposes data to a Javascript front end. If you build an app that way you can put a SQL specialist in charge of the SQL interface and also know the application is immune to SQL injection attacks. Prepared statements are parameterized (avoiding SQL injection) and can pre-compiled (what this really means depends on the implementation.) This. If you aren't using prepared statements, you're doing it wrong. I once worked on a system where the original developer, supposedly with 25+ years experience, did not know about prepared statements. He had developed his own home grown "ORM" that attempted to manually escape SQL. You can guess how well that worked out. We'd get SQL 'syntax errors' in production regularly. It’s a declarative language and it’s compiled into an execution "plan", and parameterised queries (bind variables, prepared statements, whatever you want to call them) are passed at runtime.
When you dynamically build queries by concatenation you bypass this compilation phase (parse, compute plan, etc.), and spend unnecessary time on near identical queries. If you want many more gory details, this is a good watch: https://youtu.be/eurwtUhY5fk Modern relational database systems do all of this already, have for decades. Well, not really. Not in the way OP is describing. Of course queries are literally compiled at run time. But by then it's too late as user-supplied data is already in the payload that is being compiled. Even stored procs don't protect you as their invocation itself is a runtime sql which can fall victim to injection. To achieve what OP wants would require a "build time" compilation. And a permission system where a SQL-account is restricted to only run/execute these pre-built queries and commands. No ad-hoc sql allowed at runtime. Stored procs only get you half-way there, ignoring their own invocation is a runtime sql vulnerable to injection. As of today, good coding practice with parametrized queries and procedure invocation is your only protection. But your app may not "own" the DB. Just a cog in a larger system. What OP is suggesting would move the shield closer to the data for better protection. So the 20+ other apps out of your control don't mess things up. Isn’t that what prepared statements basically are?