ValkDB/postgresparser: ANTLR-based PostgreSQL query parser for Go. Extracts tables, columns, joins, CTEs, parameters, DDL actions, and full column-usage metadata from SQL into a structured IR.



A pure-Go PostgreSQL parser. No cgo, no C toolchain — just go build.

Why postgresparser?

Need to parse PostgreSQL SQL in Go but can't use cgo? Deploying to Alpine containers, Lambda, ARM, scratch images, or anywhere that requires CGO_ENABLED=0?

postgresparser works everywhere go build works. It parses SQL into a structured intermediate representation (IR) that gives you tables, columns, joins, filters, CTEs, subqueries, and more — without executing anything.

result, err := postgresparser.ParseSQL(`
    SELECT u.name, COUNT(o.id) AS order_count
    FROM users u
    LEFT JOIN orders o ON o.user_id = u.id
    WHERE u.active = true
    GROUP BY u.name
    ORDER BY order_count DESC
`)
if err != nil {
    log.Fatal(err)
}

fmt.Println(result.Command)        // "SELECT"
fmt.Println(result.Tables)         // users, orders with aliases
fmt.Println(result.Columns)        // u.name, COUNT(o.id) AS order_count
fmt.Println(result.Where)          // ["u.active=true"]
fmt.Println(result.JoinConditions) // ["o.user_id=u.id"]
fmt.Println(result.GroupBy)        // ["u.name"]
fmt.Println(result.ColumnUsage)    // each column with its role: filter, join, projection, group, order

For multi-statement behavior and strict mode, see Statement Count Handling.

batch, err := postgresparser.ParseSQLAll(`
CREATE TABLE public.api_key (
    id integer NOT NULL
);
CREATE TABLE public.sometable (
    id integer NOT NULL
);`)
if err != nil {
    log.Fatal(err)
}

fmt.Println(len(batch.Statements))                              // 2
fmt.Println(batch.Statements[0].Query.DDLActions[0].ObjectName) // "api_key"
fmt.Println(batch.Statements[1].Query.DDLActions[0].ObjectName) // "sometable"

Performance: With SLL prediction mode, most queries parse in 70–350 µs.

Installation

go get github.com/valkdb/postgresparser

What you can build with it

  • Query linting — detect missing WHERE on DELETEs, flag SELECT *, enforce naming conventions
  • Dependency extraction — map which tables and columns a query touches, build lineage graphs
  • Migration tooling — parse DDL to understand schema changes, diff CREATE statements
  • Audit logging — tag log entries with structured metadata (tables, operation type, filtered columns)
  • Query rewriting — inject tenant filters, add audit columns, transform SQL before execution
  • Index advisors — analyze column usage patterns to suggest optimal indexes

Parsing

Handles the SQL you actually write in production:

  • DML: SELECT, INSERT, UPDATE, DELETE, MERGE
  • DDL: CREATE TABLE (columns/type/nullability/default + PK/FK/UNIQUE/CHECK constraints), CREATE INDEX, DROP TABLE/INDEX, ALTER TABLE, TRUNCATE, COMMENT ON
  • CTEs: WITH ... AS including RECURSIVE, materialization hints
  • JOINs: INNER, LEFT, RIGHT, FULL, CROSS, NATURAL, LATERAL
  • Subqueries: in SELECT, FROM, WHERE, and HAVING
  • Set operations: UNION, INTERSECT, EXCEPT (ALL/DISTINCT)
  • Upsert: INSERT ... ON CONFLICT DO UPDATE/DO NOTHING
  • JSONB: ->, ->>, @>, ?, ?|, ?&
  • Window functions: OVER, PARTITION BY
  • Type casts: ::type
  • Parameters: $1, $2, ...

IR field reference: ParsedQuery IR Reference
Comment extraction guide: Comment Extraction Guide

Statement Count Handling

Use the API variant that matches your input contract:

  • ParseSQL(sql) parses the first statement only (backward-compatible behavior).
  • ParseSQLAll(sql) parses all statements and returns ParseBatchResult with one Statements[i] result per input statement (Index, RawSQL, Query, Warnings).
    • A statement failed conversion if its Statements[i].Query is nil.
    • Correlation is deterministic: Statements[i].Index maps to source statement order.
    • HasFailures is true when any statement has a nil Query or any Warnings.
  • ParseSQLStrict(sql) requires exactly one statement and returns ErrMultipleStatements when input contains more than one.
  • ParseSQLWithOptions(sql, opts), ParseSQLAllWithOptions(sql, opts), and ParseSQLStrictWithOptions(sql, opts) expose optional extraction flags.
    • IncludeCreateTableFieldComments enables inline -- field-comment extraction in CREATE TABLE.
    • COMMENT ON extraction is always enabled.

Supported SQL Statements

See docs/supported-statements.md for full details on parsed commands, graceful handling (e.g. SET/SHOW/RESET), and what's currently UNKNOWN or unsupported.

| Category | Statements | Status |
| --- | --- | --- |
| DML | SELECT, INSERT, UPDATE, DELETE, MERGE | Full IR extraction |
| DDL | CREATE TABLE, ALTER TABLE, DROP TABLE/INDEX, CREATE INDEX, TRUNCATE, COMMENT ON | Full IR extraction |
| Utility | SET, SHOW, RESET | Graceful — returns UNKNOWN, no error |
| Other | GRANT, REVOKE, CREATE VIEW/FUNCTION/TRIGGER, COPY, EXPLAIN, VACUUM, BEGIN/COMMIT/ROLLBACK, etc. | Not yet supported — may error or return UNKNOWN |

Analysis

The analysis subpackage provides higher-level intelligence on top of the IR:

  • analysis.AnalyzeSQL(sql) analyzes the first statement only (matches ParseSQL).
  • analysis.AnalyzeSQLAll(sql) analyzes all statements and returns SQLAnalysisBatchResult.
  • analysis.AnalyzeSQLStrict(sql) requires exactly one statement (matches ParseSQLStrict).

Column usage analysis

Know exactly how every column is used — filtering, joining, projection, grouping, ordering:

result, err := analysis.AnalyzeSQL("SELECT o.id, c.name FROM orders o JOIN customers c ON o.customer_id = c.id WHERE o.status = 'active'")

for _, cu := range result.ColumnUsage {
    fmt.Printf("%s.%s → %s\n", cu.TableAlias, cu.Column, cu.UsageType)
}
// o.id → projection
// c.name → projection
// o.customer_id → join
// c.id → join
// o.status → filter

WHERE condition extraction

Pull structured conditions with operators and values:

conditions, _ := analysis.ExtractWhereConditions("SELECT * FROM orders WHERE status = 'active' AND total > 100")

for _, c := range conditions {
    fmt.Printf("%s %s %v\n", c.Column, c.Operator, c.Value)
}
// status = active
// total > 100

Schema-aware JOIN relationship detection

Pass in your schema metadata and get back foreign key relationships — no heuristic guessing:

schema := map[string][]analysis.ColumnSchema{
    "customers": {
        {Name: "id", PGType: "bigint", IsPrimaryKey: true},
        {Name: "name", PGType: "text"},
    },
    "orders": {
        {Name: "id", PGType: "bigint", IsPrimaryKey: true},
        {Name: "customer_id", PGType: "bigint"},
    },
}

joins, _ := analysis.ExtractJoinRelationshipsWithSchema(
    "SELECT * FROM orders o JOIN customers c ON o.customer_id = c.id",
    schema,
)
// orders.customer_id → customers.id

DDL extraction

For CREATE TABLE parsing, see examples/ddl/.

Performance

With SLL prediction mode, postgresparser parses most queries in 70–350 µs with minimal allocations. The IR extraction layer accounts for only ~3% of CPU — the rest is ANTLR's grammar engine, which SLL mode keeps fast.

See the Performance Guide for benchmarks, profiling results, and optimization details.

Examples

See the examples/ directory:

  • basic/ — Parse SQL and inspect the IR
  • analysis/ — Column usage, WHERE conditions, JOIN relationships
  • ddl/ — Parse CREATE TABLE / ALTER TABLE plus DELETE command metadata
  • multi_statement/ — Correlate ParseSQLAll output back to each input statement and detect failures (Query == nil)
  • sll_mode/ — SLL prediction mode for maximum throughput

Grammar

Built on ANTLR4 grammar files in grammar/. To regenerate after modifying:

antlr4 -Dlanguage=Go -visitor -listener -package gen -o gen grammar/PostgreSQLLexer.g4 grammar/PostgreSQLParser.g4

Compatibility

This is an ANTLR4-based grammar, not PostgreSQL's internal server parser. Some edge-case syntax may differ across PostgreSQL versions. If you find a query that parses in PostgreSQL but fails here, please open an issue with a minimal repro.

ParseSQL processes the first SQL statement for backward compatibility. For multi-statement input, use ParseSQLAll; to enforce exactly one statement, use ParseSQLStrict.

License

Apache License 2.0 — see LICENSE for details.