Pretending to be PostgreSQL: Part two - responding to queries

*Masquerading as the elephant in the room*

Introduction

In Part 1 we implemented the server handshake in Python. With our subterfuge now established, we’ll go ahead and respond to requests.

Last time, we stopped short of responding:

$ psql -h 127.0.0.1 -p 5433
psql (16.4 (Ubuntu 16.4-0ubuntu0.24.04.2), server 15.1)
Type "help" for help.

postgres=> select 1;
postgres=>

Today, we’ll accomplish the following:

$ psql -h 127.0.0.1 -p 5433
psql (16.4 (Ubuntu 16.4-0ubuntu0.24.04.2), server 15.1)
Type "help" for help.

postgres=> select 1;
 value 
-------
     1
(1 row)

postgres=>

Clearly indistinguishable from the real thing! Until you try to run any other query 😉

In brief: we’ll look at what’s needed to receive the client query (after the handshake is established), implement the three response messages needed to provide a row back to the client, test it, and then have a look at the packets produced on the wire. Refer back to Part 1 for the details on the protocol handshake and the references to the appropriate sections of the PostgreSQL documentation describing the messages used here.

Let’s jump right in.

Implementation

There’s a multitude of queries a client can send to us, but in this case we’ll look at a SELECT and send the correct set of bytes back on the wire to have our response rendered by psql, as if we were a standard PostgreSQL installation.

In this case we’re having a look at the Simple Query Protocol (as opposed to the Extended version).

SELECT queries

For our SELECT queries we need to read the query (message type Q) from the client and then respond with (generally) three of our own messages:

A RowDescription message that informs the client of the format (e.g. column names, data types, etc.) of the response rows. This is followed by one or more DataRow messages that each contain the length, columns and byte values of a row. Finally we will send a CommandComplete message to indicate that we’ve reached the end of the query results. Of course, after that we’ll also send our standard ReadyForQuery message to let the client know they can send new requests.

Receiving the query

In the last post, we’ve already set up the following case to catch the query message from the client when we receive message type Q:

values[0] = 'QUERY'
values[1] = values[1][1:] # Remove ampersand prefix

And then we’ll process that QUERY message using a new generate_reply() function:

reply = split_reply(data)
if reply[0] == 'QUERY':
    conn.send(generate_reply(reply[1]))

CommandComplete, for starters

Let’s start in reverse, with the simplest message: CommandComplete. This message can actually be implemented (at least against the psql binary) and tested without the other two messages.

You’ve seen the contents of the CommandComplete messages before: it’s the standard messages that you’ll receive from the server like COPY 100, INSERT 1 or CREATE TABLE when you run a query or restore a dump file. For example:

postgres=# create table test();
CREATE TABLE

In our SELECT case it’s also handy to do this now since our message will be displayed if we don’t return any RowDescription messages (when we do, the output of CommandComplete won’t be printed in psql anymore, at least not for SELECTs). Now let’s have a look at generate_reply():

def generate_reply(query):
    command = query.split(' ')[0].lower()
    print(command.upper())
    if command == 'SELECT':
        print(f'select query: {query}')

    if len(query) > 0:
        response = create_message_pack('C', f'{command.upper()} 1\0')
    else:
        response = create_message_pack('I', [])
    return response

There’s a couple of print statements in there just for the sake of seeing the message flow on our side. As long as the query message from the client is not empty, we can use our already-existing create_message_pack to reply with C for CommandComplete, with it’s message length and the command in uppercase. The command, e.g. SELECT, is also followed by the number of rows affected, if applicable. Here we’re just always responding that one row was selected, followed by the null terminator (the latter could be handled a bit more gracefully).

You’ll note that we’re also accounting for empty queries from the client by responding with the appropriate EmptyQueryResponse (message type I) in that case.

Now that we’ve implemented it, we can test this behaviour:

$ psql -h 127.0.0.1 -p 5433
psql (16.4 (Ubuntu 16.4-0ubuntu0.24.04.2), server 15.1)
Type "help" for help.

ian=> select 1;
SELECT 1
ian=>

So psql then simply respects our response, even though it’s clearly a bit bogus. You’d be expecting to receive a row back containing an integer 1.

RowDescription

With that done, let’s work our way back to the actual message sequence, which should start with a RowDescription (message type T) telling the client what to expect when the DataRow messages start arriving.

The format of the RowDescription message payload looks as follows, with some values we’ll be using in our example (note that this table uses the typname integer names for data types, e.g. int2 = 16-bit integer):

Fragment	Data type	Value	Notes
Number of fields	`int2`	1	For our purposes we’ll return one field. The fragments following this one are per field.
Field name	`string`	“value\0”	A null terminated string indicating the name of the field.
OID of the table the field belongs to	`int4`	0	Zero denotes that this field does not belong to a table.
Attribute number of the column this field belongs to	`int2`	0	As above.
OID of the data type for this field	`int4`	23	The explanation for this fragment follows after this table.
Size of this data type	`int2`	4	The size of the data type in bytes.
Type modifier for this data type	`int4`, signed	-1	-1 denotes no type modifier. Can be used to denote e.g. the maximum length of a `varchar` column.
Format code for this field	`int2`	0	0 indicates text, 1 indicates binary data

Some explanation is needed for the data type OID (object identifier). PostgreSQL keeps a reference list of data types with their OIDs, names, and length in pg_type, which you can query directly:

postgres=# select oid, typname, typlen from pg_type limit 5;
 oid | typname | typlen 
-----+---------+--------
  16 | bool    |      1
  17 | bytea   |     -1
  18 | char    |      1
  19 | name    |     64
  20 | int8    |      8
(5 rows)

Using this, and knowing that we are aiming to return an int4 in our case, we can find the necessary information easily:

postgres=# select oid, typname, typlen from pg_type
postgres-# where typname = 'int4';
 oid | typname | typlen 
-----+---------+--------
  23 | int4    |      4
(1 row)

Note that typlen specifies the length in bytes. This means int4 refers to a four-byte integer, i.e. 32-bits. Here’s a handy reference table for common data types:

oid	typname	typlen
16	bool	1
18	char	1
700	float4	4
701	float8	8
21	int2	2
23	int4	4
20	int8	8
1700	numeric	-1
25	text	-1

With this information, we can now create our payload in Python:

payload = \
    (1).to_bytes(2) + \
    "value".encode('ascii') + 0x0.to_bytes() + \
    (0).to_bytes(4) + \
    (0).to_bytes(2) + \
    (23).to_bytes(4) + \
    (4).to_bytes(2) + \
    (-1).to_bytes(4, signed=True) + \
    (0).to_bytes(2)

RowDescription = \
    'T'.encode('ascii') + \
    (len(payload) + 4).to_bytes(4) + \
    payload

The use of the line continuation character \ prevents me from adding comments to the end of each line above, but it corresponds exactly to the RowDescription table above, so it should hopefully be clear enough.

When we create the message, we prefix the payload with the message type as usual, in this case T, and the length of the payload. Note, once again, that the length of the payload also includes the size of the length parameter, which is an int4. This is why we have len(payload) + 4 in the code above.

Before we add the DataRow(s), we can test our response in psql again. We’ll use the following snippet, which is a combination of our RowDescription, and the CommandComplete message we created earlier (we simply concatenate the bytes together):

def generate_reply(query):
    command = query.split(' ')[0].lower()
    print(command.upper())

    if command == 'SELECT':
        print(f'select query: {query}')

    if len(query) > 0:
        # We need a RowDescription, some DataRow(s) and CommandComplete here

        # RowDescription
        payload = \
            (1).to_bytes(2) + \
            "value".encode('ascii') + 0x0.to_bytes() + \
            (0).to_bytes(4) + \
            (0).to_bytes(2) + \
            (23).to_bytes(4) + \
            (4).to_bytes(2) + \
            (-1).to_bytes(4, signed=True) + \
            (0).to_bytes(2)
        
        RowDescription = \
            'T'.encode('ascii') + \
            (len(payload) + 4).to_bytes(4) + \
            payload
        
        print(f'RowDescription {RowDescription}')
        
        # DataRow
        pass
        
        # CommandComplete 
        CommandComplete = create_message_pack('C', f'{command.upper()} 1\0')
        print(f'CommandComplete {CommandComplete}')
        response = RowDescription + CommandComplete
    else:
        EmptyQueryResponse = create_message_pack('I', [])
        response = EmptyQueryResponse
    
    print(response)
    return response

With this put together, let’s run the code again:

psql (16.4 (Ubuntu 16.4-0ubuntu0.24.04.2), server 15.1)
Type "help" for help.

ian=> select 1;
 value 
-------
(0 rows)

First of all, as predicted, our CommandComplete message of SELECT 1 has disappeared from the output, and has been replaced with an empty table response. You’ll note that our value column name is displayed, and that we’ve returned zero rows. A perfectly valid response, and great progress!

DataRow(s)

Now that we’ve managed to let psql know what to expect when the query response is returned, and we have a response termination in place, it’s time to sandwich some data in between. To do that, we’ll return one or more DataRow messages of type D. Each DataRow message has the following composition:

Fragment	Data type	Value	Notes
Number of column values	`int2`	1	For our purposes we’ll return one value. The fragments following this one are per value.
Length of the value	`int4`	1	The length of the value in bytes.
Value	`Byte(s)`	`0x31`	The byte array of the value as text. This array should be equal to the length denoted above. We’ll return a single byte with the value of the ASCII character `1`.

Perhaps handy to know is that, for NULL values, the length above is set to -1, and no bytes are sent for the value fragment in that case.

This results in the following Python code:

value = '1'.encode(encoding='ascii')
payload = \ 
    (1).to_bytes(2) + \ 
    len(value).to_bytes(4, signed=True) + \ 
    value

DataRow = \ 
    'D'.encode(encoding='ascii') + \ 
    (len(payload) + 4).to_bytes(4) + \ 
    payload

You’ll note that our value in this case is hard coded to 1, but it’s easy enough to change that to the actual value requested by the client from our query.split(' ') array by grabbing it from the second index. In this case I’m simplifying it to have ASCII 1 (0x31) carry through from the table above right through to our packet capture later on.

With this added, we can now put everything together:

response = RowDescription + DataRow + CommandComplete

And return that to the client:

psql (16.4 (Ubuntu 16.4-0ubuntu0.24.04.2), server 15.1)
Type "help" for help.

ian=> select 1;
 value 
-------
     1
(1 row)

Mission accomplished: we have now successfully responded to a SQL query from psql! Here’s a celebratory baby elephant:

What’s nifty is that we can see that our values are correctly sent to psql as well, by sniffing our own network traffic (which is a nifty side effect of our lazy SSL refusal earlier) and comparing it with the real PostgreSQL server:

sudo tshark -i any  -f 'tcp port 5433' -d tcp.port==5433,pgsql -O pgsql

If we reconnect and run our query again, we’ll see each message in turn:

1) The SSL request from the client:

Frame 6: 76 bytes on wire (608 bits), 76 bytes captured (608 bits) on interface any, id 0                
Linux cooked capture v1                                                                                  
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1                                              
Transmission Control Protocol, Src Port: 48864, Dst Port: 5433, Seq: 1, Ack: 1, Len: 8                   
PostgreSQL                                                                                               
    Type: SSL request                                                                                    
    Length: 8                                                                                            
    Request code: SSLRequest (80877103)

2) Our refusal:

Frame 8: 69 bytes on wire (552 bits), 69 bytes captured (552 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 5433, Dst Port: 48864, Seq: 1, Ack: 9, Len: 1
PostgreSQL
    Type: SSL response
    SSL Response: Unwilling to perform SSL ('N')

3) The startup message:

Frame 10: 142 bytes on wire (1136 bits), 142 bytes captured (1136 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 48864, Dst Port: 5433, Seq: 9, Ack: 2, Len: 74
PostgreSQL
    Type: Startup message
    Length: 74
    Protocol major version: 3
    Protocol minor version: 0
    Parameter name: user
    Parameter value: ian
    Parameter name: database
    Parameter value: ian
    Parameter name: application_name
    Parameter value: psql
    Parameter name: client_encoding
    Parameter value: UTF8

4) The password request from us:

Frame 11: 77 bytes on wire (616 bits), 77 bytes captured (616 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 5433, Dst Port: 48864, Seq: 2, Ack: 83, Len: 9
PostgreSQL
    Type: Authentication request
    Length: 8
    Authentication type: Plaintext password (3)

5) The password response from the client:

Frame 12: 75 bytes on wire (600 bits), 75 bytes captured (600 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 48864, Dst Port: 5433, Seq: 83, Ack: 11, Len: 7
PostgreSQL
    Type: Password message 
    Length: 6
    Password: 1

6) Our response indicating that authentication succeeded:

Frame 13: 77 bytes on wire (616 bits), 77 bytes captured (616 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 5433, Dst Port: 48864, Seq: 11, Ack: 90, Len: 9
PostgreSQL
    Type: Authentication request
    Length: 8
    Authentication type: Success (0)

7) Our backend key data, parameter status messages, and finally our indication that we are ready to receive queries:

Frame 15: 139 bytes on wire (1112 bits), 139 bytes captured (1112 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 5433, Dst Port: 48864, Seq: 20, Ack: 90, Len: 71
PostgreSQL
    Type: Backend key data 
    Length: 12
    PID: 1
    Key: 1
PostgreSQL
    Type: Parameter status 
    Length: 24
    Parameter name: server_version
    Parameter value: 15.1
PostgreSQL
    Type: Parameter status 
    Length: 26
    Parameter name: server_encoding
    Parameter value: utf-8 
PostgreSQL
    Type: Ready for query
    Length: 5
    Status: Idle (73)

8) Here’s the select 1 query from the client:

Frame 17: 83 bytes on wire (664 bits), 83 bytes captured (664 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 48864, Dst Port: 5433, Seq: 90, Ack: 91, Len: 15
PostgreSQL
    Type: Simple query
    Length: 14
    Query: select 1;

9) And our painstakingly crafted response:

Frame 18: 125 bytes on wire (1000 bits), 125 bytes captured (1000 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 5433, Dst Port: 48864, Seq: 91, Ack: 105, Len: 57
PostgreSQL
    Type: Row description
    Length: 30
    Field count: 1
        Column name: value 
            Table OID: 0
            Column index: 0
            Type OID: 23
            Column length: 4
            Type modifier: -1
            Format: Text (0)
PostgreSQL
    Type: Data row
    Length: 11
    Field count: 1
        Column length: 1
        Data: 31
PostgreSQL
    Type: Command completion
    Length: 13
    Tag: SELECT 1

Once again our ASCII 1 shows up in the Data field as hex 31.

10) And at the very end again, our ReadyForQuery message to the client:

Frame 20: 74 bytes on wire (592 bits), 74 bytes captured (592 bits) on interface any, id 0
Linux cooked capture v1
Internet Protocol Version 4, Src: 127.0.0.1, Dst: 127.0.0.1
Transmission Control Protocol, Src Port: 5433, Dst Port: 48864, Seq: 148, Ack: 105, Len: 6
PostgreSQL
    Type: Ready for query
    Length: 5
    Status: Idle (73)

You can try this on your own PostgreSQL server, and compare the details to find the same responses.

Congratulations on getting this far!

A couple of short notes

Multiple rows

If you wanted to return more rows, you’d just add them to the response, or send them one by one before sending CommandComplete:

response = RowDescription + DataRow + DataRow + CommandComplete

It does not matter whether they are sent one by one, or all in one go. psql will wait for the ReadyForQuery message at the end.

Comparing notes with a real server

If you wanted to compare the packet captures between your own server and the real PostgreSQL, just make sure to connect to the server without SSL and double-check the port number in your tshark command line:

sudo tshark -i any  -f 'tcp port 5432' -d tcp.port==5432,pgsql -O pgsql
...
psql -h 127.0.0.1 -p 5432 "sslmode=disable"

This is also quite nifty for implementing more of the protocol, with the PostgreSQL docs alongside.

Other queries

Operations such as INSERT, UPDATE, DROP, ALTER, et al can be “implemented” to some extent. We can, for example, respond with confirmations for INSERTs without actually processing or storing any data.

There is a case to be made that in some instances one might want to keep a per-connection state (even if implemented as a series of decorators to lead up to a meta, in-memory schema), but that is largely a fun technical challenge as opposed to something that has a practical use. If at some point one would want to have a kind of PostgreSQL playground system that could be handy, perhaps.

A bunch of other “queries” like COPY and REPLICATION streams are also around to play with, and those have their own niche, interesting use cases too.

Future work

If you wanted to implement something more suitable for the real world, there’s a couple of things to you might want to consider implementing, for starters, and in no particular order, according to your own needs:

1) A real authentication system

2) SSL support

3) The message types you need

4) The extended query protocol

5) UTF-8 support

6) A SQL parser

With that combined, you can get quite far. And if you’re implementing a pooler/proxy, you can hand off quite a lot of the work to a real backend PostgreSQL server. Of course, you can also find some existing implementation of the PostgreSQL wire protocol (there’s plenty out there), but that’s not nearly as much fun.

The code

I’m going to clean it up a bit and then host it somewhere on my Github. I’ll add the link here when it’s ready to go. If you’d like a heads-up when that happens, you’re welcome to follow me on Bluesky or connect with me on LinkedIn.

Final thoughts

We can do quite a lot with what we’ve learned here already. I might have a look at what goes into writing a simple PostgreSQL proxy and write that up sometime, but let me know in the comments if there’s anything in particular you’d be interested in seeing in Part 3. The audience for this kind of thing is quite small, and there’s almost zero writeups on these matters.

Thanks for reading!

P.S. Errata are welcome, please let me know if I misunderstood or typo’d something.