What happens when you COPY in Postgres?

blog.benthem.io

40 points by dantetheinferno 2 years ago · 5 comments

terom 2 years ago

The way that the "Postgres", "libpq" terminology is used in the article ("libpq, the API backend for Postgres") leaves me more confused than I was before. I would understand "libpq" as the client-side library implementing the PostgreSQL wire protocol, and "Postgres" as the PostgreSQL server implementation, but I think in this article "Postgres" seems to refer to the psql CLI implementation?

> I’d like to dig more into how a C FILE reference is created on the server to utilize the above code. I plan to also do more more digging on the libpq side of this operation, including how the data is written to the WAL and processed

I don't think libpq has anything to do with WAL processing, and I would likely guess that the PostgreSQL server implementation actually uses mmap?

sitharus 2 years ago

The article is really unclear. COPY from a file is a fully server-side process: the server opens and loads the file directly.
However, the code linked to is from psql, the CLI tool. It implements the \copy CLI command (which invokes COPY FROM STDIN on the server).
So while it’s correct that this code is talking to libpq, it really has nothing to do with how COPY is faster than a bulk INSERT. From the psql side there isn’t much difference between them; it just sends data to the server.
The actual code for copy is here https://github.com/postgres/postgres/blob/master/src/backend...
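
To make the server-side vs. client-side distinction above concrete, here is a minimal sketch in Python. It is not psql's or the backend's code. The table name `items` and the path `/srv/data/items.tsv` are hypothetical, and `encode_copy_field` / `encode_copy_rows` are illustrative helpers that only build the kind of text-format payload a client could stream after issuing COPY ... FROM STDIN (tab-delimited, `\N` for NULL, backslash escapes).

```python
# Hypothetical statements, for illustration only.
# Server-side: the server process opens and reads the file itself.
SERVER_SIDE = "COPY items FROM '/srv/data/items.tsv'"
# Client-side: what \copy boils down to; the client streams the rows.
CLIENT_SIDE = "COPY items FROM STDIN"


def encode_copy_field(value):
    """Encode one value in COPY's default text format."""
    if value is None:
        return r"\N"  # default NULL marker in text format
    text = str(value)
    # Escape the backslash first so the escapes added below are not doubled.
    return (
        text.replace("\\", "\\\\")
        .replace("\t", "\\t")
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def encode_copy_rows(rows):
    """Join fields with tabs and end each row with a newline."""
    return "".join(
        "\t".join(encode_copy_field(v) for v in row) + "\n" for row in rows
    )


if __name__ == "__main__":
    rows = [(1, "widget", None), (2, "tab\there", "x\\y")]
    print(CLIENT_SIDE)
    print(encode_copy_rows(rows), end="")
```

Either way the client just ships bytes; what the server does with the rows once they arrive lives in the backend code linked above.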
- dantetheinferno (OP) 2 years ago
  
  Hey hey, post author here.
  This is a really good point that wasn't fully clear to me - this post definitely focuses on the \COPY from psql, rather than the CopyFrom backend.
  I've gone ahead and updated the introduction of the post to highlight that this post is about `psql`. My goal here is to try and understand Postgres better - so thank you for pointing this out :). I think writing a future post going into the backend code you mentioned is necessary. My goal is to have solid reasoning on why `COPY` is faster, and I'll need to keep writing these posts to get there.
  Let me know if you think it's still unclear - I don't want to leave up any article that's potentially confusing.
  I've also cleaned up the language a bit around `libpq`. I mistakenly restated the description of `libpq` from the Postgres docs, which read: "libpq is the C application programmer's interface to PostgreSQL. libpq is a set of library functions that allow client programs to pass queries to the PostgreSQL backend server and to receive the results of these queries." But you two are correct, my statement was confusing.

DiabloD3 2 years ago

Typo in one of the headers: libq instead of libpq

dantetheinferno (OP) 2 years ago

Fixed! Thank you!
