Work_mem: It's a Trap

mydbanotebook.org

39 points by enz 2 days ago · 6 comments

barrkel 4 hours ago

This, for me, is a current design flaw of Postgres. You expect your database to trade off memory versus spilling to disk within the resources you give it and the load it's under. Databases are much like operating systems: filesystems, scheduling, resource management, all the things an operating system does, a database server needs to implement as well.

Work_mem is a symptom of punting. It gives the DBA an imprecise tool, and then implicitly offloads onto them the responsibility of keeping query plans from allocating too many memory-hungry steps, which requires global knowledge of physical execution steps that depend on statistics, available concurrency and so on.

The database ought to be monitoring global resource use for the query and partitioning it into stages that free up memory or spill to disk as necessary.

This all fundamentally goes back to the Volcano iterator pull design. Switching to pulling batches instead of rows should already improve performance; and it would leave open the option of using a supervisor to schedule execution (query stages using fibers or similar continuation-compatible control flow instead of recursive calls), with potential restarts / dynamic replanning when things like global memory limits are approached. Using batches leaves more performance margin for heavier calling mechanisms, and also opens the door for more vectorized strategies for operators.
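The two pull styles can be sketched in C with a toy scan operator (hypothetical names, not Postgres's executor): the classic Volcano `next()` pays one call per row, while a batched `next()` amortizes that overhead and hands the consumer a contiguous slice it can vectorize over.

```c
#include <stddef.h>

/* A toy scan operator over an in-memory array. */
typedef struct {
    const int *rows;
    size_t nrows;
    size_t pos;
} Scan;

/* Classic Volcano pull: one call per row.
   Returns 1 and writes a row into *out, or 0 when exhausted. */
static int scan_next(Scan *s, int *out) {
    if (s->pos >= s->nrows)
        return 0;
    *out = s->rows[s->pos++];
    return 1;
}

/* Batched pull: fills up to `cap` rows into `out` per call.
   Returns the number of rows produced (0 when exhausted). */
static size_t scan_next_batch(Scan *s, int *out, size_t cap) {
    size_t n = 0;
    while (n < cap && s->pos < s->nrows)
        out[n++] = s->rows[s->pos++];
    return n;
}
```

Both produce the same rows; the batched form simply changes the unit of the interface, which is what leaves room for heavier (supervised, restartable) calling mechanisms above it and vectorized loops below it.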

  • jval43 2 hours ago

    I agree. I've known how it works for years, and I think the current setting is a cop-out.

    In TFA it's set to a measly 2MiB, yet the query tried to allocate 2TiB. Note that the PG default is double that, at 4MiB.

    What the setting does is offload the responsibility of a "working" implementation onto you (or the DBA). If the 4MiB default were a hardcoded value, one could argue it's a bug and bikeshed forever about what a "good" value is. But since there is no safe or good value, the approach itself would need to be reevaluated.

    The core issue is that there is no overall memory management strategy in Postgres, just the implementation.

    Which is fine for an initial version: just add a few settings for the constants in the code and boom, you have some knobs to turn. Unfortunately you can't set them correctly; the database might still try to use an unbounded amount of memory.

    The documentation is very transparent about this; just from reading it you can tell they know it's a bad design, or at least an unsolved design issue. It describes the implementation accurately, yet offers nothing further in terms of actual useful guidance on what the value should be.

    This is not a criticism of the docs btw, I love the technically accurate docs in Postgres. But it's not the only setting in Postgres which is basically just an exposed internal knob. Which I totally get as a software engineer.

    However from a product point of view, internal knobs are rarely all that useful. At this point of maturity, Postgres should probably aim to do a bit better on this front.
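The "unbounded" worry above comes from work_mem being a per-operation limit, not a per-query one: each sort or hash node in a plan may use up to work_mem, and each parallel worker gets its own allowance. A back-of-the-envelope sketch (made-up plan shape, not Postgres internals):

```c
/* Worst-case memory for one query, roughly:
   per-operation limit x number of sort/hash nodes x processes.
   The +1 accounts for the leader process alongside the workers. */
static long worst_case_bytes(long work_mem, int sort_hash_nodes, int workers) {
    return work_mem * (long)sort_hash_nodes * (long)(workers + 1);
}
```

With the 4MiB default, a plan with 10 sort/hash nodes and 4 parallel workers could in the worst case reach 4MiB x 10 x 5 = 200MiB for a single query, and concurrent sessions multiply that again. No single knob value bounds the total.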

nh2 8 hours ago

> You can’t cap memory, but ...

Why not? That'd be useful. Feels like software written in C should make that reasonably easy.
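For what it's worth, the blunt tool any C process does have is an OS-level address-space cap via `setrlimit`. The sketch below (hypothetical helper names, not Postgres code) shows the mechanism, and also hints at why a hard cap is unsatisfying: once it bites, `malloc` returns NULL, so every allocation site in the program must handle failure gracefully rather than the planner choosing to spill.

```c
#include <sys/resource.h>
#include <stdlib.h>

/* Cap this process's address space to `bytes`.
   Returns 0 on success (setrlimit's convention). */
static int cap_address_space(size_t bytes) {
    struct rlimit lim;
    lim.rlim_cur = bytes;
    lim.rlim_max = bytes;
    return setrlimit(RLIMIT_AS, &lim);
}

/* Probe whether an allocation of `n` bytes still succeeds
   under the current limit. Returns 1 if it does, 0 if not. */
static int try_alloc(size_t n) {
    void *p = malloc(n);
    if (p == NULL)
        return 0;
    free(p);
    return 1;
}
```

After `cap_address_space(64UL * 1024 * 1024)`, a 256MiB `malloc` fails while small allocations keep working (on Linux/glibc). That failure mode, an out-of-memory error surfacing at an arbitrary allocation site instead of a plan that spills to disk, is presumably part of why a clean per-backend cap is harder than it looks.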

edg5000 3 hours ago

Now I want to see that 2TB query! Such a cliffhanger!
