
Everything you know about setting `work_mem` is wrong.

13 March 2023

If you google around for how to set work_mem in PostgreSQL, you’ll probably find something like:

To set work_mem, take the number of connections, add 32, divide by your astrological sign expressed as a number (Aquarius is 1), convert it to base 7, and then read that number in decimal megabytes.

So, I am here to tell you that every formula for setting work_mem is wrong. Every. Single. One. They may not be badly wrong, but they are at best first cuts and approximations.

The problem is that of all the parameters you can set in PostgreSQL, work_mem is about the most workload-dependent. You are trying to balance two competing things:

  * Setting work_mem high enough that sorts and hashes don't needlessly spill to temporary files on disk.
  * Not setting it so high that queries, in aggregate, run the system out of memory.

You can prevent the second situation with a formula. For example, you can use something like:

50% of (free memory + file system buffers), divided by the number of connections.

The chance of running out of memory using that formula is very low. It's not zero, because a single query can use more than work_mem if it contains multiple execution nodes that each demand it, but that's very unlikely. It's even less likely that every connection will be running a query with multiple execution nodes that all require full work_mem; the system will almost certainly have melted down well before that.
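
To make the multiple-execution-node point concrete, here's a sketch (the table and column names are hypothetical) of a single query whose plan can contain several nodes that are each entitled to their own work_mem:

```sql
-- Hypothetical schema: orders(customer_id, ...), customers(customer_id, ...).
-- A query like this can produce a plan with a Hash node (for the join),
-- a HashAggregate or GroupAggregate (for GROUP BY), and a Sort node (for
-- ORDER BY); each of those can use up to work_mem independently.
EXPLAIN (COSTS OFF)
SELECT o.customer_id, count(*) AS order_count
FROM orders AS o
JOIN customers AS c USING (customer_id)
GROUP BY o.customer_id
ORDER BY order_count DESC;
```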

The problem with using a formula like that is that you are, to mix metaphors, leaving RAM on the table. For example, on a 48GB server with max_connections = 1000, you end up with a work_mem in the 30MB range. That means a query that needs 64MB, even if it is the only one on the system that needs that much memory, will spill to disk while there's a ton of memory sitting around available.
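
As a sanity check, here's that arithmetic as a sketch, assuming roughly all 48GB of the example server counts as free memory plus file system buffers (the exact result depends on what's actually freeable):

```sql
-- Conservative formula from above, with the example numbers:
--   50% of (free memory + file system buffers) / connections
--   0.5 * 48GB / 1000 ≈ 24MB
ALTER SYSTEM SET work_mem = '24MB';  -- example value, not a recommendation
SELECT pg_reload_conf();             -- work_mem takes effect without a restart
```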

So, here’s what you do:

  1. Use a formula like that to set work_mem, and then run the system under a realistic production load with log_temp_files = 0 set (there's a sketch of this after the list).
  2. If everything works fine and you see no problems and performance is 100% acceptable, you’re done.
  3. If not, go into the logs and look for temporary file creation messages. They look something like this: `2023-03-13 13:19:03.863 PDT,,,45466,,640f8503.b19a,1,,2023-03-13 13:18:11 PDT,6/28390,0,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp45466.0"", size 399482880",,,,,,"explain analyze select f from t order by f;",,,"psql","parallel worker",44989,0`
  4. If there aren’t any, you’re done, the performance issue isn’t temporary file creation.
  5. If there are, the setting of work_mem that gets rid of them is about 2 times the size of the largest temporary file (the same data takes more space in memory than it does in a temporary file, so the in-memory operation needs the extra headroom).
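
Here's a minimal sketch of steps 1 and 5, assuming superuser access and using ALTER SYSTEM; the values are examples, not recommendations:

```sql
-- Step 1: log the creation of every temporary file, regardless of size.
ALTER SYSTEM SET log_temp_files = 0;
SELECT pg_reload_conf();  -- both settings take effect without a restart

-- Step 5: the example log line above shows a ~400MB temporary file
-- (399482880 bytes); roughly twice that suggests trying:
ALTER SYSTEM SET work_mem = '800MB';
SELECT pg_reload_conf();
```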

Of course, that might come up with something really absurd, like 2TB. Unless you know for sure that only one query like that might be running at a time (and you really do have enough freeable memory), you might have to make some decisions about performance vs memory usage. It can be very handy to run the logs through an analyzer like pgbadger to see what the high-water mark is for temporary file usage at any one time.
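
As a quick complement to log analysis, PostgreSQL also tracks cumulative temporary file activity per database in pg_stat_database; it has no per-query or point-in-time breakdown, but it's a fast way to see whether temporary files are being created at all:

```sql
-- Temporary files created and bytes written per database since the last
-- statistics reset; a coarse signal to check alongside the logs.
SELECT datname,
       temp_files,
       pg_size_pretty(temp_bytes) AS temp_bytes
FROM pg_stat_database
WHERE datname IS NOT NULL
ORDER BY temp_bytes DESC;
```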

If you absolutely must use a formula (for example, you are deploying a very large fleet of servers with varying workloads and instance sizes and you have to put something in the Terraform script), we’ve had good success with:

(average freeable memory * 4) / max_connections
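
As a worked example with assumed numbers (purely illustrative): a server that averages 16GB of freeable memory, with max_connections = 400, would come out as:

```sql
-- (average freeable memory * 4) / max_connections, with assumed numbers:
SELECT pg_size_pretty(16 * 1024::numeric ^ 3 * 4 / 400) AS suggested_work_mem;
-- => 164 MB; you'd round to something like work_mem = '160MB'
```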

But like every formula, that’s at best an approximation. If you want an accurate number that maximizes performance without causing out-of-memory issues, you have to gather data and analyze it.

Sorry for any inconvenience.

Pavel Stehule at 00:27, 14 March 2023:

I use a similar formula:

shared_buffers + work_mem * 2 * max_connections + (memory for file system and operating system) < RAM
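
A hedged check of that inequality with assumed values (shared_buffers = 12GB, work_mem = 24MB, max_connections = 1000, and 8GB left for the operating system and file system; all numbers illustrative):

```sql
-- Left-hand side of the inequality above; it must stay below physical RAM.
SELECT pg_size_pretty(
         12 * 1024::numeric ^ 3              -- shared_buffers: 12GB
       + 24 * 1024::numeric ^ 2 * 2 * 1000   -- work_mem * 2 * max_connections
       + 8 * 1024::numeric ^ 3               -- file system and OS memory
       ) AS minimum_ram;
-- => about 67 GB, so these settings would need more than a 64GB server
```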