Everything you know about setting `work_mem` is wrong.
13 March 2023
If you google around for how to set
work_mem in PostgreSQL, you’ll probably find something like:
To set work_mem, take the number of connections, add 32, divide by your astrological sign expressed as a number (Aquarius is 1), convert it to base 7, and then read that number in decimal megabytes.
So, I am here to tell you that every formula setting
work_mem is wrong. Every. Single. One. They may not be badly wrong, but they are at best first cuts and approximations.
The problem is that of all the parameters you can set in PostgreSQL,
work_mem is just about the most workload-dependent. You are trying to balance two competing things:
- First, you want to set it high enough that PostgreSQL does as many of the operations as it can (generally, sorts and sort-adjacent operations) in memory rather than on secondary storage, since it’s much faster to do them in memory, but:
- You want it to be low enough that you don’t run out of memory while you are doing these things, because the query will then get canceled unexpectedly and, you know, people talk.
You can prevent the second situation with a formula. For example, you can use something like:
(50% of free memory + file system buffers) divided by the number of connections.
The chance of running out of memory using that formula is very low. It’s not zero, because a single query can use more than
work_mem if there are multiple execution nodes demanding it in a query, but that’s very unlikely. It’s even less likely that every connection will be running a query that has multiple execution nodes that require full
work_mem; the system will have almost certainly melted down well before that.
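To make the arithmetic concrete, here's a minimal sketch of that conservative formula in Python. The memory figures are hypothetical; in practice you'd pull the real ones from `free` or `/proc/meminfo`:

```python
def safe_work_mem_mb(free_mb: float, fs_cache_mb: float, max_connections: int) -> float:
    """Conservative work_mem: (50% of free memory + filesystem cache) / connections."""
    return (0.5 * free_mb + fs_cache_mb) / max_connections

# Hypothetical 48GB server with ~20GB free, ~20GB in the filesystem cache,
# and max_connections = 1000: work_mem lands in the 30MB range.
print(safe_work_mem_mb(20_000, 20_000, 1000))  # → 30.0
```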
The problem with using a formula like that is that you are, to mix metaphors, leaving RAM on the table. For example, on a 48GB server with
max_connections = 1000, you end up with a
work_mem in the 30MB range. That means that a query that needs 64MB, even if it is the only one on the system that needs that much memory, will be spilled to disk while there’s a ton of memory sitting around available.
So, here’s what you do:
- Use a formula like that to set
work_mem, and then run the system under a realistic production load with
log_temp_files = 0 set.
- If everything works fine and you see no problems and performance is 100% acceptable, you’re done.
- If not, go into the logs and look for temporary file creation messages. They look something like this:
2023-03-13 13:19:03.863 PDT,,,45466,,640f8503.b19a,1,,2023-03-13 13:18:11 PDT,6/28390,0,LOG,00000,"temporary file: path ""base/pgsql_tmp/pgsql_tmp45466.0"", size 399482880",,,,,,"explain analyze select f from t order by f;",,,"psql","parallel worker",44989,0
- If there aren’t any, you’re done: whatever the performance issue is, it isn’t temporary file creation.
- If there are, the
work_mem setting needed to get rid of them is 2 times the size of the largest temporary file (temporary files have less overhead than the equivalent in-memory structures).
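As a sketch of that last step, a small Python function can pull the temporary-file sizes out of the log and double the largest one. The regex is an assumption based on the log line above; adjust it for your log_destination format:

```python
import re

# Matches the size (in bytes) in "temporary file" log messages; the doubled
# quotes ("") appear in CSV-format logs, single quotes in plain-text logs.
TEMP_FILE_RE = re.compile(r'temporary file: path ""?.*?""?, size (\d+)')

def suggested_work_mem_mb(log_lines):
    """Return 2x the largest logged temporary file, in MB (None if none found)."""
    sizes = [int(m.group(1)) for line in log_lines
             if (m := TEMP_FILE_RE.search(line))]
    if not sizes:
        return None
    return 2 * max(sizes) / (1024 * 1024)

# Hypothetical usage:
# with open("postgresql.log") as f:
#     print(suggested_work_mem_mb(f))
```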
Of course, that might come up with something really absurd, like 2TB. Unless you know for sure that only one query like that might be running at a time (and you really do have enough freeable memory), you might have to make some decisions about performance vs memory usage. It can be very handy to run the logs through an analyzer like
pgbadger to see what the high water mark is for temporary file usage at any one time.
If you absolutely must use a formula (for example, you are deploying a very large fleet of servers with varying workloads and instance sizes and you have to put something in the Terraform script), we’ve had good success with:
(average freeable memory * 4) / max_connections
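In code, that fleet formula is just the following (the numbers are made up; "freeable memory" here means free memory plus reclaimable cache, averaged across the fleet's observed load):

```python
def fleet_work_mem_mb(avg_freeable_mb: float, max_connections: int) -> float:
    """Fleet-wide starting point: (average freeable memory * 4) / max_connections."""
    return (avg_freeable_mb * 4) / max_connections

# Hypothetical instance averaging 24GB freeable with max_connections = 1000:
print(fleet_work_mem_mb(24_000, 1000))  # → 96.0
```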
But like every formula, that’s at best an approximation. If you want an accurate number that maximizes performance without causing out-of-memory issues, you have to gather data and analyze it.
Sorry for any inconvenience.