postgresql when it's not your job


Always Do This #4: Put stats_temp_directory on a memory file system

2 February 2016

The PostgreSQL statistics collector generates a lot of very important statistics about the state of the database. If it’s not working, autovacuum doesn’t work, among other problems. But it does generate a lot of write activity, and by default, that goes back onto the database volume.

Instead, always set statstempdirectory to point to a RAM disk (which has to be owned by the postgres user, with 0600 permissions). The statistics are written back to the database on shutdown, so in normal operations, you won’t lose anything on a reboot. (You’ll lose the most recent statistics on a crash, but you will anyway; the statistics are reset on recovery operations, including restart from a crash.)

This can substantially cut down the amount of write I/O the main database storage volume has to receive, and it’s free!

Alex at 23:47, 2 February 2016:

What is required RAM disk size for statistics?

Xof at 00:19, 3 February 2016:

I generally set it to 16MB, and haven’t had issues with running short on space.

Stefan at 01:28, 3 February 2016:

Biggest one we have is 8MB of stats data but that’s with 2 cluster of 9 & 14 db’s so each so quite a high number of db objects to track stats for.

Note: Recent debian/ubuntu packaging does put stat_temp already on a ramdisk out of the box (where pid file is also /var/run/postgresql/…).

VdP at 03:23, 3 February 2016:

I’d gladly trade 16M of ram for a substantial IO cut.

This begs the question, if the default setup doesn’t improve crash recovery and only slows things down, why does PG write this to permanent storage to begin with, rather than store it in shared memory ? Just historical reasons, or is there an upside you forgot ? Is that a low-hanging fruit for performance improvement in a future PG version ?

Andrew Dunstan at 10:21, 3 February 2016:

Postgres itself runs unprivileged, and on many platforms. So the RAMdisk needs to be set up outside of Postgres’ control.

Terje Elde at 12:07, 3 February 2016:

Do you have any numbers to back this up?
(Asking because I’m curious, not because I’m disputing it)
Ideally, any improvement should be minimal, because the stats should not need to be fsync()ed, and if they’re not, they should more or less just live in memory even if on a traditional filesystem.
Sure, there might be the occasional write, but it shouldn’t be anything significant enough to warrant the hassle of a ram disk.

Xof at 13:04, 3 February 2016:

Ramdisks are no hassle at all. I think you’ll find the performance to be worth. it. Feel free to do some experiments to prove me wrong, of course.

VdP at 13:59, 3 February 2016:

@Andrew I wasn’t asking why PG doesn’t setup a ramdisk itself, but why it stores [data that isn’t big and is not recovered after a crash] on (ram/hard/etc) disk rather than in (shared) memory.

sd at 22:39, 7 February 2016:

Hi! Is it worth it? How about some real bench charts?

Xof at 23:53, 7 February 2016:

Given that it is free, I’d say it is likely to be worth it.