The PostgreSQL statistics collector generates a lot of very important statistics about the state of the database. If it’s not working, autovacuum doesn’t work, among other problems. But it does generate a lot of write activity, and by default, that goes back onto the database volume.
Instead, always set stats_temp_directory to point to a RAM disk (which has to be owned by the postgres user, with 0600 permissions). The statistics are written back to the database on shutdown, so in normal operations, you won’t lose anything on a reboot. (You’ll lose the most recent statistics on a crash, but you will anyway; the statistics are reset on recovery operations, including restart from a crash.)
This can substantially cut down the amount of write I/O the main database storage volume has to receive, and it’s free!
Comments
Alex · 2 February 2016
What is required RAM disk size for statistics?Xof · 3 February 2016
I generally set it to 16MB, and haven't had issues with running short on space.Stefan · 3 February 2016
Biggest one we have is 8MB of stats data but that's with 2 cluster of 9 & 14 db's so each so quite a high number of db objects to track stats for. Note: Recent debian/ubuntu packaging does put stat_temp already on a ramdisk out of the box (where pid file is also /var/run/postgresql/...).VdP · 3 February 2016
I'd gladly trade 16M of ram for a substantial IO cut. This begs the question, if the default setup doesn't improve crash recovery and only slows things down, why does PG write this to permanent storage to begin with, rather than store it in shared memory ? Just historical reasons, or is there an upside you forgot ? Is that a low-hanging fruit for performance improvement in a future PG version ?Andrew Dunstan · 3 February 2016
Postgres itself runs unprivileged, and on many platforms. So the RAMdisk needs to be set up outside of Postgres' control.Terje Elde · 3 February 2016
Do you have any numbers to back this up? (Asking because I'm curious, not because I'm disputing it) Ideally, any improvement should be minimal, because the stats should not need to be fsync()ed, and if they're not, they should more or less just live in memory even if on a traditional filesystem. Sure, there might be the occasional write, but it shouldn't be anything significant enough to warrant the hassle of a ram disk.Xof · 3 February 2016
Ramdisks are no hassle at all. I think you'll find the performance to be worth. it. Feel free to do some experiments to prove me wrong, of course.VdP · 3 February 2016
@Andrew I wasn't asking why PG doesn't setup a ramdisk itself, but why it stores [data that isn't big and is not recovered after a crash] on (ram/hard/etc) disk rather than in (shared) memory.sd · 7 February 2016
Hi! Is it worth it? How about some real bench charts?Xof · 7 February 2016
Given that it is free, I'd say it is likely to be worth it.