postgresql when it's not your job

Always Do This #1: Create New Clusters with Checksums

09:29, 17 December 2015

(First in a series of things I always do when setting up or configuring PostgreSQL.)

Since version 9.3, PostgreSQL has had the option to create new database clusters with data checksums. These are fast, simple checksums that verify that the data coming off disk is what PostgreSQL expects it to be. The checksums don’t correct errors, but a failed check warns you that something bad has happened to your data.
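When a check fails, PostgreSQL reports it rather than silently handing back bad data. A sketch of what that looks like (the checksum values and relation path here are illustrative):

    WARNING:  page verification failed, calculated checksum 43245 but expected 14142
    ERROR:  invalid page in block 3 of relation base/16384/16385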

Always initialize new clusters with checksums turned on, unless you are running on a file system (such as ZFS) that does its own checksumming. Your disks have uncorrected errors on them, so you might as well find them.

To turn them on, pass the --data-checksums (-k) option to initdb. If you are using the Debian packaging, you can make it the default in /etc/postgresql-common/createcluster.conf, as shown below.
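Concretely, a minimal sketch of both approaches (the data directory path is illustrative):

    # Plain initdb: enable checksums at cluster creation
    initdb --data-checksums -D /var/lib/postgresql/data

    # Debian packaging: make it the default for pg_createcluster by
    # adding this line to /etc/postgresql-common/createcluster.conf
    initdb_options = '--data-checksums'

    # Verify on a running cluster; returns "on" if checksums are enabled
    psql -c 'SHOW data_checksums;'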

Chris at 10:56, 17 December 2015:

It would be good to see benchmarks showing how checksums affect performance. I believe some existed when this was committed?

Xof at 10:59, 17 December 2015:

Benchmarks are interesting, but only if you can answer the question a priori: How much of a performance boost is unchecked data corruption worth to us?

Jim Nasby at 13:32, 17 December 2015:

This reminder is especially important, because you can only do this at cluster creation time. We should probably change the default to use checksums.

I would actually enable checksums even on filesystems like ZFS, because the filesystem doesn’t checksum data that’s sitting in cache. With memory density as high as it is today, your odds of undetected bit flips (even with ECC) are far higher than before.

As for performance, I’ve never looked at a system that didn’t have a lot of other low-hanging fruit that would produce a far higher performance gain than checksums will cost you.

Andres Freund at 04:59, 18 December 2015:

> I would actually enable checksums even on filesystems like ZFS, because the filesystem doesn’t checksum data that’s sitting in cache. With memory density as high as it is today, your odds of undetected bit flips (even with ECC) are far higher than before.

Neither does Postgres notice checksum failures while pages sit in shared buffers; checksums are only verified when a page is read in from disk.

Evan Jones at 12:38, 27 December 2015:

Yes, yes, yes! Having recently dealt with the fallout from a network corruption issue, I think it is worth having “defense in depth” against bad data. This seems sane, and I have to imagine the cost of computing and verifying checksums as blocks move to and from memory is pretty small. My related network corruption story is at the following link, with a more detailed description of the final bug to come soon: http://www.evanjones.ca/tcp-and-ethernet-checksums-fail.html