17 July 2018
Google recently released a tool to check on-disk checksums in PostgreSQL. PostgreSQL being hot, and Google being Google, this generated a lot of press, much of it containing an element of FUD about the integrity of PostgreSQL backups (note that Google’s own announcements were quite straight-forward about it).
First, some background: Since PostgreSQL 9.3, it has had page checksums available as an option when initializing a new database. These are fast, lightweight checksums to verify that the data on the page hasn’t been corrupted. It’s an excellent and low-impact data integrity feature, and highly recommended.
However, PostgreSQL doesn’t just randomly check the checksums of pages it is not going to read into memory (and why would it?). Thus, there can be lurking corruption in database that you don’t discover until too late… specifically, after all of the backups also contain it. Some PostgreSQL page-level backup tools, such as pgBackRest, check the checksums during the backup, but other tools (such as WAL-E and WAL-G) do not.
Google’s tool adds to that collection by doing an on-disk scan of each page, and verifying the checksum. That’s all it does. That’s not useless! It’s relevant to backups because doing this scan on a backup image doesn’t create the large I/O burden that doing so on an active server would.
So, if you are using a backup tool that does not verify backups, and that creates a disk image (as opposed to say, pg_dump) by all means consider this new addition as part of your toolbox. But there’s nothing inherent about PostgreSQL backups that makes them more prone to corruption, and some of the unfortunate press around this has given that implication.