10:06
Three Steps to pg_rewind Happiness
9 August 2018
pg_rewind is a utility included with PostgreSQL since 9.5. It’s used to “rewind” a server so that it can be attached as a secondary to a primary. The server being rewound could be the former primary, or a secondary that was a peer of the new primary.
In pg_rewind terminology, and in this post, the “source” server is the new primary that the old server is going to be attached to, and the “target” is the server that will be attached to the source as a secondary.
Step One: Have a WAL Archive
While pg_rewind does not require that you have a WAL archive, you should have one. pg_rewind works by “backing up” the target server to a state before the last shared checkpoint of the two servers. Then, when the target starts up, it uses WAL information to replay itself to the appropriate point at which it can connect as a streaming replica to the source. To do that, it needs the WAL information from the rewind point onwards. Since the source had no reason to “know” that it would be used as a primary, it may not have enough WAL information in its pg_xlog (9.6 and earlier) or pg_wal (10 and later) directory to bring the target up to date. If it doesn’t, you are back to rebuilding the new secondary, the exact situation that pg_rewind is meant to avoid.
Thus, make sure you have a WAL archive that the target can consult as it is coming up.
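As a rough sketch of what “a WAL archive that the target can consult” might look like on a release that still uses recovery.conf (before PostgreSQL 12): the hypothetical helper below writes a recovery.conf for the target with a restore_command that copies segments out of an archive directory. The function name, the paths, and the connection string are all illustrative assumptions, and a plain cp from a mounted directory stands in for whatever archive tooling you actually use. It is meant to be run after pg_rewind has finished, since pg_rewind copies configuration files over from the source.

```shell
# Hypothetical helper, not part of PostgreSQL.
# Assumes the source already archives WAL into the same directory
# (archive_mode = on plus a suitable archive_command).
#   $1 = target data directory
#   $2 = WAL archive directory, readable from the target host
#   $3 = libpq connection string for the source (no single quotes inside)
write_recovery_conf() {
  target_pgdata=$1
  archive_dir=$2
  source_conninfo=$3
  # %f and %p are expanded by PostgreSQL, not by the shell.
  cat > "$target_pgdata/recovery.conf" <<EOF
standby_mode = 'on'
primary_conninfo = '$source_conninfo'
restore_command = 'cp $archive_dir/%f "%p"'
recovery_target_timeline = 'latest'
EOF
}
```

A call might look like write_recovery_conf /var/lib/pgsql/data /mnt/wal_archive 'host=source.example user=replicator', with every value there being a placeholder. The recovery_target_timeline setting is included on the assumption that the target should follow the source onto its new timeline.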
Step Two: Properly Promote the Source Server
The source server, which will be the new primary, needs to be properly promoted. Use pg_ctl promote, or the trigger_file option in recovery.conf, so that the source promotes itself and starts a new timeline. Don’t just shut the source down, remove recovery.conf, and bring it back up! That doesn’t create a new timeline, and the source won’t have the appropriate divergence point from the target for pg_rewind to consult.
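A minimal sketch of the pg_ctl route, assuming it runs on the source host as the postgres OS user and that psql can reach the local server through the usual PG* environment variables. The function name and the 60-second limit are made up for illustration; pg_ctl promote and pg_is_in_recovery() are the only PostgreSQL pieces it relies on.

```shell
# Hypothetical helper: promote the source, then wait until it has left recovery.
#   $1 = source data directory
promote_source() {
  source_pgdata=$1
  pg_ctl promote -D "$source_pgdata" || return 1
  # pg_is_in_recovery() prints "f" once the server is no longer a standby.
  # Poll once a second, giving up after 60 attempts (an arbitrary limit).
  i=0
  while [ "$i" -lt 60 ]; do
    state=$(psql -X -At -c 'SELECT pg_is_in_recovery()') || return 1
    if [ "$state" = "f" ]; then
      return 0
    fi
    sleep 1
    i=$((i + 1))
  done
  return 1
}
```

Leaving recovery only shows that the promotion happened. It does not mean the forced checkpoint that follows promotion has completed.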
Step Three: Wait for the Forced Checkpoint to Complete
When a secondary is promoted to being a primary, it starts a forced checkpoint when it exits recovery mode. This checkpoint is a “fast” checkpoint, but it can still take a while, depending on how big shared_buffers is and how many buffers are dirty. Use tail -f to monitor the logs on the source (checkpoint completion is logged only when log_checkpoints is on) and wait for that forced checkpoint to complete before running pg_rewind to rewind the target. Failing to do this can cause the target to be corrupted. If you are writing a script to do this, issue a CHECKPOINT statement to the source before running pg_rewind.
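For the scripted case, something along these lines might do. It is a sketch under several assumptions: it runs on the target host, the target has already been shut down cleanly, the connection string points at the source with a role allowed to run CHECKPOINT and pg_rewind, and the function name is hypothetical. The dry run before the real rewind is an extra precaution of this sketch, not something the steps above require.

```shell
# Hypothetical helper: checkpoint the source, then rewind the (stopped) target.
#   $1 = target data directory
#   $2 = libpq connection string for the source
rewind_target() {
  target_pgdata=$1
  source_conninfo=$2
  # CHECKPOINT returns only after the checkpoint has completed on the source.
  psql -X -d "$source_conninfo" -c 'CHECKPOINT' || return 1
  # Dry run first: reports what would happen without modifying the target.
  pg_rewind --target-pgdata="$target_pgdata" \
            --source-server="$source_conninfo" --dry-run || return 1
  # The actual rewind, with progress reporting.
  pg_rewind --target-pgdata="$target_pgdata" \
            --source-server="$source_conninfo" --progress
}
```

An invocation could look like rewind_target /var/lib/pgsql/data 'host=source.example dbname=postgres user=postgres', where the path and connection values are placeholders.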
And have fun rewinding servers!