postgresql when it's not your job

22:25

Cleaning up after your Bucardo goats

27 September 2011

If you are not familiar with it already, Bucardo is a nifty multi-master replication system for PostgreSQL, written by Greg Sabino Mullane. Written in Perl, it is great if you need replication that doesn’t have the restrictions associated with PG 9’s streaming replication.

To keep your Bucardo installation clean and tidy, a few regular cron jobs are required. One of them cleans up the archived replicated data (stored in a separate database by Bucardo) once you know you are done with it.

The Bucardo page above has a recommended script using all sorts of bashing, but I wanted something a bit more pure-PostgreSQL; it also doesn’t purge more than one old table at a time. So, I whipped up the following PL/pgSQL function.

(Note that this is for Bucardo 4.4. I haven’t played with the forthcoming Bucardo 5, so I’m not sure if this is still required.)

CREATE OR REPLACE FUNCTION bucardo.purge_frozen_child_qs(far_back interval)
    RETURNS SETOF TEXT AS
$purge_frozen_child_qs$
DECLARE
    t TEXT;
    qt TEXT;
BEGIN

    IF far_back IS NULL THEN
        RAISE EXCEPTION 'Interval cannot be null.'
            USING HINT = 'So, do not do that.';
    END IF;

    IF (now() + far_back) > now() THEN
        RAISE EXCEPTION 'Interval must be negative.'
            USING HINT = 'Consider using the "ago" form of intervals.';
    END IF;

    FOR t IN 
        SELECT tablename 
            FROM pg_tables
            WHERE schemaname='freezer' 
                  AND tablename like 'child_q_%' 
                  AND (replace(tablename, 'child_q_', '')::timestamp with time zone) < now() + far_back::interval
            ORDER BY tablename
    LOOP
        qt := 'freezer.' || t;
        EXECUTE 'DROP TABLE ' || qt;
        RETURN NEXT qt;
    END LOOP;

    DELETE FROM bucardo.q 
        WHERE (started < now() + far_back::interval 
                OR ended < now() + far_back::interval 
                OR aborted < now() + far_back::interval 
                OR cdate < now() + far_back::interval) 
              AND (ended IS NULL OR aborted IS NULL);

    RETURN;

END
$purge_frozen_child_qs$
LANGUAGE plpgsql
    VOLATILE;

To use it, just call it repeatedly from a cron job with the appropriate argument, along the lines of:

SELECT * FROM bucardo.purge_frozen_child_qs('7 days ago'::interval);

It returns the names of the tables it deleted.

This particular function doesn’t need to be run more often than once a day. And it keeps your Bucardo goats nice and clean.

(A “bucardo” is a now-extinct species of goat. For why Bucardo is goat-related, ask Greg.)

Greg at 09:51, 28 September 2011:

Thanks, but I didn’t originally write it, Jon Jensen did. But I’ve certainly modified the original version a bit. You are correct to ask about Bucardo 5 – the function won’t be needed, as both the q and the child_q tables are no more. Although bucardo_purge_delta() still needs to be run, the MCP will spawn a process to handle that for you in Bucardo 5. Which means that you should not need any maintenance cronjobs at all.