The Build

30 October 2017

14:00

Commitment Issues

One of the essentials of any database system is that a transaction is either in progress, committed, or rolled back. But consider what happens if…

BEGIN;
UPDATE table SET money=money+100000;
COMMIT;
-- And you get an error that the server has disconnected

Did that transaction commit, or didn’t it? You don’t really know, because you don’t know if the server even received the COMMIT statement; if it didn’t, the transaction might have aborted due to the client disconnection.

This isn’t that important for the vast majority of situations, but if you are (for example) building a two-phase commit architecture, it might be vital to know if the transaction really did commit or not.

In PostgreSQL, you can find this out using txid_current() and txid_status():

BEGIN;
UPDATE table SET money=money+100000;
SELECT txid_current();

 txid_current 
--------------
         8168
(1 row)
COMMIT;
-- And you get an error that the server has disconnected

Now, you can use that result to test the status of that transaction (as long as it is still available in the commit log). So, if a disconnection occurs, the client can reconnect and:

SELECT txid_status(8168);

 txid_status 
-------------
aborted
(1 row)

Now, the client knows it needs to rerun that transaction.

Again, most client applications don’t need this level of detail, but it’s great to have the capability if you do.

27 October 2017

01:29

Streaming replication stopped? One more thing to check.

We recently were asked by a client to look at a strange replication situation. Replication had simply stopped from a primary to a streaming replication secondary. Everything looked fine:

After investigation, and some back-and-forth on mailing lists, we discovered the problem was a data ingestion procedure. At the highest level, the stored procedure created a temporary table, filled it, did some data analysis on it, and then inserted the results into a permanent table. Two changes had been made just before the replication stopped:

  1. The number of temporary tables created in each “run” of the stored procedure had been increased from 10 to 20,000.
  2. A bug had been introduced that caused an exception within the loop, but that exception had been caught and discarded.

Logically, what was happening resembled this:

DO $$
DECLARE
  i int := 1;
BEGIN
  FOR i IN 1..20000 LOOP
     BEGIN
         PERFORM f();
         i := i / 0;
     EXCEPTION
        WHEN division_by_zero THEN
     END;
  END LOOP;
END;
$$ language plpgsql;

where f() is:

CREATE OR REPLACE FUNCTION f() RETURNS VOID AS $$
BEGIN
 CREATE TEMPORARY TABLE test_table 
   ON COMMIT DROP AS SELECT i FROM generate_series(1, 100) i;
END:
$$ language plpgsql;

So, what happened?

Each time PostgreSQL creates a temporary table (or any table, for that), it takes an ExclusiveLock on it. That lock is held until the end of the transaction. In this case, each time the exception was raised, that rolled back a savepoint that PL/pgSQL implicitly creates at the start of the BEGIN/END, releasing the lock.

But only on the primary.

The secondary received the lock (one of the only types of lock a secondary can have), and held it until the entire wrapping transaction commits or rolls back. This is a lto of locks in this case. PostgreSQL handles this situation, but not entirely gracefully: the time needed up clean up all those locks when the COMMIT arrives at the secondary appears to be O(N^2), N being the number of locks.

Reducing the number of iterations (and fixing the bug) caused the issue to go away.

Interestingly, we’ve not been able to reproduce in a simple test case the lack-of-error-messages; in the test case above, all kinds of errors are emitted on the secondary.

Special thanks to Tom Lane for helping track this down.

01:06

“Humans do not have a primary key” at PGConf.EU

The slides for “Humans do not have a primary key” as presented at PGConf EU are available here. Three links from the presentation are:

7 September 2017

10:19

“Humans do not have a primary key” at PGOpen SV

I’ll be presenting “Humans do not have a primary key” at PGOpen SV today. Two links from the presentation are:

5 September 2017

08:51

A brief reminder about casting TIMESTAMPTZ

I spent a few minutes worrying I had lost my mind: Results coming back to my application were different from those obtained by running psql directly on the database server, for identical queries.

As is often the case, the problem was time zones.

In particular, when casting a TIMESTAMPTZ to a DATE, the current client time zone setting is used. So:

xof=# BEGIN;
BEGIN
xof=# set time zone 'UTC';
SET
xof=# select now()::date;
    now     
------------
 2017-09-05
(1 row)

xof=# set time zone 'Australia/Lord_Howe';
SET
xof=# select now()::date;
    now     
------------
 2017-09-06
(1 row)

xof=# COMMIT;
COMMIT

(Extra credit for knowing why I wrapped this example in a transaction block!)

Just a reminder to be very careful when converting items from time zone-aware formats (like TIMESTAMPTZ) to non-aware ones (like TIMESTAMP or DATE). And always store data as a time zone-aware type, unless there is a very compelling reason to do otherwise.

4 September 2017

01:18

Blog Moved.

This is probably no big deal to anyone, but the blog has moved to a new server. Whee!

9 June 2017

11:27

shared_buffers is not a sensitive setting

There’s a lot of mystery-cult information floating around out there about what to set shared_buffers to in PostgreSQL. Most of it is, at best, overthinking a fairly simple setting.

You can get 90% of the way there, on 95%+ of PostgreSQL systems, just by setting it to 25% of total system memory, to a maximum of 32GB. (It used to be 8GB, and then 16GB; as processors get faster, the number tends to go up.)

In particular, the difference between 24GB and 32GB, for example, will almost certainly be unmeasurable, on a system with 128GB or more memory.

It can be of benefit to lower shared_buffers on systems that have a high write rate, to reduce the total amount of data flushed on one checkpoint. I don’t have analytic values for that (shame on me!), but I’ve seen benefits by reducing it to as low as 2GB on a very high write-rate system.

But don’t agonize.

7 June 2017

15:03

Resetting the postgres user’s password

For some reason, this is something that even experienced PostgreSQL people don’t know about, which is: What do you do if you’ve forgotten the postgres user’s password, and you have no other superuser available?

The answer is: Bring up PostgreSQL in single-user mode, reset the password, bring it back up in standard mode.

Of course, this requires access to the shell on the machine running PostgreSQL.

28 April 2017

09:00

A PostgreSQL Response to Uber

The slides from my talk at Percona Live 2017, A PostgreSQL Response to Uber, which I also gave as “Why Uber Was (Mostly) Wrong” at PGDay Nordic 2017, are now available.

31 March 2017

09:34

“Corruption War Stories” from PGConf US 2017

The slides from my presentation, Corruption War Stories, are now available.

« Older Entries