The Build

11 February 2016

20:53

Indexes Are Not Cheap

I’ve noticed an increasing tendency in PostgreSQL users to over-index tables, often constructing very complex partial indexes to try to speed up very particular queries.

Be careful about doing this. Not only do additional indexes increase the plan time, they greatly increase insert time.

By way of example, I created a table with a single bigint column, and populated it with:

time psql -c "insert into t select generate_series(1, 100000000)"

That run without any indexes took 1 minute, 55 seconds; that run with eight indexes on the same table took 26 minutes, 39 seconds, or almost 14 times slower.

Regularly consult pg_stat_user_indexes and drop indexes that aren’t being used. Your disk space and insert times will thank you.

9 February 2016

11:46

Django 1.8/1.9 and PostgreSQL: An Ever-Closer Union

I’ll be speaking about Django and PostgreSQL at PyCon US 2016.

8 February 2016

20:46

PostgreSQL High Availability, 2016 Edition

The slides from my talk at PG Day at FOSDEM 2016 are now available.

2 February 2016

19:59

Always Do This #4: Put stats_temp_directory on a memory file system

The PostgreSQL statistics collector generates a lot of very important statistics about the state of the database. If it’s not working, autovacuum doesn’t work, among other problems. But it does generate a lot of write activity, and by default, that goes back onto the database volume.

Instead, always set statstempdirectory to point to a RAM disk (which has to be owned by the postgres user, with 0600 permissions). The statistics are written back to the database on shutdown, so in normal operations, you won’t lose anything on a reboot. (You’ll lose the most recent statistics on a crash, but you will anyway; the statistics are reset on recovery operations, including restart from a crash.)

This can substantially cut down the amount of write I/O the main database storage volume has to receive, and it’s free!

25 January 2016

22:07

PostgreSQL High Availability, 2016 Edition

I’ll be speaking about PostgreSQL High Availability at PGDay 2016 at FOSDEM.

21 January 2016

18:40

JSON Home Improvement at SCALE 14x

The slides from my presentation, JSON Home Improvement at SCALE 14x, are now available.

13:53

A simple JSON difference function

More as an example than anything else, I wanted a function that would take two JSONB objects in PostgreSQL, and return how the left-hand side differs from the right-hand side. This means any key that is in the left but not in the right would be returned, along with any key whose value on the left is different from the right.

Here’s a quick example of how to do this in a single SELECT. In real life, you probably want more error checking, but it shows how nice the built-in primitives are:

CREATE OR REPLACE FUNCTION json_diff(l JSONB, r JSONB) RETURNS JSONB AS
$json_diff$
    SELECT jsonb_object_agg(a.key, a.value) FROM
        ( SELECT key, value FROM jsonb_each(l) ) a LEFT OUTER JOIN
        ( SELECT key, value FROM jsonb_each(r) ) b ON a.key = b.key
    WHERE a.value != b.value OR b.key IS NULL;
$json_diff$
    LANGUAGE sql;

14 January 2016

15:19

Doing a bulk merge in PostgreSQL 9.5

I gave a talk earlier this week about the new features in PostgreSQL 9.5. The headline feature, of course, is the new INSERT ... ON CONFLICT clause, aka “Upsert,” which allows an INSERT that has a key conflict to do an UPDATE (or just ignore the insert) instead.

A perfectly reasonable question was: “Does this work on COPY“? And the answer is no, COPY doesn’t have an ON CONFLICT clause. But the answer is actually yes, because you can get these semantics without too much trouble using things you already have around the house.

First, note that INSERT can take a SELECT statement as the source of the records to import:

INSERT INTO t(pk, col1, col2)
   SELECT pk, col1, col2 FROM t2
   ON CONFLICT (pk) DO UPDATE
      SET col1 = EXCLUDED.col1,
          col2 = EXCLUDED.col2;

So, of course, you could COPY the import file into a secondary table like t2 above, and then do the INSERT.

But we don’t have to! We can use the file_fdw foreign data wrapper to “mount” the text file just like it was a local table.

First, we need to create the file_fdw in the database (it is part of the PostgreSQL distribution);

CREATE EXTENSION file_fdw;

… and a “server” to go with it:

CREATE SERVER import_t FOREIGN DATA WRAPPER file_fdw;

… and then “mount” the table in the local database:

CREATE FOREIGN TABLE t2 (
   pk uuid,
   col1 integer,
   col2 text
) SERVER import_t
  OPTIONS ( filename '/path/to/file/t.csv', format 'csv' );

And then we can do the INSERT operation just as above.

So, we have proper MERGE-type operations in PostgreSQL, right out of the box!

4 January 2016

21:53

Always Do This #3: Log Lock Waits and Temp Files

How much and what to log in PostgreSQL is something that doesn’t really admit a single solution. Logging every connection can swamp a server, as can too low a setting of log_min_statement_duration. But there are two settings I always turn on: log_lock_waits, and log_temp_files (with logtempfiles being set to 0).

log_lock_waits will log any lock wait that goes on longer than the deadlock_timeout setting (the same process that checks for deadlocks also emits the log message). By default, this is one second, and if my database has a lock on which a process is waiting for that long, I want to know about it.

log_temp_files is a size over which a log message will be generated for the temp file creation. Zero is everything. Temp files are (almost always) bad; any time the system needs to do something on disk instead of in memory, I want to know about it.

So, just turn these on, and feed the results through pgbadger. If you are getting so many lock waits or temp files that you are getting excessive logging, the problem is the waits or the temp files, not the logging!

19:39

January SFPUG: What’s New in PostgreSQL 9.5?

I’ll be presenting at the January San Francisco PostgreSQL Users’ Group meeting on what’s new in PostgreSQL 9.5. I hope you can join us!

« Older Entries