postgresql when it's not your job

15:19

Doing a bulk merge in PostgreSQL 9.5

14 January 2016

I gave a talk earlier this week about the new features in PostgreSQL 9.5. The headline feature, of course, is the new INSERT ... ON CONFLICT clause, aka “Upsert,” which allows an INSERT that has a key conflict to do an UPDATE (or just ignore the insert) instead.

A perfectly reasonable question was: “Does this work on COPY“? And the answer is no, COPY doesn’t have an ON CONFLICT clause. But the answer is actually yes, because you can get these semantics without too much trouble using things you already have around the house.

First, note that INSERT can take a SELECT statement as the source of the records to import:

INSERT INTO t(pk, col1, col2)
   SELECT pk, col1, col2 FROM t2
   ON CONFLICT (pk) DO UPDATE
      SET col1 = EXCLUDED.col1,
          col2 = EXCLUDED.col2;

So, of course, you could COPY the import file into a secondary table like t2 above, and then do the INSERT.

But we don’t have to! We can use the file_fdw foreign data wrapper to “mount” the text file just like it was a local table.

First, we need to create the file_fdw in the database (it is part of the PostgreSQL distribution);

CREATE EXTENSION file_fdw;

… and a “server” to go with it:

CREATE SERVER import_t FOREIGN DATA WRAPPER file_fdw;

… and then “mount” the table in the local database:

CREATE FOREIGN TABLE t2 (
   pk uuid,
   col1 integer,
   col2 text
) SERVER import_t
  OPTIONS ( filename '/path/to/file/t.csv', format 'csv' );

And then we can do the INSERT operation just as above.

So, we have proper MERGE-type operations in PostgreSQL, right out of the box!

Matt at 20:01, 14 January 2016:

You might want to add a ORDER BY to ensure that you get the most current record, unless your CSV file is already sorted by time.