15:19
Doing a bulk merge in PostgreSQL 9.5
14 January 2016
I gave a talk earlier this week about the new features in PostgreSQL 9.5. The headline feature, of course, is the new INSERT ... ON CONFLICT
clause, aka “Upsert,” which allows an INSERT that has a key conflict to do an UPDATE
(or just ignore the insert) instead.
A perfectly reasonable question was: “Does this work on COPY
“? And the answer is no, COPY
doesn’t have an ON CONFLICT
clause. But the answer is actually yes, because you can get these semantics without too much trouble using things you already have around the house.
First, note that INSERT
can take a SELECT
statement as the source of the records to import:
INSERT INTO t(pk, col1, col2)
SELECT pk, col1, col2 FROM t2
ON CONFLICT (pk) DO UPDATE
SET col1 = EXCLUDED.col1,
col2 = EXCLUDED.col2;
So, of course, you could COPY
the import file into a secondary table like t2
above, and then do the INSERT
.
But we don’t have to! We can use the file_fdw
foreign data wrapper to “mount” the text file just like it was a local table.
First, we need to create the file_fdw in the database (it is part of the PostgreSQL distribution);
CREATE EXTENSION file_fdw;
… and a “server” to go with it:
CREATE SERVER import_t FOREIGN DATA WRAPPER file_fdw;
… and then “mount” the table in the local database:
CREATE FOREIGN TABLE t2 (
pk uuid,
col1 integer,
col2 text
) SERVER import_t
OPTIONS ( filename '/path/to/file/t.csv', format 'csv' );
And then we can do the INSERT
operation just as above.
So, we have proper MERGE-type operations in PostgreSQL, right out of the box!
There is one comment.
Matt at 20:01, 14 January 2016:
You might want to add a ORDER BY to ensure that you get the most current record, unless your CSV file is already sorted by time.