UUID vs BIGSERIAL for Primary Keys
8 October 2015
SERIAL (32 bit integer) or BIGSERIAL (64 bit integer) are the first choice for most people for a synthetic primary key. They’re easy, they’re comprehensible, and they’re transaction-safe. The values that come out of them are, at least to start, manageable and human-scale. They can also provide an easy sortation on creation order.
They’re not perfect, though. If you have to merge two tables that were generated using SERIALs, you have a massive key update ahead of you to avoid conflicts. If you use SERIAL, exhausting the 32-bit range is a real possibility. And if you have a sharded database, you need some way of keeping the sequences independent across shards, such as different starting offsets (but what if you get the offset wrong?) or different increments (but what if you add another server?).
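As a sketch, the offset/increment approach described above might look like this (sequence and table names are illustrative):

```sql
-- Two-shard setup: interleave the key space with different increments.
-- Shard 1 generates 1, 3, 5, ...; shard 2 generates 2, 4, 6, ...
-- Adding a third shard later would require renumbering everything.
CREATE SEQUENCE ids START WITH 1 INCREMENT BY 2;  -- on shard 1
CREATE SEQUENCE ids START WITH 2 INCREMENT BY 2;  -- on shard 2

CREATE TABLE t (
    id bigint PRIMARY KEY DEFAULT nextval('ids'),
    payload text
);
```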
A good alternative is using UUIDs, generated using the uuid_generate_v4() function in PostgreSQL’s uuid-ossp contrib module. This makes merges much easier, and guarantees key independence across multiple tables.
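A minimal setup might look like this (the table and column names are illustrative):

```sql
-- uuid-ossp ships with PostgreSQL as a contrib module, but must be
-- enabled per database.
CREATE EXTENSION IF NOT EXISTS "uuid-ossp";

CREATE TABLE orders (
    id uuid PRIMARY KEY DEFAULT uuid_generate_v4(),
    total numeric
);

-- Each INSERT gets a globally unique 128-bit key; no sequence involved.
INSERT INTO orders (total) VALUES (42.00);
```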
But UUIDs are 128 bits, not 64, and require a function call to generate. How much of a problem is that, really?
As a test, I created a table with a single primary key column and a single float field:
CREATE TABLE test (id <type>, f float, PRIMARY KEY (id));
<type> could be one of three possibilities:
- UUID, using the uuid_generate_v4() function.
- BIGSERIAL.
- BIGINT, using the next_id function from Instagram.
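Spelled out, the three table definitions would be something like the following (the next_id() function is assumed to already exist, per Instagram’s published PL/pgSQL sharded-ID scheme):

```sql
-- UUID primary key, generated by uuid-ossp.
CREATE TABLE test_uuid (
    id uuid DEFAULT uuid_generate_v4(),
    f float,
    PRIMARY KEY (id)
);

-- BIGSERIAL primary key: a bigint backed by a sequence.
CREATE TABLE test_bigserial (
    id bigserial,
    f float,
    PRIMARY KEY (id)
);

-- BIGINT primary key filled in by an Instagram-style next_id() function.
CREATE TABLE test_bigint (
    id bigint DEFAULT next_id(),
    f float,
    PRIMARY KEY (id)
);
```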
The test inserted 10,000,000 rows into the table. In one run, it did a COMMIT after each INSERT; in the other, a single COMMIT after all INSERTs. This was on PostgreSQL 9.4.4 on an AWS i2.2xlarge instance, with the two SSDs in a RAID-0 as the database volume.
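The two transaction patterns can be sketched like this (the actual test issued individual INSERTs from a driver program; generate_series() is used here only to keep the sketch short):

```sql
-- Run 1: COMMIT after each INSERT (driver-side loop, shown abbreviated).
INSERT INTO test_uuid (f) VALUES (random());  -- repeated 10,000,000 times,
                                              -- each in its own transaction

-- Run 2: a single COMMIT covering all the INSERTs.
BEGIN;
INSERT INTO test_uuid (f)
SELECT random() FROM generate_series(1, 10000000);
COMMIT;
```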
The results were:
COMMITing after each INSERT:

| column type | time (s) | size (MB) |

Single COMMIT after all INSERTs:

| column type | time (s) | size (MB) |
Overall, the INSERT time for UUIDs was slightly longer than that for BIGSERIAL, but not appreciably. The BIGINT column was notably slower, due to the PL/pgSQL function generating the new keys.
The UUID tables were bigger, of course, although this is an extreme example in that the primary key was one of only two fields in the table; more realistic tables with more columns would not show the same percentage increase.
The conclusion I draw is that it is fine to use UUIDs unless you are faced with a very tight INSERT performance requirement; they are surprisingly efficient compared to BIGSERIAL. My supposition is that the increased computation for the UUID is balanced against the I/O required to maintain the SERIAL sequence.
There are 10 comments.
eMerzh at 13:43, 8 October 2015:
although it might be useful to check index size and select speed too :)
Xof at 13:45, 8 October 2015:
The size above includes the primary key index.
Arthur at 15:24, 8 October 2015:
I guess a UUIDv1 can ameliorate some of the overhead of UUIDv4, since it’s always increasing like the SERIALs.
koala at 22:23, 8 October 2015:
How about join performance?
Dmitry at 03:39, 9 October 2015:
Also, it would be nice to compare gen_random_uuid() from pgcrypto vs. uuid_generate_v4() from uuid-ossp
uuid-ossp is a nice extension, but may be tricky to install on exotic Linux or *BSD distros.
Royce at 03:58, 9 October 2015:
I expect performance will drop significantly when the index no longer fits in memory though, seeing as v4 uuids are random
douglarek at 22:35, 9 October 2015:
Thanks, it is a good alternative.
David Kitchen at 09:20, 14 October 2015:
I haven’t checked, but does PostgreSQL allow for composite primary keys for one-to-many join tables when the primary keys are UUIDs?
Do you have examples of this working?
Xof at 09:49, 14 October 2015:
UUIDs are normal types; they can be used as keys anywhere. I use them all the time in many-to-many tables, for example.
Sameer Kumar at 11:50, 14 October 2015:
Interesting! Unless I am missing something, these tests were with a single connection; how about testing with 10 parallel sessions, each trying to insert 1000 rows?