The Build

8 October 2015


UUID vs BIGSERIAL for Primary Keys

SERIAL (32 bit integer) or BIGSERIAL (64 bit integer) are the first choice for most people for a synthetic primary key. They’re easy, they’re comprehensible, and they’re transaction-safe. The values that come out of them are, at least to start, manageable and human-scale. They can also provide an easy sortation on creation order.

They’re not perfect, though: If you have to merge together two tables that were generated using SERIALs, you have a massive key update ahead of you to avoid conflicts. If you use SERIAL, exhausting the range is a possibility. If you have a sharded database, you need some way of keeping the sequences independent, such as different starting offsets (but what if you get the offset wrong?) or creating them using different increments (but what if you add another server)

A good alternative is using UUIDs, generated using the uuid_generate_v4() function in PostgreSQL’s uuid-ossp contrib module. This makes mergers much easier, and guarantees independence across multiple tables.

But UUIDs are 128 bits, not 64, and require a function call to generate. How much of a problem is that, really?

As a test, I created a table with a single primary key column and a single float field:

   id <type>,
   f float,

<type> could be one of three possibilities:

The test inserted 10,000,000 rows into the table. In one run, it did a COMMIT after each INSERT; in the other, a single COMMIT after all INSERTs. This was on PostgreSQL 9.4.4 on an AWS i2.2xlarge instance, with the two SSDs in a RAID-0 as the database volume.

The results were:

COMMITing after each INSERT:

column typetime (s)size (MB)


column typetime (s)size (MB)

Overall, the INSERT time for UUIDs was slightly longer than that for BIGSERIAL, but not appreciably. The BIGINT column was notably slower, due to the PL/pgSQL function generating the new keys.

The UUID tables were bigger, of course, although this is an extremely example in that the primary key was only one of two fields in the table; more realistic tables with more columns would not show the same percentage increase.

The conclusion I draw is that it is fine to use UUIDs unless you are faced with a very tight INSERT performance requirement; they are surprisingly efficient compared to BIGSERIAL. My supposition is that the increased computation for the UUID is balanced agains the I/O to maintain the SERIAL.

10 September 2015


Beyond the B-Tree: GIST and GIN Indexes

I was honored to be invited to give a presentation at the Austin PostgreSQL Users’ Group Meetup, and the slides for my presentation Beyond the B-Tree are now available.


Django 1.8 and PostgreSQL at Djangocon 2015

My slides for Django 1.8 and PostgreSQL are available.

28 March 2015


PostgreSQL and JSON: 2015

The slides from my talk at PGConf US 2015 are now available.

3 February 2015


Logical Decoding and JSON Talks at FOSDEM

The slides for my talks on logical decoding and the state of the art in JSON are now available on-line.

18 November 2014


When LIMIT attacks

One common source of query problems in PostgreSQL results an unexpectedly-bad query plan when a LIMIT clause is included in a query. The typical symptom is that PostgreSQL picks an index-based plan that actually takes much, much longer than if a different index, or no index at all, had been used.

Here’s an example. First, we create a simple table and an index on it:

xof=# CREATE TABLE sample (
xof(#   i INTEGER,
xof(#   f FLOAT
xof(# );
xof=# CREATE INDEX ON sample(f);

And fill it with some data:

xof=# INSERT INTO sample SELECT 0, random() FROM generate_series(1, 10000000);
INSERT 0 10000000
xof=# ANALYZE;

Then, for about 5% of the table, we set i to 1:

xof=# UPDATE sample SET i=1 WHERE f<0.05;
UPDATE 499607
xof=# ANALYZE;

Now, let’s find all of the entires where i is 1, in descending order of f.

                                                         QUERY PLAN                                                         
 Sort  (cost=399309.76..401406.04 rows=838509 width=12) (actual time=1415.166..1511.202 rows=499607 loops=1)
   Sort Key: f
   Sort Method: quicksort  Memory: 35708kB
   ->  Seq Scan on sample  (cost=0.00..316811.10 rows=838509 width=12) (actual time=1101.836..1173.262 rows=499607 loops=1)
         Filter: (i = 1)
         Rows Removed by Filter: 9500393
 Total runtime: 1542.529 ms
(7 rows)

So, 1.5 seconds to do a sequential scan on the whole table. So, just getting the first 10 entries from that should be much faster, right?

                                                                        QUERY PLAN                                                                        
 Limit  (cost=0.43..277.33 rows=10 width=12) (actual time=12710.612..12710.685 rows=10 loops=1)
   ->  Index Scan Backward using sample_f_idx on sample  (cost=0.43..23218083.52 rows=838509 width=12) (actual time=12710.610..12710.682 rows=10 loops=1)
         Filter: (i = 1)
         Rows Removed by Filter: 9500393
 Total runtime: 12710.714 ms
(5 rows)

Oh. 12.7 seconds. What happened?

PostgreSQL doesn’t keep correlated statistics about columns; each column’s statistics are kept independently. Thus, PostgreSQL made an assumption about the distribution of values of i in the table: they were scattered more or less evenly throughout. Thus, walking the index backwards meant that, to get 10 “hits,” it would have to scan about 100 index entries… and the index scan would be a big win.

It was wrong, however, because all of the i=1 values were clustered right at the beginning. If we reverse the order of the scan, we can see that was a much more efficient plan:

                                                               QUERY PLAN                                                                
 Limit  (cost=0.43..277.33 rows=10 width=12) (actual time=0.029..0.046 rows=10 loops=1)
   ->  Index Scan using sample_f_idx on sample  (cost=0.43..23218083.52 rows=838509 width=12) (actual time=0.027..0.044 rows=10 loops=1)
         Filter: (i = 1)
 Total runtime: 0.071 ms
(4 rows)

So, what do we do? There’s no way of telling PostgreSQL directly to pick one plan over the other. We could just turn off index scans for the query, but that could well have bad side effects.

In this particular case, where a predicate (like the WHERE i=1) picks up a relatively small number of entries, we can use a Common Table Expression, or CTE. Here’s the example rewritten using a CTE:

xof-# WITH inner_query AS (
xof(#    SELECT * FROM sample WHERE i=1 
xof(# )
xof-# SELECT * FROM inner_query ORDER BY f  LIMIT 10;
                                                              QUERY PLAN                                                              
 Limit  (cost=351701.16..351701.18 rows=10 width=12) (actual time=1371.946..1371.949 rows=10 loops=1)
   CTE inner_query
     ->  Seq Scan on sample  (cost=0.00..316811.10 rows=838509 width=12) (actual time=1168.034..1244.785 rows=499607 loops=1)
           Filter: (i = 1)
           Rows Removed by Filter: 9500393
   ->  Sort  (cost=34890.06..36986.33 rows=838509 width=12) (actual time=1371.944..1371.944 rows=10 loops=1)
         Sort Key: inner_query.f
         Sort Method: top-N heapsort  Memory: 25kB
         ->  CTE Scan on inner_query  (cost=0.00..16770.18 rows=838509 width=12) (actual time=1168.040..1325.496 rows=499607 loops=1)
 Total runtime: 1381.472 ms
(10 rows)

A CTE is an “optimization fence”: The planner is prohibited from pushing the ORDER BY or LIMIT down into the CTE. In this case, that means that it is also prohibited from picking the index scan, and we’re back to the sequential scan.

So, when you see a query come completely apart, and it has a LIMIT clause, check to see if PostgreSQL is guessing wrong about the distribution of data. If the total number of hits before the LIMIT are relatively small, you can often use a CTE to isolate that part, and only apply the LIMIT thereafter. (Of course, you might be better off just doing the LIMIT operation in your application!)

24 October 2014


“Be Very Afraid: Backup and Disaster Planning” at PGConf.EU

Slides from my talk, Be Very Afraid: Backup and Disaster Planning, are now available.

23 October 2014


“Finding and Repairing Database Corruption” at PGConf.EU

Slides from my talk, Finding and Repairing Database Corruption, are now available.

22 October 2014


Why I prefer Python to Java, in two code samples.

Assignment: Load a remote URL via https, verifying the site certificate against a local root CA certificate.

Java, Android SDK:

// Load CAs from an InputStream
// (could be from a resource or ByteArrayInputStream or ...)
CertificateFactory cf = CertificateFactory.getInstance("X.509");
// From
InputStream caInput = new BufferedInputStream(new FileInputStream("load-der.crt"));
Certificate ca;
try {
    ca = cf.generateCertificate(caInput);
    System.out.println("ca=" + ((X509Certificate) ca).getSubjectDN());
} finally {

// Create a KeyStore containing our trusted CAs
String keyStoreType = KeyStore.getDefaultType();
KeyStore keyStore = KeyStore.getInstance(keyStoreType);
keyStore.load(null, null);
keyStore.setCertificateEntry("ca", ca);

// Create a TrustManager that trusts the CAs in our KeyStore
String tmfAlgorithm = TrustManagerFactory.getDefaultAlgorithm();
TrustManagerFactory tmf = TrustManagerFactory.getInstance(tmfAlgorithm);

// Create an SSLContext that uses our TrustManager
SSLContext context = SSLContext.getInstance("TLS");
context.init(null, tmf.getTrustManagers(), null);

// Tell the URLConnection to use a SocketFactory from our SSLContext
URL url = new URL("");
HttpsURLConnection urlConnection =
InputStream in = urlConnection.getInputStream();
copyInputStreamToOutputStream(in, System.out);


import requests
x = requests.get('', verify='load-der.crt')

27 August 2014


Why I Resigned as Djangocon US 2014 Program Chair

This is long, and involves drama. If you are not interested in the drama, I encourage you to skip it, and have a wonderful time at Djangocon US 2014.

I recently resigned as the program chair for Djangocon US 2014.

Up until now, I had been intending to simply wish Djangocon the best of success, thank Jeff Triplett for standing up to take over, and leave the matter at that.

Now, Steve Holden (who runs Open Bastion, the company which puts on Djangocon US, but not DJangocon EU or other Django-related events) has decided to, in private email to third parties, cast aspersions on my behavior and probity, so I feel it is important to set the record straight.

First, let me be completely blunt: In my opinion, which I feel the facts support, the reason this year’s Djangocon has become a clusterfuck is because, for reasons I do not understand, Steve Holden was unable to do the things a conference organizer is expected to do in a timely fashion.

Steve and I had a telephone conversation on March 13 of this year, in which he asked how long it would take me to finish the CFP. I told him that actually drafting a CFP would take minutes (I was driving down I-5 at the time, and I could have pulled over at a Starbucks and prepared it), but I was unwilling to put it up until there was a website ready to take submissions. This was based on last year’s problems (in which the web site went up late), and simple human nature: Announcing “CFP coming soon” over and over again simply makes everyone forget that it is even happening.

On that conversation, we agreed the site would go up promptly.

It did not launch until late May (or was it early June? I was not actually notified of the launch). Until then, the site still showed the 2013 information. The same was true of the Twitter account.

My company sponsored the speaker’s dinner last year. We did not receive sponsorship information for this year until June 4th.

Steve is fond of comparing Djangocon to PyCon US. I’ll note that PyCon US is in early April, and the call for papers closes in September, almost seven months before the conference begins. We were solicited for launch sponsorship for PyCon US 2014 almost before the boxes were packed for PyCon US 2013.

I was not happy about this, but I had a commitment to the community to produce a schedule, so I followed through on that commitment. I did so even as the situation became more and more tense, and as I started receiving emails from speakers (who did not have inside knowledge) ranging from surprise to accusing me of being the source of the issue. Because I didn’t want to raise a political stink, I deflected those as best I could, just apologizing and moving on.

Everyone knows the results: There wasn’t enough time to for people to do a top-flight job submitting papers, they couldn’t get their travel together, the review time was compressed. We had to reopen the CFP to get more responses (as we had to last year, when the website was also delayed). The delay between the CFP close and the speaker notification was an unwise attempt on my part to get community participation in the review process; honestly, I should have just put on my dictator hat and picked papers in a day or two, but I was already receiving complaints that the process wasn’t transparent enough.

At the time, I was willing to write it off as one of those things that happens, and I tolerated Steve sniping at me (including one instance where he flat-out accused me of not having done something that I had sent him a week before… in an email to which he had responded. He apologized after I demanded one).

Finally, we had a schedule that I still feel is a good one, and I thought the matter was closed.

Then, one speaker (who I will not name) took exception to having pay to get in. Now, I accept fully that I should have mentioned the fact that speakers have to pay in the acceptance email; that was my oversight, and I apologize for that. However, for this speaker, that was just the last straw; they had seen the problems above, and had decided that Djangocon wasn’t something they could participate in. By itself, this was unfortunate, but not a disaster. I simply thanked the speaker for letting us know, and would have moved on.

Steve, without consulting me, sent his own separate response to the speaker. His response to the speaker was rude, condescending, and supercilious. (He did offer the speaker a comp membership, but that was overshadowed by the tone of the reply.) I was shocked by it. At that point, I had had enough; that speaker had been recruited through my social network, and to have them treated in such a fashion was unbearable. Steve had form in this regard, as noted above.

So, I decided I could no longer support Steve as one would expect a program chair to do, and resigned. I would not have done so had the program not been delivered, but at that point, I felt that I had fulfilled my obligations to the conference. Steve apparently feels that I waited to resign in order to do maximum damage; quite the opposite, maximum damage would have been August 1st, before the program was set. I certainly had reason to do so then.

I have always disagreed with the decision to have speakers pay for membership in Djangocon, as it is currently structured. It is frequently noted that this is the tradition for PyCon US and EU, and Djangocon EU; I’ve spoken at all of those, and been happy to pay for membership. However, Djangocon US, as currently constituted, is a commercial convention put on by Open Bastion (that is to say, Steve Holden) using trademarks licensed by the DSF; it’s not a community conference the way Djangocon EU or PyCon US/EU are. Thus, I don’t think the PyCon-Djangocon analogy is apt. Steve protests that it is intended to be a community conference operated by the community, but protestations are not contracts.

I’ll mention that PGCon, which is about the same size as Djangocon US, provides free memberships for speakers and significantly lower membership rates, so there is an existence proof that such a conference is possible.

I do not encourage or discourage anyone from attending Djangocon US 2014. If you do attend, please enjoy it! I want to again thank Jeff Triplett for stepping up in this difficult situation.

I cannot, however, honestly support any further convention that Steve Holden is involved with. The wounds that Djangocon US 2013 and 2014 suffered are entirely self-inflicted by him, and I would encourage the DSF to find an alternative for 2015.

« Older Entries