postgresql when it's not your job

7 January 2014

14:51

Speaking at FOSDEM 2014

I’ll be speaking at FOSDEM 2014 in Brussels! I’ll be giving two talks:

“The Worst Day of Your Life” (on finding and recovering from PostgreSQL data corruption and bugs) on January 31st at the Raddison Blu hotel.
“Real-Life PostgreSQL JSON” on February 1 in the PostgreSQL devroom at the main FOSDEM conference.

1 December 2013

01:22

There’s a very unpleasant replication issue in version 9.0.14, 9.1.10, 9.2.5, 9.3.0 and 9.3.1 of PostgreSQL. Be sure to read the linked wiki entry carefully; it can result in silent data corruption on secondary servers in replication sets.

no comments

3 July 2013

08:18

JSON vs hstore: Which will get you into a cool bar in the Mission?

After all, isn’t that the most important issue?

Craig Kerstiens has a good overview of what the two types are. I wanted to give a quick and dirty flowchart as to which one I would use in a green-field development situation.

First, hstore performs better than JSON in nearly every situation, but performance is (as I wrote on Twitter) just one color in a rainbow of annoying decision points you have to make when building a system.

So, what would I do?

If I am accepting outside data as JSON and want to store it in its hierarchic form, it would require a very clear performance case for me to want to take the time to convert it to hstore.
If I am generating the data internally (rather than accepting it in its near-complete form), and the hierarchy is not deep, I’d use hstore for its faster performance.
If retaining the types of the values is essential, JSON is the choice, since hstore stores everything as a string.
If fully-arbitrary querying on key/value pairs is essential, hstore’s current indexing situation is far superior to JSON’s.
If you only query on a very small number (1-3) of keys at the top level, it’s probably better to build b-tree indexes over that key’s value than use GiST or GIN indexes; hstore does not have a significant advantage over JSON then.

As JSON support in PostgreSQL matures, this calculus will change, perhaps significantly. But that’s the 9.2 situation, in my view.

8 comments

2 July 2013

06:27

PostgreSQL Advisory Locks in Django Made Easy

Advisory locks are a very useful feature in PostgreSQL, and they just aren’t used enough.

Here’s a scenario: You have a bulk import job. While that job is running, there’s an analysis job you want to prevent from starting, and you don’t want a bulk import to start while the analysis job is running. But any number of bulk importers can run at the same time. How do you communicate this?

With an advisory lock!

Each of the bulk importers can take a shared advisory lock. Those locks don’t block each other, so they can run freely. But the analysis job takes an exclusive advisory lock of the same type. It will wait until all the importers are done, and the importers will not be issued their shared locks until it completes.

Even better, PostgreSQL cleans them up for you when a session terminates; you don’t have to worry about a lock lingering when you didn’t mean it to.

To make using advisory locks easier in Django, I have a small context manager than can be used to wrap code that should run with an advisory lock held; it makes it much easier to use them in your application. You can find it on GitHub, and it installs using pip.

4 comments

04:51

PostgreSQL as a Schemaless Database

My presentation from FOSDEM 2013, PostgreSQL as a Schemaless Database, is now posted (sorry for the delay!).

no comments

23 June 2013

11:32

‘Rollback’ exception added to Xact

I often find that I’m in the middle of a loop or something and discover an error. I want to exit the loop in a way that causes the database work I’ve done within it to be rolled back, but I don’t want that exception to propagate further.

This usually looks like:

try:
   with xact():
      for thing in things:
         (etc. etc.)
 except Rollback:
     pass

Having noticed this pattern a lot, I’ve added it as a feature to Xact. Xact defines a Rollback exception. It processes it like any other exception, but then swallows it and normally exits the function or with clause. If you feel motivated, you can subclass Rollback, although the utility of that escapes me at the moment.

When Django 1.6 is released, Xact will be deprecated in favor of new functionality there… but for now, have fun with it!

1 comment

15 May 2013

23:59

DjangoCon US 2013 CFP is open

The Call for Papers for DjangoCon US 2013 is now open.

no comments

7 April 2013

13:17

psycopg2 2.5 released

psycopg2, the Python PostgreSQL interface library, is now up to version 2.5. This includes built-in support for the JSON and range types… yay!

no comments

13 March 2013

21:14

The Text Type… If That’s Your Real Name.

tl;dr: Don’t give tables the same name as base PostgreSQL types, even though it will let you.

It’s interesting how synchronicity can occur. In my talk about custom PostgreSQL types in Python, I mentioned that any time you create a table in PostgreSQL, you’re also creating a type: the row type of the table.

While I was presenting the talk, a client sent me email wondering why a pg_restore of an expression index was failing, because the return type text was not the same as pg_catalog.text. OK, that’s strange!

What had happened is that the database has a table with the name text, which PostgreSQL will happily let you do:

postgres=# CREATE TABLE text (
postgres(#    i INTEGER
postgres(# );
CREATE TABLE

And both types appear in pg_type:

postgres=# SELECT typname, typnamespace FROM pg_type WHERE typname='text';
 typname | typnamespace 
---------+--------------
 text    |           11
 text    |         2200
(2 rows)

Needless to say, this isn’t a great idea, because although PostgreSQL seems to keep them straight most of the time, there are times (like a pg_restore processing an expression index) that it can get confused.

So, don’t do this.

5 comments

16:57

PostgreSQL, Python and Squid at PyPgDay 2013

The slides for my presentation, “PostgreSQL, Python and Squid” (otherwise known as, “using Python in PostgreSQL and PostgreSQL from Python”) presented at PyPgDay 2013 at PyCon 2013, are available for download.

no comments