<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Build &#187; PostgreSQL</title>
	<atom:link href="http://thebuild.com/blog/category/postgresql/feed/" rel="self" type="application/rss+xml" />
	<link>http://thebuild.com/blog</link>
	<description>programming, etc.</description>
	<lastBuildDate>Fri, 18 May 2012 14:08:49 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Running PostgreSQL on AWS</title>
		<link>http://thebuild.com/blog/2012/05/18/running-postgresql-on-aws/</link>
		<comments>http://thebuild.com/blog/2012/05/18/running-postgresql-on-aws/#comments</comments>
		<pubDate>Fri, 18 May 2012 14:08:49 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=405</guid>
		<description><![CDATA[My presentation from PGCon 2012, PostgreSQL on AWS with Reduced Tears, is now up.
]]></description>
			<content:encoded><![CDATA[<p>My presentation from PGCon 2012, <a href="http://thebuild.com/presentations/pg-aws.pdf">PostgreSQL on AWS with Reduced Tears</a>, is now up.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2012/05/18/running-postgresql-on-aws/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Of Pickups and Tractor-Trailers</title>
		<link>http://thebuild.com/blog/2012/04/25/of-pickups-and-tractor-trailers/</link>
		<comments>http://thebuild.com/blog/2012/04/25/of-pickups-and-tractor-trailers/#comments</comments>
		<pubDate>Thu, 26 Apr 2012 02:30:59 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=400</guid>
		<description><![CDATA[Pickup trucks are great.

No, really. They are great vehicles. You can use them for all sorts of really useful things: Bringing your tools out to a construction gig. Delivering refrigerators. Helping your friend move a sofa. Carting away a reasonable amount of construction debris.

But if you need to deliver 75,000 pounds of steel beams to [...]]]></description>
			<content:encoded><![CDATA[<p>Pickup trucks are great.</p>

<p>No, really. They are great vehicles. You can use them for all sorts of really useful things: Bringing your tools out to a construction gig. Delivering refrigerators. Helping your friend move a sofa. Carting away a reasonable amount of construction debris.</p>

<p>But if you need to deliver 75,000 pounds of steel beams to a construction site, in a single run? A pickup truck will not do it. Not even a big pickup. Not even if you add a new engine. Not even if you are willing to get three pickups. You need equipment designed for that. (And, as a note, the equipment that could handle delivering the steel beams would be a terrible choice for helping a friend move their sofa.)</p>

<p>&#8220;But,&#8221; I hear you say, &#8220;I already know how to drive a pickup! And we have a parking space for it. Can&#8217;t we just use the pickup? You&#8217;re a truck expert; tell us how to get our pickup to pull that load!&#8221;</p>

<p>And I say, &#8220;Being a truck expert, I will tell you again, a pickup is the wrong kind of truck. There are other trucks that will handle that load with no trouble, but a pickup isn&#8217;t one of them. The fact that you have a pickup doesn&#8217;t make it the right truck.&#8221;</p>

<p>We have many clients that run PostgreSQL, happily, on Amazon Web Services. </p>

<p>Some clients, however, are not happy. They are attempting to haul tractor-trailer loads (such as high volume data warehouses) using pickup trucks (Amazon EC2 instances). They wish us to fix their problem, but are not willing to move off of Amazon in order to get the problem fixed.</p>

<p>I like AWS for a lot of things; it has many virtues, which <a href="http://www.pgcon.org/2012/schedule/events/419.en.html">I will discuss in detail soon</a>. However, AWS is <em>not</em> the right solution for every problem. In particular, if you require a high read or write data rate in order to get the performance you need from your database, you will ultimately not be happy on AWS. AWS has a single block-device storage mechanism, Elastic Block Store (EBS), which simply does not scale up to very high data rates.</p>

<p>That doesn&#8217;t mean that AWS is useless, it just means it isn&#8217;t the right tool for every job. The problem arises when AWS is considered the fixed point, like the pickup was the fixed point above. At some point, you have to decide:</p>

<ol>
<li>That being on AWS is so important (for whatever reason) that you are willing to sacrifice the performance you want; or,</li>
<li>The performance you want is so important that you will need to move off of AWS.</li>
</ol>

<p>Sadly, even the best of consultants do not have the magic engine in our back room that will cause EBS to perform as well as high-speed direct attached storage.</p>

<p>More soon.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2012/04/25/of-pickups-and-tractor-trailers/feed/</wfw:commentRss>
		<slash:comments>12</slash:comments>
		</item>
		<item>
		<title>Blah, Blah, Blah, First Half of 2012 Edition</title>
		<link>http://thebuild.com/blog/2012/04/18/blah-blah-blah-first-half-of-2012-edition/</link>
		<comments>http://thebuild.com/blog/2012/04/18/blah-blah-blah-first-half-of-2012-edition/#comments</comments>
		<pubDate>Thu, 19 Apr 2012 06:55:57 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=392</guid>
		<description><![CDATA[My speaking schedule, if for some unaccountable reason you want to hear me talk.]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be speaking at the following conferences through July:</p>

<ul>
<li><a href="http://www.pgcon.org/2012/">PGCon</a>, Ottawa, Ontario, Canada, May 17-18.</li>
<li><a href="http://2012.djangocon.eu/">DjangoCon Europe</a>, Zurich, Switzerland, June 4-6.</li>
<li><a href="http://www.southeastlinuxfest.org/">Southwest LinuxFest</a>, Charlotte, North Carolina, June 8-10.</li>
<li><a href="https://ep2012.europython.eu/">EuroPython</a>, Florence, Italy, July 2-8.</li>
<li><a href="http://www.oscon.com/oscon2012">OSCON</a>, Portland, Oregon, USA, July 16-20.</li>
</ul>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2012/04/18/blah-blah-blah-first-half-of-2012-edition/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>The Elements of postgresql.conf Style</title>
		<link>http://thebuild.com/blog/2012/04/13/the-elements-of-postgresql-conf-style/</link>
		<comments>http://thebuild.com/blog/2012/04/13/the-elements-of-postgresql-conf-style/#comments</comments>
		<pubDate>Fri, 13 Apr 2012 15:00:47 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=385</guid>
		<description><![CDATA[&#8230; or, inexcusable things I am tired of seeing in postgresql.conf files.

Do not mix &#8216;n&#8217; match override styles.

There are two valid styles for overriding the default values in postgresql.conf: Putting your changes as a cluster at the end, or uncommenting the defaults and overriding in place. Both have advantages and disadvantages. Having some settings one [...]]]></description>
			<content:encoded><![CDATA[<p>&#8230; or, inexcusable things I am tired of seeing in postgresql.conf files.</p>

<h3>Do not mix &#8216;n&#8217; match override styles.</h3>

<p>There are two valid styles for overriding the default values in postgresql.conf: Putting your changes as a cluster at the end, or uncommenting the defaults and overriding in place. Both have advantages and disadvantages. Having some settings one way and some another is pure disadvantage. Do not do this.</p>

<h3>Use units.</h3>

<p>Quick, what is <code>log_min_duration_statement</code> set to here?</p>

<pre><code>log_min_duration_statement = 2000
</code></pre>

<p>Now, what is it set to here?</p>

<pre><code>log_min_duration_statement = 2s
</code></pre>

<p>Always use units with numeric values if a unit is available.</p>
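
<p>To see why the bare number is ambiguous, here is a small Python sketch of how a time-valued setting gets interpreted. It is not part of PostgreSQL; the function name and the unit table are made up for illustration. A value without a unit falls back to the parameter&#8217;s own default unit, which is milliseconds for <code>log_min_duration_statement</code> but seconds for some other settings, such as <code>checkpoint_timeout</code>.</p>

<pre><code>import re

# Milliseconds per unit, for the time units postgresql.conf accepts.
UNIT_TO_MS = {
    "ms": 1,
    "s": 1000,
    "min": 60 * 1000,
    "h": 60 * 60 * 1000,
    "d": 24 * 60 * 60 * 1000,
}


def duration_to_ms(value, default_unit="ms"):
    """Interpret a time setting; a bare number uses default_unit."""
    match = re.fullmatch(r"\s*(-?\d+)\s*([a-z]*)\s*", value)
    if match is None:
        raise ValueError("not a time setting: %r" % value)
    number, unit = match.groups()
    # An unknown unit raises KeyError, which is good enough for a sketch.
    return int(number) * UNIT_TO_MS[unit or default_unit]
</code></pre>

<p>With these definitions, <code>duration_to_ms("2000")</code> and <code>duration_to_ms("2s")</code> both come out as 2000 milliseconds, but <code>duration_to_ms("2000", default_unit="s")</code> is 2,000,000. The reader of the file should not have to remember which default applies.</p>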

<h3>Do not remove the default settings.</h3>

<p>If you strip out all of the defaults, it becomes impossible to tell what a particular value is set to. Leave the defaults in place, and if you comment out a setting, reset the value to the default (or at least include comments that make it clear what is going on).</p>

<h3>Do not leave junk postgresql.conf files scattered around.</h3>

<p>If you need to move postgresql.conf (and the other configuration files) to a different location from where the package for your system puts it, don&#8217;t leave the old, dead postgresql.conf lying around. Delete any trace of the old installation hierarchy.</p>

<hr />

<p>Thank you.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2012/04/13/the-elements-of-postgresql-conf-style/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
		<item>
		<title>Instagram&#8217;s Technology Stack</title>
		<link>http://thebuild.com/blog/2012/04/12/instagrams-technology-stack/</link>
		<comments>http://thebuild.com/blog/2012/04/12/instagrams-technology-stack/#comments</comments>
		<pubDate>Thu, 12 Apr 2012 19:09:40 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=381</guid>
		<description><![CDATA[Instagram has been in the news lately. In this really great post on Tumblr, Instagram talks about its technology stack.

I have some acquaintance with the Instagram people, and they are among the smartest technologists I&#8217;ve met. Really nice, too. (Of course, they mention this blog in the post, so I&#8217;m biased.)
]]></description>
			<content:encoded><![CDATA[<p><a href="http://instagram.com/">Instagram</a> has been <a href="http://dealbook.nytimes.com/2012/04/12/the-instragram-deal-a-mark-zuckerberg-production/">in the news lately</a>. In this <a href="http://instagram-engineering.tumblr.com/post/13649370142/what-powers-instagram-hundreds-of-instances-dozens-of">really great post on Tumblr</a>, Instagram talks about its technology stack.</p>

<p>I have some acquaintance with the Instagram people, and they are among the smartest technologists I&#8217;ve met. Really nice, too. (Of course, they mention this blog in the post, so I&#8217;m biased.)</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2012/04/12/instagrams-technology-stack/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>A Recipe for Django Transactions on PostgreSQL</title>
		<link>http://thebuild.com/blog/2012/03/19/a-recipe-for-django-transactions-on-postgresql/</link>
		<comments>http://thebuild.com/blog/2012/03/19/a-recipe-for-django-transactions-on-postgresql/#comments</comments>
		<pubDate>Mon, 19 Mar 2012 19:33:14 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=328</guid>
		<description><![CDATA[Here's a recipe for handling transactions sensibly in Django on PostgreSQL.]]></description>
			<content:encoded><![CDATA[<p>As <a href="http://thebuild.com/blog/2009/11/07/django-postgresql-and-transaction-management/">noted before</a>, Django has a lot of facilities for handling transactions, and it&#8217;s not at all clear how to use them.  In an attempt to cut through the confusion, here&#8217;s a recipe for handling transactions sensibly in Django applications on PostgreSQL.</p>

<p>The goals are:</p>

<ul>
<li>Database operations that do not modify the database aren&#8217;t wrapped in a transaction at all.</li>
<li>Database operations that modify the database are wrapped in a transaction.</li>
<li>We have a lot of fine-grained control over sections that modify the database vs those that don&#8217;t.</li>
</ul>

<p>The bits of the recipe are:</p>

<ul>
<li>Use the <a href="https://docs.djangoproject.com/en/dev/ref/databases/#autocommit-mode">autocommit option</a> in your database configuration.</li>
<li><em>Do not</em> use the <a href="https://docs.djangoproject.com/en/dev/topics/db/transactions/#tying-transactions-to-http-requests">transaction middleware</a>.</li>
<li>Wrap the sections of code which modify the database in the <code>xact()</code> decorator / context manager below, using it like you would the <a href="https://docs.djangoproject.com/en/dev/topics/db/transactions/#controlling-transaction-management-in-views"><code>commit_on_success()</code></a> decorator.</li>
<li>Profit!</li>
</ul>
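
<p>As a rough sketch of the first two bullets, a settings fragment for the Django versions current at the time of writing might look like the following. The <code>autocommit</code> key under <code>OPTIONS</code> is the one described in the linked documentation; the database name and the single middleware entry are placeholders. Later Django releases reworked transaction handling, so check the documentation for your version.</p>

<pre><code># settings.py (fragment): a sketch, not a complete configuration.
DATABASES = {
    "default": {
        "ENGINE": "django.db.backends.postgresql_psycopg2",
        "NAME": "myproject",  # placeholder database name
        "OPTIONS": {
            # Keep psycopg2 from opening a transaction on the first query.
            "autocommit": True,
        },
    },
}

# Note what is absent: django.middleware.transaction.TransactionMiddleware.
MIDDLEWARE_CLASSES = (
    "django.middleware.common.CommonMiddleware",
)
</code></pre>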

<p>The quick reasons behind each step:</p>

<ul>
<li>Turning on autocommit prevents <a href="http://initd.org/psycopg/">psycopg2</a> from automatically starting a new transaction on the first database operation on each connection; this means that the transaction only starts when we want it to.</li>
<li>Similarly, the transaction middleware will set the connection state to &#8220;managed,&#8221; which will defeat the autocommit option above, so we leave it out.</li>
<li>The <code>xact()</code> decorator will set up the connection so that a transaction <em>is</em> started in the relevant block, which is what we want for database-modifying operations.</li>
</ul>

<p>This recipe has a few other nice features:</p>

<ul>
<li><code>xact()</code> operates like <code>commit_on_success()</code>, in that it will issue a rollback if an exception escapes from the block or function it is wrapping.</li>
<li><code>xact()</code> ignores the dirty flag on the Django connection. Since we&#8217;re deliberately wrapping stuff that modifies the database with it, the chance of it being dirty is near 100%, and a commit on a transaction that did not modify the database is no more expensive in PostgreSQL than a rollback. It also means you can do <a href="https://docs.djangoproject.com/en/dev/topics/db/sql/">raw SQL</a> inside an <code>xact()</code> block without the <a href="http://archives.postgresql.org/pgsql-hackers/2008-06/msg01101.php">foot-gun</a> of forgetting to call <code>set_dirty</code>.</li>
<li>Like the built-in Django transaction decorators, it can be used either as a decorator or as a context manager with the <code>with</code> statement.</li>
<li><code>xact()</code> can be nested, giving us nested transactions! If it sees that there is already a transaction open when it starts a new block, it will use a <a href="http://www.postgresql.org/docs/9.1/static/sql-savepoint.html">savepoint</a> to set up a nested transaction block.  (PostgreSQL does not have nested transactions as such, but you can use savepoints to get 99.9% of the way there.)</li>
<li>By not wrapping operations that do not modify the database, we get better behavior when using <a href="http://www.pgpool.net/">pgPool II</a> (more on that in a future post).</li>
<li><code>xact()</code> works around an <a href="https://code.djangoproject.com/ticket/16047">outstanding bug</a> in Django&#8217;s transaction handling on psycopg2.</li>
</ul>
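
<p>The savepoint-based nesting can be sketched without a database at all. The following is <em>not</em> the code from the repository linked below; it is a minimal illustration in which a hypothetical <code>RecordingConnection</code> collects the SQL a real connection would run, and a hypothetical <code>XactSketch</code> class decides between <code>BEGIN</code> and <code>SAVEPOINT</code> based on whether a block is already open.</p>

<pre><code>import functools
import itertools


class RecordingConnection(object):
    """Stand-in for a database connection: records SQL instead of running it."""

    def __init__(self):
        self.statements = []
        self.depth = 0  # how many sketch blocks are currently open

    def execute(self, sql):
        self.statements.append(sql)


class XactSketch(object):
    """Outermost block uses BEGIN/COMMIT; nested blocks use a savepoint."""

    _ids = itertools.count(1)

    def __init__(self, conn):
        self.conn = conn
        self.savepoint = None

    def __enter__(self):
        if self.conn.depth == 0:
            self.conn.execute("BEGIN")
        else:
            self.savepoint = "sp_%d" % next(self._ids)
            self.conn.execute("SAVEPOINT " + self.savepoint)
        self.conn.depth += 1
        return self

    def __exit__(self, exc_type, exc_value, traceback):
        self.conn.depth -= 1
        if self.savepoint is None:
            self.conn.execute("COMMIT" if exc_type is None else "ROLLBACK")
        elif exc_type is None:
            self.conn.execute("RELEASE SAVEPOINT " + self.savepoint)
        else:
            self.conn.execute("ROLLBACK TO SAVEPOINT " + self.savepoint)
        return False  # never swallow the exception

    def __call__(self, func):
        # Decorator form: run each call inside a fresh block.
        @functools.wraps(func)
        def wrapper(*args, **kwargs):
            with XactSketch(self.conn):
                return func(*args, **kwargs)
        return wrapper


conn = RecordingConnection()
with XactSketch(conn):
    conn.execute("INSERT part 1")
    try:
        with XactSketch(conn):
            conn.execute("INSERT part 2")
            raise ValueError("part 2 failed")
    except ValueError:
        pass  # part 2 is undone; part 1 is still in the open transaction

print(conn.statements)
</code></pre>

<p>Run as written, the recorded statements are <code>BEGIN</code>, <code>INSERT part 1</code>, <code>SAVEPOINT sp_1</code>, <code>INSERT part 2</code>, <code>ROLLBACK TO SAVEPOINT sp_1</code>, <code>COMMIT</code>: the inner failure costs only the inner block.</p>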

<p><code>xact()</code> also supports the <code>using</code> parameter for <a href="https://docs.djangoproject.com/en/dev/topics/db/multi-db/">multiple databases</a>.</p>

<p>Of course, a few caveats:</p>

<ul>
<li><code>xact()</code> requires the <code>postgresql_psycopg2</code> backend, and PostgreSQL 8.2 or higher. It&#8217;s possible it can be hacked to work on other backends that support savepoints.</li>
<li><code>xact()</code> works just the way you want if it is nested <em>inside</em> a <code>commit_on_success()</code> block (it will properly create a savepoint instead of a new transaction). However, a <code>commit_on_success()</code> block nested inside of <code>xact()</code> will commit or roll back the entire transaction, somewhat defeating the outer <code>xact()</code>. To the extent possible, use only <code>xact()</code> in code you write.</li>
<li>Be sure you catch exceptions <em>outside of</em> the <code>xact()</code> block; otherwise, the automatic rollback will be defeated. Allow the exception to escape the <code>xact()</code> block, and then catch it. (Of course, if the intention is to always commit and to defeat the rollback, by all means catch the exception inside the block.)</li>
</ul>
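
<p>The last caveat is easy to get wrong, so here is a small sketch of the difference. <code>fake_xact()</code> is a hypothetical stand-in that only records whether a real block would have committed or rolled back; it is not the actual implementation.</p>

<pre><code>from contextlib import contextmanager

log = []


@contextmanager
def fake_xact():
    """Hypothetical stand-in for xact(): records what a real block would do."""
    try:
        yield
    except Exception:
        log.append("rollback")
        raise
    else:
        log.append("commit")


def save_caught_inside():
    with fake_xact():
        try:
            raise ValueError("write failed halfway")
        except ValueError:
            pass  # swallowed inside the block, so the block sees success


def save_caught_outside():
    try:
        with fake_xact():
            raise ValueError("write failed halfway")
    except ValueError:
        pass  # handled only after the block has rolled back


save_caught_inside()
save_caught_outside()
print(log)
</code></pre>

<p>The first function ends in a commit despite the failure, because the block never sees the exception; the second ends in a rollback.</p>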

<p>To use, just drop the source (one class definition, one function) into a file somewhere in your Django project (such as the omnipresent <code>utils</code> application every Django project seems to have), and import it.</p>

<p>Examples:</p>

<pre><code>from utils.transaction import xact

@xact
def my_view_function1(request):
   # Everything here will be in a transaction.
   # It'll roll back if an exception escapes, and commit otherwise.
   pass

def my_view_function2(request):
   # This stuff won't be in a transaction, so don't modify the database here.
   with xact():
      # This stuff will be, and will commit on normal completion or roll back on an exception.
      pass

def my_view_function3(request):
   with xact():
      # Modify the database here (let's call it "part 1").
      try:
         with xact():
            # Let's call this "part 2."
            # This stuff will be in its own savepoint, and can commit or
            # roll back without losing the whole transaction.
            pass
      except Exception:
         # Part 2 will be rolled back, but part 1 will still be available to
         # be committed or rolled back.  Of course, if an exception
         # inside the "part 2" block is not caught, both part 2 and
         # part 1 will be rolled back.
         pass
</code></pre>

<p>The source is <a href="https://github.com/Xof/xact/">available on GitHub</a>. It&#8217;s licensed under the <a href="http://www.postgresql.org/about/licence/">PostgreSQL License</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2012/03/19/a-recipe-for-django-transactions-on-postgresql/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>PostgreSQL Performance When It&#8217;s Not Your Job</title>
		<link>http://thebuild.com/blog/2012/01/24/postgresql-performance-when-its-not-your-job/</link>
		<comments>http://thebuild.com/blog/2012/01/24/postgresql-performance-when-its-not-your-job/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 06:03:09 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=326</guid>
		<description><![CDATA[My presentation from SCALE 10x, &#8220;PostgreSQL Performance When It&#8217;s Not Your Job&#8221; is now available for download.
]]></description>
			<content:encoded><![CDATA[<p>My presentation from SCALE 10x, <a href="http://thebuild.com/presentations/not-my-job.pdf">&#8220;PostgreSQL Performance When It&#8217;s Not Your Job&#8221;</a> is now available for download.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2012/01/24/postgresql-performance-when-its-not-your-job/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>&#8220;Sharding &amp; IDs at Instagram&#8221;</title>
		<link>http://thebuild.com/blog/2011/09/30/sharding-ids-at-instagram/</link>
		<comments>http://thebuild.com/blog/2011/09/30/sharding-ids-at-instagram/#comments</comments>
		<pubDate>Sat, 01 Oct 2011 05:39:21 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=324</guid>
		<description><![CDATA[I&#8217;d like to recommend an interesting post, &#8220;Sharding &#38; IDs at Instagram&#8221;, about sharding using Postgres.
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;d like to recommend an interesting post, <a href="http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram">&#8220;Sharding &amp; IDs at Instagram&#8221;</a>, about sharding using Postgres.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/09/30/sharding-ids-at-instagram/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cleaning up after your Bucardo goats</title>
		<link>http://thebuild.com/blog/2011/09/27/cleaning-up-after-your-bucardo-goats/</link>
		<comments>http://thebuild.com/blog/2011/09/27/cleaning-up-after-your-bucardo-goats/#comments</comments>
		<pubDate>Wed, 28 Sep 2011 05:25:20 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=313</guid>
		<description><![CDATA[A pure PL/pgSQL script for daily Bucardo maintenance.]]></description>
			<content:encoded><![CDATA[<p>If you are not familiar with it already, <a href="http://bucardo.org/wiki/Bucardo">Bucardo</a> is a nifty multi-master replication system for PostgreSQL, written by <a href="http://www.endpoint.com/team/greg_sabino_mullane">Greg Sabino Mullane</a>. Written in Perl, it is great if you need replication that doesn&#8217;t have the restrictions associated with PG 9&#8217;s <a href="http://www.postgresql.org/docs/9.1/static/warm-standby.html#STREAMING-REPLICATION">streaming replication</a>.</p>

<p>To keep your Bucardo installation clean and tidy, a few <a href="http://bucardo.org/wiki/Bucardo/Cron">regular cron jobs</a> are required. One of them cleans up the archived replicated data (stored in a separate database by Bucardo) once you know you are done with it. </p>

<p>The Bucardo page above has a recommended script using all sorts of <code>bash</code>ing, but I wanted something a bit more pure-PostgreSQL; the recommended script also doesn&#8217;t purge more than one old table at a time. So, I whipped up the following PL/pgSQL function.</p>

<p>(Note that this is for Bucardo 4.4. I haven&#8217;t played with the forthcoming Bucardo 5, so I&#8217;m not sure if this is still required.)</p>

<pre><code>CREATE OR REPLACE FUNCTION bucardo.purge_frozen_child_qs(far_back interval)
    RETURNS SETOF TEXT AS
$purge_frozen_child_qs$
DECLARE
    t TEXT;
    qt TEXT;
BEGIN

    IF far_back IS NULL THEN
        RAISE EXCEPTION 'Interval cannot be null.'
            USING HINT = 'So, do not do that.';
    END IF;

    IF (now() + far_back) &gt; now() THEN
        RAISE EXCEPTION 'Interval must be negative.'
            USING HINT = 'Consider using the "ago" form of intervals.';
    END IF;

    FOR t IN 
        SELECT tablename 
            FROM pg_tables
            WHERE schemaname='freezer' 
                  AND tablename like 'child_q_%' 
                  AND (replace(tablename, 'child_q_', '')::timestamp with time zone) &lt; now() + far_back::interval
            ORDER BY tablename
    LOOP
        qt := 'freezer.' || t;
        EXECUTE 'DROP TABLE ' || qt;
        RETURN NEXT qt;
    END LOOP;

    DELETE FROM bucardo.q 
        WHERE (started &lt; now() + far_back::interval 
                OR ended &lt; now() + far_back::interval 
                OR aborted &lt; now() + far_back::interval 
                OR cdate &lt; now() + far_back::interval) 
              AND (ended IS NULL OR aborted IS NULL);

    RETURN;

END
$purge_frozen_child_qs$
LANGUAGE plpgsql
    VOLATILE;
</code></pre>

<p>To use it, just call it repeatedly from a cron job with the appropriate argument, along the lines of:</p>

<pre><code>SELECT * FROM bucardo.purge_frozen_child_qs('7 days ago'::interval);
</code></pre>

<p>It returns the names of the tables it deleted.</p>
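
<p>For readers more comfortable in Python than PL/pgSQL, the table-selection part of the function can be mirrored as follows. This only illustrates the date filter: the helper name is made up, and the sketch assumes the suffix after <code>child_q_</code> is a date in <code>YYYYMMDD</code> form, which you should verify against your own <code>freezer</code> schema.</p>

<pre><code>import bisect
from datetime import datetime, timedelta

PREFIX = "child_q_"


def tables_to_purge(table_names, now, far_back):
    """Return frozen child_q tables dated before now + far_back, oldest first."""
    dated = sorted(
        (datetime.strptime(name[len(PREFIX):], "%Y%m%d"), name)
        for name in table_names
        if name.startswith(PREFIX)
    )
    cutoff = now + far_back  # far_back is negative, as in the SQL version
    # Everything left of the insertion point is strictly older than the cutoff.
    split = bisect.bisect_left(dated, (cutoff, ""))
    return ["freezer." + name for _, name in dated[:split]]


names = ["child_q_20110921", "child_q_20110901", "other_table", "child_q_20110920"]
purged = tables_to_purge(names, datetime(2011, 9, 27, 12, 0), timedelta(days=-7))
print(purged)
</code></pre>

<p>As in the SQL, the interval is added to the current time, so a negative interval moves the cutoff into the past.</p>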

<p>This particular function doesn&#8217;t need to be run more often than once a day. And it keeps your Bucardo goats nice and clean.</p>

<p>(A &#8220;bucardo&#8221; is a <a href="http://en.wikipedia.org/wiki/Pyrenean_Ibex">now-extinct species of goat</a>. For why Bucardo is goat-related, ask Greg.)</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/09/27/cleaning-up-after-your-bucardo-goats/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Unbreaking Your Django Application</title>
		<link>http://thebuild.com/blog/2011/07/26/unbreaking-your-django-application/</link>
		<comments>http://thebuild.com/blog/2011/07/26/unbreaking-your-django-application/#comments</comments>
		<pubDate>Tue, 26 Jul 2011 21:14:59 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=310</guid>
		<description><![CDATA[My tutorial at OSCON 2011, Unbreaking Your Django Application, is now available for download.
]]></description>
			<content:encoded><![CDATA[<p>My tutorial at OSCON 2011, <a href="http://thebuild.com/presentations/unbreaking-django.pdf">Unbreaking Your Django Application</a>, is now available for download.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/07/26/unbreaking-your-django-application/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Life with Object-Relational Mappers</title>
		<link>http://thebuild.com/blog/2011/05/18/life-with-object-relational-mappers/</link>
		<comments>http://thebuild.com/blog/2011/05/18/life-with-object-relational-mappers/#comments</comments>
		<pubDate>Wed, 18 May 2011 18:31:21 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=307</guid>
		<description><![CDATA[The slides from my talk at PGCon 2011 are now available.
]]></description>
			<content:encoded><![CDATA[<p>The slides from my talk at PGCon 2011 are <a href="http://blog.thebuild.com/presentations/drstrangedata.pdf">now available</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/05/18/life-with-object-relational-mappers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Transaction-Level Advisory Locks in PostgreSQL 9.1</title>
		<link>http://thebuild.com/blog/2011/03/13/transaction-level-advisory-locks-in-postgresql-9-1/</link>
		<comments>http://thebuild.com/blog/2011/03/13/transaction-level-advisory-locks-in-postgresql-9-1/#comments</comments>
		<pubDate>Sun, 13 Mar 2011 22:42:50 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=292</guid>
		<description><![CDATA[Advisory locks are one of the cool unsung features of PostgreSQL. In 9.1, they are getting even cooler with transaction level locks. Many details here.
]]></description>
			<content:encoded><![CDATA[<p>Advisory locks are one of the cool unsung features of PostgreSQL. In 9.1, they are getting even cooler with transaction level locks. <a href="http://www.depesz.com/index.php/2011/03/14/waiting-for-9-1-transaction-level-advisory-locks/">Many details here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/03/13/transaction-level-advisory-locks-in-postgresql-9-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Django and PostgreSQL at PostgreSQL Conference East</title>
		<link>http://thebuild.com/blog/2011/03/09/django-and-postgresql-at-postgresql-conference-east/</link>
		<comments>http://thebuild.com/blog/2011/03/09/django-and-postgresql-at-postgresql-conference-east/#comments</comments>
		<pubDate>Thu, 10 Mar 2011 06:03:51 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=287</guid>
		<description><![CDATA[I&#8217;ll be giving a full day tutorial about developing Django applications using PostgreSQL. If you are just getting started with Django, this is a great introduction; it is intended for developers who are just getting into serious Django/PG development.

It&#8217;ll cover general development in Django, with a lot of PostgreSQL-specific details.

And, of course, the whole conference [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be giving a <a href="https://www.postgresqlconference.org/files/east_2011_schedule.html">full day tutorial</a> about developing Django applications using PostgreSQL. If you are just getting started with Django, this is a great introduction; it is intended for developers who are just getting into serious Django/PG development.</p>

<p>It&#8217;ll cover general development in Django, with a lot of PostgreSQL-specific details.</p>

<p>And, of course, the whole conference will be a fount of great PostgreSQL geekery.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/03/09/django-and-postgresql-at-postgresql-conference-east/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;Anatomy of a Crushing&#8221;</title>
		<link>http://thebuild.com/blog/2011/03/08/anatomy-of-a-crushing/</link>
		<comments>http://thebuild.com/blog/2011/03/08/anatomy-of-a-crushing/#comments</comments>
		<pubDate>Wed, 09 Mar 2011 01:52:51 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=284</guid>
		<description><![CDATA[A fun and interesting article about a sudden burst in traffic at Pinboard when Yahoo! announced they were shutting down Delicious. Relevant to app and DB designers everywhere.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://pinboard.in/blog/173/">A fun and interesting</a> article about a sudden burst in traffic at Pinboard when Yahoo! announced they were shutting down Delicious. Relevant to app and DB designers everywhere.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/03/08/anatomy-of-a-crushing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;10 Ways to Kill Performance&#8221;</title>
		<link>http://thebuild.com/blog/2011/02/25/10-ways-to-kill-performanc/</link>
		<comments>http://thebuild.com/blog/2011/02/25/10-ways-to-kill-performanc/#comments</comments>
		<pubDate>Sat, 26 Feb 2011 00:17:55 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=282</guid>
		<description><![CDATA[The slides from my talk, &#8220;10 Easy Ways to Destroy Performance&#8221; from PgDay at SCALE 9X are available.
]]></description>
			<content:encoded><![CDATA[<p>The slides from my talk, <a href="http://thebuild.com/presentations/10-ways-to-kill-performance.pdf">&#8220;10 Easy Ways to Destroy Performance&#8221;</a> from PgDay at SCALE 9X are available.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/02/25/10-ways-to-kill-performanc/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>&#8220;10 Easy Ways to Destroy Performance&#8221; at pgDay at SCALE-9X</title>
		<link>http://thebuild.com/blog/2011/02/15/10-easy-ways-to-destroy-performance-scale-9x/</link>
		<comments>http://thebuild.com/blog/2011/02/15/10-easy-ways-to-destroy-performance-scale-9x/#comments</comments>
		<pubDate>Wed, 16 Feb 2011 05:22:19 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=276</guid>
		<description><![CDATA[I&#8217;ll be presenting a talk on &#8220;10 Easy Ways to Destroy Performance&#8221; at pgDay at SCALE-9X, on February 25th in Los Angeles.
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be presenting a talk on &#8220;10 Easy Ways to Destroy Performance&#8221; at <a href="https://sites.google.com/site/pgdayla/">pgDay at SCALE-9X</a>, on February 25th in Los Angeles.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/02/15/10-easy-ways-to-destroy-performance-scale-9x/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;Django Development with PostgreSQL&#8221; at PostgreSQL Conference East</title>
		<link>http://thebuild.com/blog/2011/02/15/django-postgresql-postgresql-conference-east/</link>
		<comments>http://thebuild.com/blog/2011/02/15/django-postgresql-postgresql-conference-east/#comments</comments>
		<pubDate>Wed, 16 Feb 2011 05:19:59 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=274</guid>
		<description><![CDATA[I&#8217;ll be presenting a full-day tutorial on Django Development with PostgreSQL at PostgreSQL Conference East, March 22-25 in New York!
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be presenting a full-day tutorial on <a href="https://www.postgresqlconference.org/content/django-development-postgresql">Django Development with PostgreSQL</a> at <a href="https://www.postgresqlconference.org/">PostgreSQL Conference East</a>, March 22-25 in New York!</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/02/15/django-postgresql-postgresql-conference-east/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL for Servoy Developers</title>
		<link>http://thebuild.com/blog/2011/02/04/postgresql-for-servoy-developers/</link>
		<comments>http://thebuild.com/blog/2011/02/04/postgresql-for-servoy-developers/#comments</comments>
		<pubDate>Fri, 04 Feb 2011 08:59:49 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=269</guid>
		<description><![CDATA[The slides from my presentation on PostgreSQL for Servoy Developers, presented at ServoyWorld 2011, are available here.
]]></description>
			<content:encoded><![CDATA[<p>The slides from my presentation on PostgreSQL for Servoy Developers, presented at <a href="http://servoy.com">ServoyWorld 2011</a>, are <a href="http://thebuild.com/presentations/pg-servoy.pdf">available here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/02/04/postgresql-for-servoy-developers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Extra columns when doing .distinct() in a Django QuerySet</title>
		<link>http://thebuild.com/blog/2010/12/22/extra-columns-when-doing-distinct-in-a-django-queryset/</link>
		<comments>http://thebuild.com/blog/2010/12/22/extra-columns-when-doing-distinct-in-a-django-queryset/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 22:54:05 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=239</guid>
		<description><![CDATA[If you are doing a <code>.distinct()</code> query and limiting the results using <code>.values()</code> or <code>.values_list()</code>, you may be in for a surprise if your model has a default ordering using the Meta value <code>ordering</code>. You probably want to clear the ordering using <code>.order_by()</code> with no parameters.]]></description>
			<content:encoded><![CDATA[<p><strong>tl;dr</strong>: If you are doing a <code>.distinct()</code> query and limiting the results using <code>.values()</code> or <code>.values_list()</code>, you may be in for a surprise if your model has a default ordering using the Meta value <code>ordering</code>. You probably want to clear the ordering using <code>.order_by()</code> with no parameters.</p>

<p><span id="more-239"></span></p>

<hr />

<p>If a model is ordered, either by <code>.order_by()</code> on the QuerySet or a Meta <code>ordering</code> value, Django always includes the ordering field(s) in the generated SQL query. This is true <em>even if the query uses <code>.distinct()</code></em>. To quote <a href="http://docs.djangoproject.com/en/1.2/ref/models/querysets/#distinct">the documentation</a>:</p>

<blockquote>
  <p>Any fields used in an <code>order_by()</code> call are included in the SQL SELECT columns. This can sometimes lead to unexpected results when used in conjunction with <code>distinct()</code>.</p>
</blockquote>

<p>(The documentation as written implies that this is only a problem with related models, but as we&#8217;ll see, it&#8217;s a problem in general. <a href="http://code.djangoproject.com/ticket/14942">A documentation patch</a> is probably in order here.)</p>

<p>By way of illustration, let&#8217;s assume you have the following models:</p>

<pre><code>from django.db import models

class Publisher(models.Model):
    name = models.TextField()

    class Meta:
        ordering = [ 'name', ]

class Book(models.Model):
    title = models.TextField()
    topic = models.TextField()
    publisher = models.ForeignKey(Publisher)

    class Meta:
        ordering = [ 'title', ]
</code></pre>

<p>And we create some rows:</p>

<pre><code>pub = Publisher(name="Strange But True Publications")
pub.save()
</code></pre>

<p>And some books:</p>

<pre><code>book1 = Book(title="New Topics in Industrial Meringue Production",
             topic="Cooking",
             publisher=pub)
book1.save()

book2 = Book(title="Your Chicken's First Song Book",
             topic="Animal Husbandry",
             publisher=pub)
book2.save()
</code></pre>

<p>Now, we want to get the list of IDs of the publishers, and we&#8217;re using the <a href="http://thebuild.com/blog/2010/12/22/getting-the-id-of-related-objects-in-django/">cool optimization that I described earlier</a>, with the optimization a commenter suggested (thanks!):</p>

<pre><code>&gt;&gt;&gt; q = Book.objects.values_list('publisher_id', flat=True).distinct()
&gt;&gt;&gt; print q
[1, 1]
</code></pre>

<p>Um, wait.  That&#8217;s not right.  Why would it return <code>1</code> twice when we said <code>.distinct()</code>?  Let&#8217;s look at the SQL (you <em>are</em> doing a <code>tail -f</code> on the PostgreSQL logs while you develop, right?):</p>

<pre><code>LOG:  statement: SELECT DISTINCT "x_book"."publisher_id", "x_book"."title" FROM "x_book" ORDER BY "x_book"."title" ASC LIMIT 21
</code></pre>

<p>And there we have it.  It includes the <code>title</code> field in the query, even though it doesn&#8217;t return it.  Since the <code>DISTINCT</code> thus applies to both, we have two distinct rows, rather than one.</p>
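
<p>The effect is easy to reproduce outside Django. The following is a minimal sketch that uses Python&#8217;s built-in <code>sqlite3</code> module as a stand-in for PostgreSQL (an assumption made only so that it runs without a server); the table and column names mirror the logged query above.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE x_book (id INTEGER PRIMARY KEY, title TEXT, publisher_id INTEGER)"
)
conn.executemany(
    "INSERT INTO x_book (title, publisher_id) VALUES (?, ?)",
    [
        ("New Topics in Industrial Meringue Production", 1),
        ("Your Chicken's First Song Book", 1),
    ],
)

# DISTINCT applies to the whole row, so the extra ordering column keeps both rows.
with_title = conn.execute(
    "SELECT DISTINCT publisher_id, title FROM x_book ORDER BY title"
).fetchall()

# With only publisher_id selected, the duplicates collapse into one row.
without_title = conn.execute(
    "SELECT DISTINCT publisher_id FROM x_book"
).fetchall()

print(len(with_title), len(without_title))
```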

<p>The fix, fortunately, is easy; just clear the ordering with a <code>.order_by()</code> without any parameters:</p>

<pre><code>&gt;&gt;&gt; q = Book.objects.values_list('publisher_id', flat=True).distinct().order_by()
&gt;&gt;&gt; print q
[1]
</code></pre>

<p>And the query:</p>

<pre><code>LOG:  statement: SELECT DISTINCT "x_book"."publisher_id" FROM "x_book" LIMIT 21
</code></pre>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/22/extra-columns-when-doing-distinct-in-a-django-queryset/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Getting the ID of Related Objects in Django</title>
		<link>http://thebuild.com/blog/2010/12/22/getting-the-id-of-related-objects-in-django/</link>
		<comments>http://thebuild.com/blog/2010/12/22/getting-the-id-of-related-objects-in-django/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 07:00:03 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=232</guid>
		<description><![CDATA[Don't retrieve a whole row just to get the primary key you had anyway. Don't iterate in the app; let the database server do the iteration for you. ]]></description>
			<content:encoded><![CDATA[<p><strong>tl;dr</strong>: Don&#8217;t retrieve a whole row just to get the primary key you had anyway. Don&#8217;t iterate in the app; let the database server do the iteration for you. </p>

<p><span id="more-232"></span></p>

<hr />

<p>There are a couple of bad habits I see a lot in Django code (including, sadly, my own), both of which abuse a ForeignKey field. Let&#8217;s take the classic example:</p>

<pre><code>from django.db.models import Model, TextField, ForeignKey

class Publisher(Model):
    # We accept the default 'id' column
    name = TextField()
    ...

class Book(Model):
    # Likewise
    title = TextField()
    topic = TextField()
    publisher = ForeignKey(Publisher)
        # Remember this creates a publisher_id column
</code></pre>

<p>Now, let&#8217;s say we have a book:</p>

<pre><code>b = Book.objects.get(title="Interior Landscapes")
</code></pre>

<p>And we want the ID of the publisher.</p>

<p><strong>Don&#8217;t do this:</strong></p>

<pre><code>pub_id = b.publisher.id
</code></pre>

<p>This works, but it&#8217;s absurd: It does a separate select to fetch the entire Publisher object, and then extracts the ID.  But, of course, <em>it already had the ID</em>, because that&#8217;s how it retrieved the publisher object.  Instead, just go straight to the created ID field:</p>

<pre><code>pub_id = b.publisher_id
</code></pre>

<p>Next, don&#8217;t use iteration to build lists if you can get the data directly out of the database. For example, suppose we want the list of publishers who publish books with topic &#8220;Surreal Architecture&#8221;. Far too often, I see this:</p>

<pre><code>surreal_books = Book.objects.filter(topic="Surreal Architecture")

surreal_publishers = set([book.publisher.id for book in surreal_books])
</code></pre>

<p>In this case, Django will send one query to get the list of books, and then do a separate query for each publisher to get the publisher ID&#8230; even though the IDs are already in memory. At a minimum, read the <code>publisher_id</code> column instead:</p>

<pre><code>surreal_publishers = set([book.publisher_id for book in surreal_books])
</code></pre>

<p>This is better, since it doesn&#8217;t have to retrieve each publisher, but far better is to make the database do all the work:</p>

<pre><code>surreal_publishers_qs = Book.objects.filter(topic="Surreal Architecture").values('publisher_id').distinct()
</code></pre>

<p>The result set, in this case, is a bit of an odd duck: It&#8217;s a list of dictionaries, each dict being of the form <code>{ 'publisher_id': &lt;id value&gt; }</code>.  Of course, Python being Python, it&#8217;s not hard to transform that into a set:</p>

<pre><code>surreal_publishers = set([entry['publisher_id'] for entry in surreal_publishers_qs])
</code></pre>

<p>And we didn&#8217;t have to do any raw SQL!</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/22/getting-the-id-of-related-objects-in-django/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Comparing NULLs Considered Silly</title>
		<link>http://thebuild.com/blog/2010/12/17/comparing-nulls-considered-silly/</link>
		<comments>http://thebuild.com/blog/2010/12/17/comparing-nulls-considered-silly/#comments</comments>
		<pubDate>Fri, 17 Dec 2010 16:48:19 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=226</guid>
		<description><![CDATA[You can't compare NULLs. A nullable primary key is a contradiction in terms. You can't join on NULL, so a NULL foreign key refers to nothing, by definition. NULL doesn't do what you think it does, no matter what you think it does.]]></description>
			<content:encoded><![CDATA[<p><strong>tl;dr</strong>: You can&#8217;t compare NULLs. A nullable primary key is a contradiction in terms. You can&#8217;t join on NULL, so a NULL foreign key refers to nothing, by definition. NULL doesn&#8217;t do what you think it does, no matter what you think it does.</p>

<p><span id="more-226"></span></p>

<hr />

<p>NULL in SQL is annoyingly complex.</p>

<p>There&#8217;s really no conceptual model of NULL that will not end up surprising you in unpleasant ways. Jeff Davis, last year, wrote <a href="http://thoughts.j-davis.com/2009/08/02/what-is-the-deal-with-nulls/">a great blog post</a> that, if I could be so bold, could be paraphrased as &#8220;conceptual models of NULL considered harmful.&#8221;</p>

<p>Thus, it&#8217;s not surprising that some&#8230; well, surprising ideas about NULL sometimes pop up.</p>

<p>Recently, on the Django developers&#8217; list, the phrase &#8220;nullable primary key&#8221; caught my eye. This inspired me to write these thoughts about NULL, and in particular about NULL being used in keys.</p>

<p>First:</p>

<blockquote>
  <p>It is meaningless to compare two NULL values.</p>
</blockquote>

<p>I&#8217;ve noticed application programmers often treat NULL as a magic value that any type can possess (I&#8217;ve been quite guilty of this, too). While this is somewhat true, it&#8217;s also a dangerous path to go down, because:</p>

<pre><code>NULL = NULL
</code></pre>

<p>&#8230; is NULL, not true. Whatever else you can say about NULL, a NULL value means you can make no claims about what value it is. Saying, &#8220;I have no idea what this value is, and I have no idea about what that value is, but are they equal?&#8221; is, I would hope, pretty self-evidently meaningless.</p>
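
<p>A quick way to see this for yourself is the sketch below. It uses Python&#8217;s built-in <code>sqlite3</code> module rather than PostgreSQL (an assumption, chosen so that it runs anywhere); both follow the same rule for this comparison.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# NULL = NULL evaluates to NULL, which the driver hands back as None.
equals_result = conn.execute("SELECT NULL = NULL").fetchone()[0]

# IS NULL is the test that actually yields a truth value (1 in SQLite).
is_null_result = conn.execute("SELECT NULL IS NULL").fetchone()[0]

print(equals_result, is_null_result)
```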

<p>Now, this immediately implies:</p>

<blockquote>
  <p>You cannot join on NULL.</p>
</blockquote>

<p>If a foreign key column is NULL, you can&#8217;t do an inner join on it to another table, <em>even if the key column(s) being referred to are NULL</em>. This follows directly from the fact that you can&#8217;t compare NULL values; joining is just built around comparison, after all.</p>

<p>Yes, you can do things like:</p>

<pre><code>SELECT a.*
    FROM a
    INNER JOIN b
        ON (b.col = a.col) OR ( (b.col IS NULL) AND (a.col IS NULL) )
</code></pre>

<p>Setting aside that you&#8217;ve pretty much committed yourself to a nested loop at this point (and thus a very expensive operation), the fact that you have to jump through this hoop should be an indication that the wrong path is being trod.</p>

<p>So, please remember: NULL in a foreign key field does not mean &#8220;This refers to rows in the other table that have a matching NULL,&#8221; because there&#8217;s no such thing as a &#8220;matching NULL.&#8221;</p>
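
<p>As a small illustration of the plain equality join, here is a sketch using Python&#8217;s built-in <code>sqlite3</code> module as a stand-in for PostgreSQL (an assumption for portability). Tables <code>a</code> and <code>b</code> each hold one real value and one NULL; only the real value survives the join.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (col INTEGER)")
conn.execute("CREATE TABLE b (col INTEGER)")
conn.executemany("INSERT INTO a (col) VALUES (?)", [(1,), (None,)])
conn.executemany("INSERT INTO b (col) VALUES (?)", [(1,), (None,)])

# The NULL in a.col never "matches" the NULL in b.col, so that pair is dropped.
joined = conn.execute(
    "SELECT a.col, b.col FROM a INNER JOIN b ON b.col = a.col"
).fetchall()

print(joined)
```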

<p>Moving on to primary keys:</p>

<blockquote>
  <p>A primary key is a combination of columns whose values, taken together, uniquely specify a row.</p>
</blockquote>

<p>Thus, a nullable primary key is equally meaningless, as the whole point of a primary key is for it to be compared to other values to determine uniqueness. (The SQL standard prohibits NULLs in primary key columns, so it&#8217;s not just a good idea, it&#8217;s the law, or at least the recommendation.)</p>

<hr />

<p>The SQL standard calls for NULL to be thrown up in places where it really should require an error.  For example:</p>

<pre><code>SELECT SUM(col) FROM t WHERE FALSE
</code></pre>

<p>&#8230; returns NULL, as the result of most aggregate functions over zero rows is NULL (<code>COUNT</code>, which returns 0, is the notable exception). But the sum of no numbers is 0, not &#8220;unspecified&#8221; (or whatever you want to call NULL).</p>

<p>Worse:</p>

<pre><code>SELECT AVG(col) FROM t WHERE FALSE
</code></pre>

<p>is NULL, while:</p>

<pre><code>SELECT 0/0
</code></pre>

<p>&#8230; much more rationally gives a division-by-zero error.</p>
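
<p>The zero-row aggregate behavior, and the usual <code>COALESCE</code> workaround, can be checked with the sketch below. It assumes Python&#8217;s built-in <code>sqlite3</code> module as a stand-in for PostgreSQL, and uses <code>WHERE 0</code> in place of <code>WHERE FALSE</code> for the benefit of older SQLite versions. (It deliberately leaves out the <code>0/0</code> case, where SQLite does not behave like PostgreSQL.)</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany("INSERT INTO t (col) VALUES (?)", [(1,), (2,)])

# No row satisfies WHERE 0, so every aggregate runs over zero rows.
row = conn.execute(
    "SELECT SUM(col), AVG(col), COUNT(col), COALESCE(SUM(col), 0) FROM t WHERE 0"
).fetchone()

# SUM and AVG come back as None (NULL); COUNT and the COALESCE-wrapped SUM are 0.
print(row)
```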

<p>My guess is that the SQL standards committee is loath to have the spec require errors for more-or-less common operations, and that&#8217;s where a lot of the stranger cases of NULL come from, as a way of having a normal-but-flagged return from an edge case.</p>

<p>It&#8217;s really a shame that NULL is so complex and counterintuitive, but there&#8217;s really no hope for it except to learn the rules, and not try to abuse NULL to do things it wasn&#8217;t designed for.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/17/comparing-nulls-considered-silly/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Using Server-Side PostgreSQL Cursors in Django</title>
		<link>http://thebuild.com/blog/2010/12/14/using-server-side-postgresql-cursors-in-django/</link>
		<comments>http://thebuild.com/blog/2010/12/14/using-server-side-postgresql-cursors-in-django/#comments</comments>
		<pubDate>Wed, 15 Dec 2010 00:02:22 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=221</guid>
		<description><![CDATA[A couple more interesting hacks for dealing with very large result sets in Django and PostgreSQL.]]></description>
			<content:encoded><![CDATA[<p>This is a follow-up to the <a href="http://thebuild.com/blog/2010/12/13/very-large-result-sets-in-django-using-postgresql/">previous post</a>, in which we talked about ways of handling huge result sets in Django.</p>

<p>Two commenters (thanks!) pointed out that psycopg2 has built-in support for server-side cursors, using the <code>name</code> option on the <a href="http://initd.org/psycopg/docs/connection.html#connection.cursor">.cursor() function</a>.</p>

<p>To use this in Django requires a couple of small gyrations.</p>

<p>First, Django wraps the actual database connection inside of the <code>django.db.connection</code> object, as property <code>connection</code>. So, to create a named cursor, you need:</p>

<pre><code>cursor = django.db.connection.connection.cursor(name='gigantic_cursor')
</code></pre>

<p>If this is the first call you are making against that connection wrapper object, it&#8217;ll fail; the underlying database connection is created lazily. As a rather hacky solution, you can do this:</p>

<pre><code>from django.db import connection

if connection.connection is None:
    cursor = connection.cursor()
       # This is required to populate the connection object properly

cursor = connection.connection.cursor(name='gigantic_cursor')
</code></pre>

<p>You can then iterate over the results using the standard iterator or <code>cursor.fetchmany()</code> method, and that will grab results from the server in the appropriate chunks.</p>
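
<p>The <code>fetchmany()</code> loop can be wrapped in a small generator. What follows is a sketch, not part of psycopg2 or Django: <code>iter_rows</code> is a hypothetical helper that relies only on the standard DB-API <code>fetchmany()</code> call, and an in-memory <code>sqlite3</code> cursor stands in for the named cursor so that it runs without a PostgreSQL server.</p>

```python
import sqlite3


def iter_rows(cursor, chunk_size=1000):
    """Yield rows one at a time, pulling them from the cursor in chunks."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            return
        for row in rows:
            yield row


# Stand-in for a server-side cursor so the sketch runs without PostgreSQL.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gigantic (id INTEGER PRIMARY KEY)")
conn.executemany(
    "INSERT INTO gigantic (id) VALUES (?)", [(i,) for i in range(1, 2501)]
)

cursor = conn.cursor()
cursor.execute("SELECT id FROM gigantic ORDER BY id")
total = sum(1 for _ in iter_rows(cursor, chunk_size=1000))

print(total)
```

<p>With a psycopg2 named cursor in place of the <code>sqlite3</code> one, the same loop would pull each chunk from the server on demand.</p>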
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/14/using-server-side-postgresql-cursors-in-django/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Very Large Result Sets in Django using PostgreSQL</title>
		<link>http://thebuild.com/blog/2010/12/13/very-large-result-sets-in-django-using-postgresql/</link>
		<comments>http://thebuild.com/blog/2010/12/13/very-large-result-sets-in-django-using-postgresql/#comments</comments>
		<pubDate>Tue, 14 Dec 2010 03:10:17 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=207</guid>
		<description><![CDATA[Don't use Django to manage queries that have very large result sets. If you must, be sure you understand how to keep memory usage manageable.]]></description>
			<content:encoded><![CDATA[<p><strong>tl;dr:</strong> Don&#8217;t use Django to manage queries that have very large result sets. If you must, be sure you understand how to keep memory usage manageable.</p>

<p><span id="more-207"></span></p>

<hr />

<p>One of the great things about modern interpreted, garbage-collected languages is that most of the memory management happens behind the scenes for you.  Unfortunately, sometimes, the stage equipment comes crashing through the backdrop in the middle of the performance.</p>

<p>In Django, this frequently happens when manipulating tables that contain a very large number of rows.  Here are some tips on how to not end up with the &#8220;behind the scenes&#8221; machinery landing in the audience&#8217;s lap.</p>

<p>For purposes of this discussion, let&#8217;s define &#8220;very large&#8221; as being bigger than is comfortable to keep in memory for the application.</p>

<p>When Django executes a query and reads the results, memory is taken up in several places to hold the results of the query:</p>

<ol>
<li><p>On the database server, it needs to keep around structures holding the result of the query.  Most database servers are good about not keeping any more rows of the result in memory than they absolutely have to, and in any event, it&#8217;s pretty much out of Django&#8217;s control what the database server stores.  So, we shall trust the database to do the right thing, and move on.</p></li>
<li><p>Inside of the Django application, some set of the rows that will ultimately be the result of the query need to be stored while Django processes them.</p></li>
<li><p>The Django QuerySet object can (although it does not always) cache some of the results of the query as Model objects.</p></li>
<li><p>And, of course, the application might hang on to some of the objects that come back (for example, for display on the web page). Of course, this is directly under the control of the application author.  We&#8217;ll trust <em>you</em> to do the right thing, and move on.</p></li>
</ol>

<p>First, let&#8217;s talk about Django&#8217;s caching in the QuerySet object.</p>

<h3>Query Set Caching</h3>

<p>Django&#8217;s QuerySet objects serve two roles: They&#8217;re data structures representing an SQL query, and an API to access the results of the query.  There&#8217;s no explicit &#8220;do this query now, please&#8221; operation in Django (although some operations have that as, shall we say, a strong implication); by and large, Django waits until you try to get the results of a query before executing it.  So, until you get the first object out of the query set, Django won&#8217;t have even executed the query.</p>

<p>Django also has a caching mechanism built into QuerySet.  This cache stores the objects that are manufactured from the rows as they come back from the database, so that multiple accesses to the same object <em>from the same QuerySet</em> will return the cached object instead of a new copy.</p>

<p>Note, however, the emphasis on <em>from the same QuerySet</em>.  A surprisingly large number of operations clone the QuerySet before operating on it.  For example:</p>

<pre><code>qs = ExampleModel.objects.filter(name='Fred')
x = qs[2]
x = qs[2]
</code></pre>

<p>This will do two queries.  <code>qs[2]</code>, under the hood, clones the query set, applies a limit of <code>[2:3]</code> to it, executes the query, returns the resulting object, and throws the limited QuerySet away.  Slicing does exactly the same thing.</p>

<p>However, there is an exception.  If you do this:</p>

<pre><code>qs = ExampleModel.objects.filter(name='Fred')
list(qs)
x = qs[2]
x = qs[2]
</code></pre>

<p>&#8230; the access pattern will be very different.  <code>list(qs)</code> forces the evaluation of the query set, so Django will send the query to the database server, and populate the QuerySet (and its cache) with the result.  Then, the <code>qs[2]</code> operations don&#8217;t clone the QuerySet; they just hit the cache.</p>

<p>Note, though, that this came at the expense of retrieving every row that matched the query from the server, and creating objects for it.  If you force the QuerySet to be evaluated, Django creates objects for everything that matches the query.</p>

<p>When you iterate over a QuerySet, the behavior is slightly different.  The QuerySet cache is always built from the first object that matches the query on up; it&#8217;s not sparse (for example, you&#8217;ll never have the situation where qs[4] and qs[1000] are in the cache, but the objects between them aren&#8217;t).  As you iterate over a QuerySet, if the cache is not already populated, Django grabs the rows in chunks (currently hard-coded to be 100) and fills the cache ahead of the iterator.  This does mean that if you do a query, then only iterate over the first few elements, the cache doesn&#8217;t fill up with stuff you are never going to look at.</p>

<p>You can defeat the caching by using <code>.iterator()</code>.  For example:</p>

<pre><code>qs = ExampleModel.objects.filter(name='Fred')
for x in qs.iterator():
   do_something_wonderful(x)
</code></pre>

<p>This will execute the query, and return each resulting object back, but without filling the cache.  (It also won&#8217;t return cached objects if they already exist; <code>.iterator()</code> forces a reexecution of the query.)  As the Django documentation says, this can be handy if it is a huge result set.</p>

<p>So, let&#8217;s say you for some reason want to process 100 million rows.  You know for sure that you won&#8217;t be able to hold all 100 million Model objects in memory, so you dutifully do:</p>

<pre><code>qs = GiganticTableModel.objects.all()
for giant in qs.iterator():
    # And BANG, you get an out of memory exception right here.
    do_something_wonderful(giant)
</code></pre>

<p>But what happened?  Why did you run out of memory before you even saw a single object out of <code>.iterator()</code>?</p>

<h3>Daddy, Where Do Model Objects Come From?</h3>

<p>Let&#8217;s take a moment and trace down the code path that gets executed here:</p>

<ul>
<li><p>Creating the QuerySet doesn&#8217;t touch the database at all, as noted above.</p></li>
<li><p>After a certain amount of fussing around, <code>.iterator()</code> calls the underlying backend query machinery to perform the query.</p></li>
<li><p>The backend machinery executes the query, and creates an iterator over the resulting rows.  That iterator (in this case) grabs a chunk of rows at a time using <code>.fetchmany()</code>, and returns them one at a time.  (As it happens, that chunk is hardcoded at 100 rows.)</p></li>
<li><p>That iterator is called by the actual iterator returned by <code>.iterator()</code>, so the iteration (phew!) proceeds as: call to get a row (which refills the chunk if the last grab of 100 is exhausted), create an object from that row, and return it to the caller.</p></li>
</ul>

<p>So, why are we getting an out of memory condition?  Even though there are 100 million rows in the result, there should only be 100 in memory at any one time, right?</p>

<p>Sadly, wrong. At the moment that the backend machinery executes the query, <em>all 100 million rows are returned by the database server at once.</em></p>

<p>To quote the <a href="http://initd.org/psycopg/docs/usage.html#server-side-cursors">psycopg2 documentation</a>:</p>

<blockquote>
  <p>When a database query is executed, the Psycopg cursor usually fetches all the records returned by the backend, transferring them to the client process. If the query returned an huge amount of data, a proportionally large amount of memory will be allocated by the client.</p>
</blockquote>

<p>This is true even if you do a <code>.fetchone()</code> or <code>.fetchmany()</code>, not just a <code>.fetchall()</code>.  And there&#8217;s no way, while staying entirely within the standard Django QuerySet machinery, to change this behavior.</p>

<p>So, what do we do?</p>

<h3>&#8220;Doctor, It Hurts When I Do That.&#8221;</h3>

<p>&#8220;So, don&#8217;t do that.&#8221;</p>

<p>If at all possible, don&#8217;t process very large result sets directly in Django.  Even setting aside the memory consumption, it&#8217;s a horribly inefficient use of pretty much every part of the toolchain.  Much more appealing options include:</p>

<ol>
<li><p>Use <code>.update()</code> to push the execution into the server.</p></li>
<li><p>Use a stored procedure or raw SQL.</p></li>
</ol>

<p>Modern database servers are designed to crunch large result sets; leave the data on the server and do it there.</p>

<h3>Take Smaller Bites</h3>

<p>If there is a way of partitioning the data up into smaller chunks, do that.  (For example, processing by day, or ID range.)  Although I wouldn&#8217;t exactly call it &#8220;best practice,&#8221; you could iterate through the rows by using ranges of the primary key, assuming a standard Django serial integer PK:</p>

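<pre><code>from django.db.models import Max

CHUNK = 1000
# Iterating over an empty QuerySet raises no exception, so stop at the highest PK.
max_pk = GiganticTableModel.objects.aggregate(max_pk=Max('id'))['max_pk'] or 0

start = 0
while start &lt;= max_pk:
    qs = GiganticTableModel.objects.filter(pk__gte=start, pk__lt=start + CHUNK)
    for giant in qs:
        do_something_wonderful(giant)
    start += CHUNK
</code></pre>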

<p>There&#8217;s also <a href="http://www.mellowmorning.com/2010/03/03/django-query-set-iterator-for-really-large-querysets/">an example here</a> of constructing an iterator that does much the same thing.</p>
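
<p>The range arithmetic itself is simple enough to factor out. The sketch below is a hypothetical helper (<code>pk_ranges</code> is not part of Django) that assumes non-negative integer primary keys and a known maximum key; each pair it yields would feed a <code>pk__gte</code> / <code>pk__lt</code> filter like the one above.</p>

```python
def pk_ranges(max_pk, chunk_size=1000):
    """Yield (start, stop) pairs covering 0..max_pk; stop is exclusive."""
    for start in range(0, max_pk + 1, chunk_size):
        yield (start, start + chunk_size)


ranges = list(pk_ranges(2500, chunk_size=1000))
print(ranges)
```

<p>Because the loop is bounded by the maximum key rather than by the first empty chunk, gaps in the key sequence don&#8217;t end the iteration early.</p>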

<h3>Use a Database-Side Cursor</h3>

<p>The way that databases really deal with this problem is <a href="http://www.postgresql.org/docs/9.0/interactive/sql-declare.html">cursors</a>.  Not the Python DB-API <code>cursor</code>, in this case; server-side cursors are a structure which holds the result of a query and allows the client to read portions of it at will without having the whole thing shipped across.</p>

<p>They&#8217;re wonderful, and Django should use them.  It doesn&#8217;t.  However, you can, using direct SQL.</p>

<p>To create a server-side cursor in PostgreSQL, we first need to have a transaction open.  For the full details about Django transaction management, check out some of my earlier blog posts.  This is required because the type of cursor we&#8217;ll be using persists only for the duration of the transaction.</p>

<p>Now, the SQL sequence looks something like this.  Instead of saying:</p>

<pre><code>SELECT * FROM app_gigantictablemodel;
</code></pre>

<p>We say:</p>

<pre><code>DECLARE gigantic_cursor BINARY CURSOR FOR SELECT * FROM app_gigantictablemodel;
</code></pre>

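<p>(The <code>BINARY</code> keyword asks the server to return rows in its binary format rather than as text, which can be more efficient. The PostgreSQL documentation cautions that many client applications expect text-format results, so drop the keyword if your driver does not decode binary rows.)</p>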

<p>Then, to get results, we can just say:</p>

<pre><code>FETCH 1000 FROM gigantic_cursor;
</code></pre>

<p>&#8230; or however many rows we want to get.</p>

<p>And then, we can just iterate over them (of course, we&#8217;re getting the rows as rows, rather than objects):</p>

<pre><code>from django.db import connection

# Remember that this 'cursor' is a different thing than the server-side cursor!
cursor = connection.cursor()
cursor.execute("DECLARE gigantic_cursor BINARY CURSOR "
               "FOR SELECT * FROM app_gigantictablemodel")

while True:
    cursor.execute("FETCH 1000 FROM gigantic_cursor")
    rows = cursor.fetchall()

    if not rows:
        break

    for row in rows:
        ...
</code></pre>

<p>Now, there&#8217;s something that should work great, but doesn&#8217;t.  In 1.2, Django introduced raw SQL queries that return a RawQuerySet.  So, one could in theory do this:</p>

<pre><code>qs = GiganticTableModel.objects.raw("FETCH 1000 FROM gigantic_cursor")
</code></pre>

<p>Except we get an exception:</p>

<pre><code>Raw queries are limited to SELECT queries. Use connection.cursor directly for other types of queries.
</code></pre>

<p>Presumably to guard against weird errors, raw queries do a hardcoded check against the query string, making sure it starts with SELECT.  It would be nice if this were liberalized to allow FETCHes.</p>

<hr />

<p>So, if you must, you can process gigantic result sets in Django. But, ideally, you should design your application to make it unnecessary.  If you <em>must</em> process large result sets, do it on the database server; that&#8217;s what it&#8217;s there for.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/13/very-large-result-sets-in-django-using-postgresql/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Waiting for psycopg2.3: NamedTuples</title>
		<link>http://thebuild.com/blog/2010/11/06/waiting-for-psycopg2-3-namedtuples/</link>
		<comments>http://thebuild.com/blog/2010/11/06/waiting-for-psycopg2-3-namedtuples/#comments</comments>
		<pubDate>Sun, 07 Nov 2010 02:04:49 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=203</guid>
		<description><![CDATA[Christmas just came early for me. psycopg2.3, now in beta, includes named tuples as return values from queries.

If you are tired of writing result[4], and would much prefer to write result.column_name, you now can.

Yay!
]]></description>
			<content:encoded><![CDATA[<p><a href="http://initd.org/psycopg/docs/extras.html#namedtuple-cursor">Christmas just came early for me</a>. psycopg2.3, now in beta, includes named tuples as return values from queries.</p>

<p>If you are tired of writing result[4], and would much prefer to write result.column_name, you now can.</p>
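
<p>As a rough illustration of the difference, here is the same access pattern using the standard library&#8217;s <code>namedtuple</code> as a stand-in for a returned row; the column names and values are invented. With psycopg2 itself, the rows would come from a cursor created with <code>cursor_factory=psycopg2.extras.NamedTupleCursor</code>.</p>

```python
from collections import namedtuple

# Stand-in for a row as a named-tuple cursor would hand it back;
# the column names are made up for this example.
Row = namedtuple("Row", ["id", "name", "email", "created", "balance"])
result = Row(1, "Ada", "ada@example.com", "2010-11-06", 42)

# Positional access still works...
by_position = result[4]

# ...but the attribute form says which column is meant.
by_name = result.balance
```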

<p>Yay!</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/11/06/waiting-for-psycopg2-3-namedtuples/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Small PostgreSQL Installations and 9.0 Replication</title>
		<link>http://thebuild.com/blog/2010/10/28/small-postgresql-installations-and-9-0-replication/</link>
		<comments>http://thebuild.com/blog/2010/10/28/small-postgresql-installations-and-9-0-replication/#comments</comments>
		<pubDate>Thu, 28 Oct 2010 18:24:30 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=198</guid>
		<description><![CDATA[<strong>tl;dr:</strong> PG 9.0's streaming replication will be widely adopted by smaller installations that use PG to manage business-critical data, specifically because it makes it something a casual DBA can do, something we've not had before with PG.]]></description>
			<content:encoded><![CDATA[<p>Yesterday, <a href="http://thebuild.com/blog/2010/10/27/users-want-functionality-not-features/">I commented on a post</a> about how widespread the uptake of 9.0 replication will be. I disagreed with the assessment that &#8220;users&#8221; (by which we mean small installations of PostgreSQL, defined however you care to) will not be interested in 9.0&#8217;s hot standby/streaming replication.</p>

<p>Ultimately, of course, we&#8217;ll find out. But I strongly feel that 9.0&#8217;s streaming replication will be a big deal for small PostgreSQL installations&#8230; indeed, I think it will be a much bigger deal for them than big ones.</p>

<p>First, I&#8217;ll happily exclude hobbyist and developer installs of PostgreSQL. I don&#8217;t back up my development PG databases more often than once a day, and I certainly don&#8217;t have any kind of replication set up for them (unless that&#8217;s what I&#8217;m developing). The important part, the code, lives in a DVCS, and if I had to reconstruct the db from scratch, no big deal&#8230; indeed, I do it all the time.</p>

<p>I&#8217;m talking about small installations of PG that are used as authoritative records of business-critical information: Web site transactions, for example. The fact that, traditionally, these users of PG haven&#8217;t been all that into replication solutions has nothing to do with their actual <em>need</em> for replication; instead, it has to do with the solutions they had available.</p>

<ul>
<li>Small installations generally don&#8217;t have the time and expertise to search out third-party solutions, or the budget to pay an expert to do so. If it doesn&#8217;t come in the base RPM or tarball, they&#8217;re not interested in it.</li>
<li>The third-party solutions that are available are all complex and fiddly to set up. I&#8217;m certainly <em>not</em> bashing Slony, for example; it&#8217;s a great tool. But it is not something that a casual DBA wants to take on.</li>
</ul>

<p>So, they make do with <code>pg_dumpall</code> and hope for the best&#8230; and then call someone <a href="http://pgexperts.com">like us</a> if that doesn&#8217;t work.</p>

<p>But it is fallacious to conclude that because they are not using replication right now, they have no use for it. Ask a corner liquor store if they could afford to have an entire day&#8217;s worth of electronic transactions just vanish; I&#8217;ll bet a bottle of something cheap from their shelves that the answer would be, &#8220;Of course not.&#8221; It might not be worth a $15,000 consulting engagement to set it up, but it&#8217;s worth something, possibly quite a bit.</p>

<p>Indeed, this is one of the things that&#8217;s driving adoption of &#8220;cloud computing&#8221;: The (sometimes erroneous) idea that the cloud provider is managing disaster recovery and high availability for you, included in the cost of your monthly service charge.</p>

<p><strong>tl;dr:</strong> PG 9.0&#8217;s streaming replication will be widely adopted by smaller installations that use PG to manage business-critical data, specifically because it makes it something a casual DBA can do, something we&#8217;ve not had before with PG.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/10/28/small-postgresql-installations-and-9-0-replication/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
	</channel>
</rss>

