<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>The Build</title>
	<atom:link href="http://thebuild.com/blog/feed/" rel="self" type="application/rss+xml" />
	<link>http://thebuild.com/blog</link>
	<description>programming, etc.</description>
	<lastBuildDate>Wed, 25 Jan 2012 06:03:09 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.8.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>PostgreSQL Performance When It&#8217;s Not Your Job</title>
		<link>http://thebuild.com/blog/2012/01/24/postgresql-performance-when-its-not-your-job/</link>
		<comments>http://thebuild.com/blog/2012/01/24/postgresql-performance-when-its-not-your-job/#comments</comments>
		<pubDate>Wed, 25 Jan 2012 06:03:09 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=326</guid>
		<description><![CDATA[My presentation from SCALE 10x, &#8220;PostgreSQL Performance When It&#8217;s Not Your Job&#8221; is now available for download.
]]></description>
			<content:encoded><![CDATA[<p>My presentation from SCALE 10x, <a href="http://thebuild.com/presentations/not-my-job.pdf">&#8220;PostgreSQL Performance When It&#8217;s Not Your Job&#8221;</a> is now available for download.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2012/01/24/postgresql-performance-when-its-not-your-job/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>&#8220;Sharding &amp; IDs at Instagram&#8221;</title>
		<link>http://thebuild.com/blog/2011/09/30/sharding-ids-at-instagram/</link>
		<comments>http://thebuild.com/blog/2011/09/30/sharding-ids-at-instagram/#comments</comments>
		<pubDate>Sat, 01 Oct 2011 05:39:21 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=324</guid>
		<description><![CDATA[I&#8217;d like to recommend an interesting post, &#8220;Sharding &#38; IDs at Instagram&#8221;, about sharding using Postgres.
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;d like to recommend an interesting post, <a href="http://instagram-engineering.tumblr.com/post/10853187575/sharding-ids-at-instagram">&#8220;Sharding &amp; IDs at Instagram&#8221;</a>, about sharding using Postgres.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/09/30/sharding-ids-at-instagram/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Cleaning up after your Bucardo goats</title>
		<link>http://thebuild.com/blog/2011/09/27/cleaning-up-after-your-bucardo-goats/</link>
		<comments>http://thebuild.com/blog/2011/09/27/cleaning-up-after-your-bucardo-goats/#comments</comments>
		<pubDate>Wed, 28 Sep 2011 05:25:20 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=313</guid>
		<description><![CDATA[A pure PL/pgSQL script for daily Bucardo maintenance.]]></description>
			<content:encoded><![CDATA[<p>If you are not familiar with it already, <a href="http://bucardo.org/wiki/Bucardo">Bucardo</a> is a nifty multi-master replication system for PostgreSQL, written by <a href="http://www.endpoint.com/team/greg_sabino_mullane">Greg Sabino Mullane</a>. Written in Perl, it is great if you need replication that doesn&#8217;t have the restrictions associated with PG 9&#8217;s <a href="http://www.postgresql.org/docs/9.1/static/warm-standby.html#STREAMING-REPLICATION">streaming replication</a>.</p>

<p>To keep your Bucardo installation clean and tidy, a few <a href="http://bucardo.org/wiki/Bucardo/Cron">regular cron jobs</a> are required. One of them cleans up the archived replicated data (stored in a separate database by Bucardo) once you know you are done with it. </p>

<p>The Bucardo page above has a recommended script using all sorts of <code>bash</code>ing, but I wanted something a bit more pure-PostgreSQL; the recommended script also doesn&#8217;t purge more than one old table at a time. So, I whipped up the following PL/pgSQL function.</p>

<p>(Note that this is for Bucardo 4.4. I haven&#8217;t played with the forthcoming Bucardo 5, so I&#8217;m not sure if this is still required.)</p>

<pre><code>CREATE OR REPLACE FUNCTION bucardo.purge_frozen_child_qs(far_back interval)
    RETURNS SETOF TEXT AS
$purge_frozen_child_qs$
DECLARE
    t TEXT;
    qt TEXT;
BEGIN

    IF far_back IS NULL THEN
        RAISE EXCEPTION 'Interval cannot be null.'
            USING HINT = 'So, do not do that.';
    END IF;

    IF (now() + far_back) &gt; now() THEN
        RAISE EXCEPTION 'Interval must be negative.'
            USING HINT = 'Consider using the "ago" form of intervals.';
    END IF;

    -- Drop every frozen child_q table whose date suffix is older than the cutoff.
    FOR t IN 
        SELECT tablename 
            FROM pg_tables
            WHERE schemaname='freezer' 
                  AND tablename like 'child_q_%' 
                  AND (replace(tablename, 'child_q_', '')::timestamp with time zone) &lt; now() + far_back
            ORDER BY tablename
    LOOP
        -- quote_ident guards the dynamic SQL against unusual table names.
        qt := 'freezer.' || quote_ident(t);
        EXECUTE 'DROP TABLE ' || qt;
        RETURN NEXT qt;
    END LOOP;

    -- Delete queue rows with any timestamp older than the cutoff,
    -- provided that ended or aborted is still unset.
    DELETE FROM bucardo.q 
        WHERE (started &lt; now() + far_back 
                OR ended &lt; now() + far_back 
                OR aborted &lt; now() + far_back 
                OR cdate &lt; now() + far_back) 
              AND (ended IS NULL OR aborted IS NULL);

    RETURN;

END
$purge_frozen_child_qs$
LANGUAGE plpgsql
    VOLATILE;
</code></pre>

<p>To use it, just call it repeatedly from a cron job with the appropriate argument, along the lines of:</p>

<pre><code>SELECT * FROM bucardo.purge_frozen_child_qs('7 days ago'::interval);
</code></pre>

<p>It returns the names of the tables it dropped.</p>

<p>This particular function doesn&#8217;t need to be run more often than once a day. And it keeps your Bucardo goats nice and clean.</p>

<p>(A &#8220;bucardo&#8221; is a <a href="http://en.wikipedia.org/wiki/Pyrenean_Ibex">now-extinct species of goat</a>. For why Bucardo is goat-related, ask Greg.)</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/09/27/cleaning-up-after-your-bucardo-goats/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Unbreaking Your Django Application</title>
		<link>http://thebuild.com/blog/2011/07/26/unbreaking-your-django-application/</link>
		<comments>http://thebuild.com/blog/2011/07/26/unbreaking-your-django-application/#comments</comments>
		<pubDate>Tue, 26 Jul 2011 21:14:59 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=310</guid>
		<description><![CDATA[My tutorial at OSCON 2011, Unbreaking Your Django Application, is now available for download.
]]></description>
			<content:encoded><![CDATA[<p>My tutorial at OSCON 2011, <a href="http://thebuild.com/presentations/unbreaking-django.pdf">Unbreaking Your Django Application</a>, is now available for download.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/07/26/unbreaking-your-django-application/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Life with Object-Relational Mappers</title>
		<link>http://thebuild.com/blog/2011/05/18/life-with-object-relational-mappers/</link>
		<comments>http://thebuild.com/blog/2011/05/18/life-with-object-relational-mappers/#comments</comments>
		<pubDate>Wed, 18 May 2011 18:31:21 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=307</guid>
		<description><![CDATA[The slides from my talk at PGCon 2011 are now available.
]]></description>
			<content:encoded><![CDATA[<p>The slides from my talk at PGCon 2011 are <a href="http://blog.thebuild.com/presentations/drstrangedata.pdf">now available</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/05/18/life-with-object-relational-mappers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Not Convenient.</title>
		<link>http://thebuild.com/blog/2011/03/28/not-convenient/</link>
		<comments>http://thebuild.com/blog/2011/03/28/not-convenient/#comments</comments>
		<pubDate>Mon, 28 Mar 2011 16:49:02 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[Mac OS X]]></category>
		<category><![CDATA[iPhone]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=304</guid>
		<description><![CDATA[DjangoCon Europe and the Apple WWDC are at the exact same time. This is going to be a tough call.

Update: Well, that was quick. WWDC sold out in 10 hours, while I was dithering.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://djangocon.eu/">DjangoCon Europe</a> and the <a href="http://developer.apple.com/wwdc/">Apple WWDC</a> are at the exact same time. This is going to be a tough call.</p>

<p>Update: Well, that was quick. WWDC sold out in 10 hours, while I was dithering.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/03/28/not-convenient/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>What It Means to Be In Business</title>
		<link>http://thebuild.com/blog/2011/03/18/what-it-means-to-be-in-business/</link>
		<comments>http://thebuild.com/blog/2011/03/18/what-it-means-to-be-in-business/#comments</comments>
		<pubDate>Sat, 19 Mar 2011 01:08:19 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Tech Biz]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=295</guid>
		<description><![CDATA[To bring everyone up to date:


Justin Vincent wrote a post offering an opinion about the downsides of tech entrepreneurs chasing VC funding.
Amy Hoy wrote a post expanding on Mr Vincent&#8217;s post.
Alex Payne wrote a post criticizing this position, while finding it necessary to describe &#8220;long time acquaintance&#8221; Amy Hoy&#8217;s product as &#8220;duping [...]]]></description>
			<content:encoded><![CDATA[<p>To bring everyone up to date:</p>

<ul>
<li><a href="http://justinvincent.com/page/1392/entreporn-the-fallacy-that-wastes-your-life">Justin Vincent</a> wrote a post offering an opinion about the downsides of tech entrepreneurs chasing VC funding.</li>
<li><a href="http://unicornfree.com/2011/dont-let-the-bastards-grind-you-down/">Amy Hoy</a> wrote a post expanding on Mr Vincent&#8217;s post.</li>
<li><a href="http://news.ycombinator.com/item?id=2338911">Alex Payne</a> wrote a post criticizing this position, while finding it necessary to describe &#8220;long time acquaintance&#8221; Amy Hoy&#8217;s product as &#8220;duping credulous customers into overpaying for a time-tracking tool styled with this month&#8217;s CSS trends.&#8221;</li>
<li>Unaccountably, he seems to have been surprised by the negative reaction this post generated, so he posted an <a href="http://al3x.net/2011/03/18/not-a-waste.html">explanation and partial retraction</a> here.</li>
</ul>

<p>Sadly, I find his last post as incoherent as his first one is vitriolic.</p>

<p>Rather than go through it point by point, the crux of his argument is:</p>

<blockquote>
  <p>Building a business around maximizing your individual happiness is not particularly useful or admirable. That is my position, and I’m well aware that it may be unpopular with some.</p>
</blockquote>

<p>I am pleased to report, then, that Mr Payne has <em>absolutely nothing to worry about</em>, because no business that is built around the happiness of the owner as a primary goal has a hope of ever getting anywhere, unless the business consists of the owner taking money out of one pocket and putting it in the other. Any business, unless it is operating in a grotesquely distorted marketplace, is primarily about pleasing its <em>customers</em> in exchange for their money.</p>

<p>I&#8217;m really not sure what these vaguely masturbatory companies Mr Payne is talking about do for a living, but every (successful) micro-business I know of is insanely, intensely focused on pleasing its customers. They have to be, because they don&#8217;t have an installed base, government-granted advantages, or (yes) piles of venture capital in the bank to fall back on if they fail to do so.</p>

<p>Mr Payne wants to run a big company. I wish him all the best. He seems to have his young heart in the right place. I have to say, though, that his emotional overreaction to the idea that someone might want to run a micro-business instead strikes me as the Puritan reacting to the idea that someone, somewhere, might be happy.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/03/18/what-it-means-to-be-in-business/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Transaction-Level Advisory Locks in PostgreSQL 9.1</title>
		<link>http://thebuild.com/blog/2011/03/13/transaction-level-advisory-locks-in-postgresql-9-1/</link>
		<comments>http://thebuild.com/blog/2011/03/13/transaction-level-advisory-locks-in-postgresql-9-1/#comments</comments>
		<pubDate>Sun, 13 Mar 2011 22:42:50 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=292</guid>
		<description><![CDATA[Advisory locks are one of the cool unsung features of PostgreSQL. In 9.1, they are getting even cooler with transaction level locks. Many details here.
]]></description>
			<content:encoded><![CDATA[<p>Advisory locks are one of the cool unsung features of PostgreSQL. In 9.1, they are getting even cooler with transaction level locks. <a href="http://www.depesz.com/index.php/2011/03/14/waiting-for-9-1-transaction-level-advisory-locks/">Many details here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/03/13/transaction-level-advisory-locks-in-postgresql-9-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Concern Troll is Concerned: Verifone vs Square</title>
		<link>http://thebuild.com/blog/2011/03/09/concern-troll-is-concerned-verifone-vs-square/</link>
		<comments>http://thebuild.com/blog/2011/03/09/concern-troll-is-concerned-verifone-vs-square/#comments</comments>
		<pubDate>Thu, 10 Mar 2011 06:25:24 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Security]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=289</guid>
		<description><![CDATA[Suppose a major manufacturer of computer keyboards announced a very serious security problem with a specific competitor&#8217;s keyboard: Someone could plug this keyboard into a computer running a malicious app, and cause a user to enter sensitive information. Thus, the manufacturer demands that their competitor recall all of these &#8220;insecure&#8221; keyboards.

Anyone with the technical sense [...]]]></description>
			<content:encoded><![CDATA[<p>Suppose a major manufacturer of computer keyboards announced a <em>very serious</em> security problem with a specific competitor&#8217;s keyboard: Someone could plug this keyboard into a computer running a malicious app, and cause a user to enter sensitive information. Thus, the manufacturer demands that their competitor recall all of these &#8220;insecure&#8221; keyboards.</p>

<p>Anyone with the technical sense of a rock would pause for a moment, and then burst out in laughter at the utter absurdity of this proclamation. No one would ever attempt to make such a ludicrous and obviously self-serving claim, would they?</p>

<p><a href="http://www.sq-skim.com/">Verifone would.</a> Verifone is <em>very very concerned</em> about <a href="https://squareup.com/">Square&#8217;s</a> iPhone card scanner, because someone could run a malicious app on the iPhone and collect card data using it. The fact that Square just announced <a href="https://squareup.com/pricing">new pricing</a> undercutting Verifone&#8217;s is, of course, completely coincidental.</p>

<p>Where to begin?</p>

<ul>
<li><p>It is true that Square&#8217;s attachment does not encrypt the card track information between the iPhone and the card reader. This is true of pretty much every single card reader in the entire world. It is not the job of the card reader to encrypt data, any more than it is the job of the keyboard to encrypt your password. Verifone seems unconcerned at all of the other card readers you can buy from, say, <a href="http://www.amazon.com/MagTek-Swipe-Reader-Keyboard-Emulation/dp/B0013A1VA4">Amazon</a> (just for example).</p></li>
<li><p>For Verifone&#8217;s apocalyptic scenario to occur, the iPhone into which the card reader is plugged must be running a malicious app. This pretty much requires the iPhone user to be in on the scam, which means that they could be using any hardware they wish to collect this card data. If the merchant is crooked, then they&#8217;ll find a way to collect the card data, since they <em>have possession of the card</em> (on which is printed essentially all of the relevant data that is on the mag tracks, plus the CVV printed on the back).</p></li>
<li><p>Verifone&#8217;s competing solution, if the brochure is to be believed, encrypts the data at swipe-time. That&#8217;s nice, but the chance of card data being compromised <em>between</em> the reader and the iPhone, or during that extremely limited time that it is sitting unencrypted in the iPhone&#8217;s memory, is essentially zero. Again, Verifone seems unconcerned that Square&#8217;s app works exactly like every other PC-based credit card processing application in the entire world; indeed, Square&#8217;s is considerably more secure than most, since the merchant doesn&#8217;t have access to the card information. (For example, on my completely certified, authorized, and every-spec-compliant Nurit wireless card processing terminal, I can retrieve credit card numbers from a batch with no hassle whatsoever.)</p></li>
</ul>

<p>In short, Verifone is bashing a competitor because the competitor&#8217;s pricing is more consumer-friendly than Verifone&#8217;s. Their technical arguments are nonsense, and they should be ashamed of launching a FUD campaign that plays on credit card security paranoia.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/03/09/concern-troll-is-concerned-verifone-vs-square/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Django and PostgreSQL at PostgreSQL Conference East</title>
		<link>http://thebuild.com/blog/2011/03/09/django-and-postgresql-at-postgresql-conference-east/</link>
		<comments>http://thebuild.com/blog/2011/03/09/django-and-postgresql-at-postgresql-conference-east/#comments</comments>
		<pubDate>Thu, 10 Mar 2011 06:03:51 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=287</guid>
		<description><![CDATA[I&#8217;ll be giving a full day tutorial about developing Django applications using PostgreSQL. If you are just getting started with Django, this is a great introduction; it is intended for developers who are just getting into serious Django/PG development.

It&#8217;ll cover general development in Django, with a lot of PostgreSQL-specific details.

And, of course, the whole conference [...]]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be giving a <a href="https://www.postgresqlconference.org/files/east_2011_schedule.html">full day tutorial</a> about developing Django applications using PostgreSQL. If you are just getting started with Django, this is a great introduction; it is intended for developers who are just getting into serious Django/PG development.</p>

<p>It&#8217;ll cover general development in Django, with a lot of PostgreSQL-specific details.</p>

<p>And, of course, the whole conference will be a fount of great PostgreSQL geekery.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/03/09/django-and-postgresql-at-postgresql-conference-east/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;Anatomy of a Crushing&#8221;</title>
		<link>http://thebuild.com/blog/2011/03/08/anatomy-of-a-crushing/</link>
		<comments>http://thebuild.com/blog/2011/03/08/anatomy-of-a-crushing/#comments</comments>
		<pubDate>Wed, 09 Mar 2011 01:52:51 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=284</guid>
		<description><![CDATA[A fun and interesting article about a sudden burst in traffic at Pinboard when Yahoo! announced they were shutting down Delicious. Relevant to app and DB designers everywhere.
]]></description>
			<content:encoded><![CDATA[<p><a href="http://pinboard.in/blog/173/">A fun and interesting</a> article about a sudden burst in traffic at Pinboard when Yahoo! announced they were shutting down Delicious. Relevant to app and DB designers everywhere.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/03/08/anatomy-of-a-crushing/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;10 Ways to Kill Performance&#8221;</title>
		<link>http://thebuild.com/blog/2011/02/25/10-ways-to-kill-performanc/</link>
		<comments>http://thebuild.com/blog/2011/02/25/10-ways-to-kill-performanc/#comments</comments>
		<pubDate>Sat, 26 Feb 2011 00:17:55 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=282</guid>
		<description><![CDATA[The slides from my talk, &#8220;10 Easy Ways to Destroy Performance&#8221; from PgDay at SCALE 9X are available.
]]></description>
			<content:encoded><![CDATA[<p>The slides from my talk, <a href="http://thebuild.com/presentations/10-ways-to-kill-performance.pdf">&#8220;10 Easy Ways to Destroy Performance&#8221;</a> from PgDay at SCALE 9X are available.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/02/25/10-ways-to-kill-performanc/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>&#8220;10 Easy Ways to Destroy Performance&#8221; at pgDay at SCALE-9X</title>
		<link>http://thebuild.com/blog/2011/02/15/10-easy-ways-to-destroy-performance-scale-9x/</link>
		<comments>http://thebuild.com/blog/2011/02/15/10-easy-ways-to-destroy-performance-scale-9x/#comments</comments>
		<pubDate>Wed, 16 Feb 2011 05:22:19 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=276</guid>
		<description><![CDATA[I&#8217;ll be presenting a talk on &#8220;10 Easy Ways to Destroy Performance&#8221; at pgDay at SCALE-9X, on February 25th in Los Angeles.
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be presenting a talk on &#8220;10 Easy Ways to Destroy Performance&#8221; at <a href="https://sites.google.com/site/pgdayla/">pgDay at SCALE-9X</a>, on February 25th in Los Angeles.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/02/15/10-easy-ways-to-destroy-performance-scale-9x/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>&#8220;Django Development with PostgreSQL&#8221; at PostgreSQL Conference East</title>
		<link>http://thebuild.com/blog/2011/02/15/django-postgresql-postgresql-conference-east/</link>
		<comments>http://thebuild.com/blog/2011/02/15/django-postgresql-postgresql-conference-east/#comments</comments>
		<pubDate>Wed, 16 Feb 2011 05:19:59 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=274</guid>
		<description><![CDATA[I&#8217;ll be presenting a full-day tutorial on Django Development with PostgreSQL at PostgreSQL Conference East, March 22-25 in New York!
]]></description>
			<content:encoded><![CDATA[<p>I&#8217;ll be presenting a full-day tutorial on <a href="https://www.postgresqlconference.org/content/django-development-postgresql">Django Development with PostgreSQL</a> at <a href="https://www.postgresqlconference.org/">PostgreSQL Conference East</a>, March 22-25 in New York!</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/02/15/django-postgresql-postgresql-conference-east/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>PostgreSQL for Servoy Developers</title>
		<link>http://thebuild.com/blog/2011/02/04/postgresql-for-servoy-developers/</link>
		<comments>http://thebuild.com/blog/2011/02/04/postgresql-for-servoy-developers/#comments</comments>
		<pubDate>Fri, 04 Feb 2011 08:59:49 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=269</guid>
		<description><![CDATA[The slides from my presentation on PostgreSQL for Servoy Developers, presented at ServoyWorld 2011, are available here.
]]></description>
			<content:encoded><![CDATA[<p>The slides from my presentation on PostgreSQL for Servoy Developers, presented at <a href="http://servoy.com">ServoyWorld 2011</a>, are <a href="http://thebuild.com/presentations/pg-servoy.pdf">available here</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2011/02/04/postgresql-for-servoy-developers/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Nobody Here But Us Chickens: Google and Lies We Tell Ourselves</title>
		<link>http://thebuild.com/blog/2010/12/31/nobody-here-but-us-chickens-google-and-lies-we-tell-ourselves/</link>
		<comments>http://thebuild.com/blog/2010/12/31/nobody-here-but-us-chickens-google-and-lies-we-tell-ourselves/#comments</comments>
		<pubDate>Fri, 31 Dec 2010 20:50:59 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Tech Biz]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=249</guid>
		<description><![CDATA[If you make a tradeoff, be honest about it. Don't lie to yourself that you are making a positive architectural decision when you make a negative tradeoff.]]></description>
			<content:encoded><![CDATA[<p><strong>tl;dr</strong>: If you make a tradeoff, be honest about it. Don&#8217;t lie to yourself that you are making a positive architectural decision when you make a negative tradeoff.</p>

<p><span id="more-249"></span></p>

<hr />

<p>In a flash of snarkiness, I posted this to <a href="http://twitter.com/Xof/status/20620780811325440">my Twitter account</a>, based on a colleague&#8217;s attempt to solve a YouTube password problem:</p>

<blockquote>
  <p>Google doesn&#8217;t want to hear from its users any more than <a href="http://www.tyson.com/">Tyson</a> wants to hear from its chickens.</p>
</blockquote>

<p>The slightly-expanded message is: The users of Google&#8217;s free services are a product to be delivered to advertisers, not customers towards whom Google feels an obligation. I think this is more than just fair; I think it&#8217;s self-evident.</p>

<p>There was a certain amount of chatter from those who insist on viewing the world as a <a href="http://en.wikipedia.org/wiki/Manichaeism">Manichean</a> struggle between Apple and Google (which had nothing whatsoever to do with it), but it more productively resulted in <a href="http://twitter.com/mattcutts/status/20636038305157121">a reply</a> from no less than <a href="http://www.mattcutts.com/blog/">Matt Cutts</a> (and I appreciate his taking the time to bounce this around with me):</p>

<blockquote>
  <p>@Xof that&#8217;s certainly not the case.</p>
</blockquote>

<p><a href="http://twitter.com/Xof/status/20638091966414848">I replied</a>:</p>

<blockquote>
  <p>@mattcutts Glad to hear it. What&#8217;s the tech support number for YouTube users?</p>
</blockquote>

<p><a href="http://twitter.com/mattcutts/status/20717489180647424">Mr Cutts</a> replied:</p>

<blockquote>
  <p>@Xof phone lines are 1:1, which means other users can&#8217;t benefit from answers/feedback. The forum for YouTube is <a href="http://goo.gl/pApu">http://goo.gl/pApu</a></p>
</blockquote>

<p>And here we get to the lies we tell ourselves.</p>

<p>You can&#8217;t do everything all the time. No person, product or company can hit every single mark, can fulfill every single desire or request. No one should ever be ashamed of that; everyone has to prioritize.</p>

<p>Google&#8217;s prioritization is, &#8220;For our free services, we can&#8217;t provide individualized support because it would simply be too expensive. Instead, we mobsource our support. That&#8217;s what you are getting when you use our free services. Did we mention that they&#8217;re free?&#8221; That&#8217;s not insane, and you can agree with it (and use Google&#8217;s services), or disagree with it (and not, or do so anyway and whine about the terrible support, your choice). But that&#8217;s what you get.</p>

<p>But attempting to portray that decision as, &#8220;Forums are <em>so much cooler</em> than an actual human being on the phone who can actually address your particular item&#8221; is absurd. Can you imagine, say, a bank with that position? &#8220;An unexplained debit to your account? No problem! Post a comment to our Forum, and maybe someone has had a similar problem and you can share your experience. Sometimes, an employee might see the post and comment. It&#8217;s rad!&#8221;</p>

<p>Tech support forums are great, and every company should sponsor one for their products, but they are not a substitute for a support system that does not rely on the kindness of strangers and does involve someone who can safely access details relevant to your individual situation and who can access internal expertise within the company. I would hope that&#8217;s self-evident, too.</p>

<p>(And, of course, a rep can share their redacted experience from a call on a forum.)</p>

<p>I&#8217;m reminded of the apologia for why <code>qmail</code> doesn&#8217;t send multiple messages on one SMTP connection, or the long diatribe in the <a href="http://oreilly.com/catalog/9780596000271">Camel Book</a> about why Perl 5 doesn&#8217;t have simple, reliable bytecode compilation. In both cases, a designer made a choice that a feature was complex to implement and not a high priority; reasonable people can disagree with their choices, but they made them for what I assume are sound reasons. However, rather than just say, &#8220;Nope, not getting to that one right away,&#8221; what was an issue of priority was tarted up to look like a wise technical architecture decision, which neither one was. (Lots of MTAs can do just fine sending multiple messages on one connection, and if bytecode compilation for Perl was such a dumb idea, what&#8217;s up with <a href="http://www.parrot.org/">Parrot</a>?)</p>

<p>Every business (and person) needs to make choices. Nothing wrong with that. Just be honest with yourself about why you are doing it, so you can be honest with other people.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/31/nobody-here-but-us-chickens-google-and-lies-we-tell-ourselves/feed/</wfw:commentRss>
		<slash:comments>1</slash:comments>
		</item>
		<item>
		<title>Extra columns when doing .distinct() in a Django QuerySet</title>
		<link>http://thebuild.com/blog/2010/12/22/extra-columns-when-doing-distinct-in-a-django-queryset/</link>
		<comments>http://thebuild.com/blog/2010/12/22/extra-columns-when-doing-distinct-in-a-django-queryset/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 22:54:05 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=239</guid>
		<description><![CDATA[If you are doing a <code>.distinct()</code> query and limiting the results using <code>.values()</code> or <code>.values_list()</code>, you may be in for a surprise if your model has a default ordering using the Meta value <code>ordering</code>. You probably want to clear the ordering using <code>.order_by()</code> with no parameters.]]></description>
			<content:encoded><![CDATA[<p><strong>tl;dr</strong>: If you are doing a <code>.distinct()</code> query and limiting the results using <code>.values()</code> or <code>.values_list()</code>, you may be in for a surprise if your model has a default ordering using the Meta value <code>ordering</code>. You probably want to clear the ordering using <code>.order_by()</code> with no parameters.</p>

<p><span id="more-239"></span></p>

<hr />

<p>If a model is ordered, either by <code>.order_by()</code> on the QuerySet or a Meta <code>ordering</code> value, Django will always include the ordering field in the SQL SELECT list. This is true <em>even if the query uses <code>.distinct()</code></em>. To quote <a href="http://docs.djangoproject.com/en/1.2/ref/models/querysets/#distinct">the documentation</a>:</p>

<blockquote>
  <p>Any fields used in an <code>order_by()</code> call are included in the SQL SELECT columns. This can sometimes lead to unexpected results when used in conjunction with <code>distinct()</code>.</p>
</blockquote>

<p>(The documentation as written implies that this is only a problem with related models, but as we&#8217;ll see, it&#8217;s a problem in general. <a href="http://code.djangoproject.com/ticket/14942">A documentation patch</a> is probably in order here.)</p>

<p>By way of illustration, let&#8217;s assume you have the following models:</p>

<pre><code>from django.db import models

class Publisher(models.Model):
    name = models.TextField()

    class Meta:
        ordering = [ 'name', ]

class Book(models.Model):
    title = models.TextField()
    topic = models.TextField()
    publisher = models.ForeignKey(Publisher)

    class Meta:
        ordering = [ 'title', ]
</code></pre>

<p>And we create some rows:</p>

<pre><code>pub = Publisher(name="Strange But True Publications")
pub.save()
</code></pre>

<p>And some books:</p>

<pre><code>book1 = Book(title="New Topics in Industrial Meringue Production",
             topic="Cooking",
             publisher=pub)
book1.save()

book2 = Book(title="Your Chicken's First Song Book",
             topic="Animal Husbandry",
             publisher=pub)
book2.save()
</code></pre>

<p>Now, we want to get the list of IDs of the publishers, and we&#8217;re using the <a href="http://thebuild.com/blog/2010/12/22/getting-the-id-of-related-objects-in-django/">cool optimization that I described earlier</a>, with the optimization a commenter suggested (thanks!):</p>

<pre><code>&gt;&gt;&gt; q = Book.objects.values_list('publisher_id', flat=True).distinct()
&gt;&gt;&gt; print q
[1, 1]
</code></pre>

<p>Um, wait.  That&#8217;s not right.  Why would it return <code>1</code> twice when we said <code>.distinct()</code>?  Let&#8217;s look at the SQL (you <em>are</em> doing a <code>tail -f</code> on the PostgreSQL logs while you develop, right?):</p>

<pre><code>LOG:  statement: SELECT DISTINCT "x_book"."publisher_id", "x_book"."title" FROM "x_book" ORDER BY "x_book"."title" ASC LIMIT 21
</code></pre>

<p>And there we have it.  It includes the <code>title</code> field in the query, even though it doesn&#8217;t return it.  Since the <code>DISTINCT</code> thus applies to both, we have two distinct rows, rather than one.</p>

<p>The fix, fortunately, is easy; just clear the ordering with a <code>.order_by()</code> without any parameters:</p>

<pre><code>&gt;&gt;&gt; q = Book.objects.values_list('publisher_id', flat=True).distinct().order_by()
&gt;&gt;&gt; print q
[1]
</code></pre>

<p>And the query:</p>

<pre><code>LOG:  statement: SELECT DISTINCT "x_book"."publisher_id" FROM "x_book" LIMIT 21
</code></pre>
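
<p>The difference between the two logged statements can be reproduced without Django at all. The following is a minimal sketch, assuming an in-memory SQLite database as a stand-in for PostgreSQL and a cut-down <code>x_book</code> table; it only illustrates how <code>DISTINCT</code> treats the extra column, not how Django builds the query.</p>

```python
import sqlite3

# In-memory SQLite stands in for PostgreSQL; the table mirrors the log lines above.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE x_book ("
    "id INTEGER PRIMARY KEY, title TEXT, topic TEXT, publisher_id INTEGER)"
)
conn.executemany(
    "INSERT INTO x_book (title, topic, publisher_id) VALUES (?, ?, ?)",
    [
        ("New Topics in Industrial Meringue Production", "Cooking", 1),
        ("Your Chicken's First Song Book", "Animal Husbandry", 1),
    ],
)

# DISTINCT covers every selected column, so two different titles give two rows.
with_ordering = conn.execute(
    "SELECT DISTINCT publisher_id, title FROM x_book ORDER BY title ASC"
).fetchall()

# With the ordering column gone, DISTINCT only sees publisher_id.
without_ordering = conn.execute(
    "SELECT DISTINCT publisher_id FROM x_book"
).fetchall()

print(len(with_ordering))
print(without_ordering)
```

<p>The first query comes back with one row per title; the second collapses to a single publisher ID.</p>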
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/22/extra-columns-when-doing-distinct-in-a-django-queryset/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Getting the ID of Related Objects in Django</title>
		<link>http://thebuild.com/blog/2010/12/22/getting-the-id-of-related-objects-in-django/</link>
		<comments>http://thebuild.com/blog/2010/12/22/getting-the-id-of-related-objects-in-django/#comments</comments>
		<pubDate>Wed, 22 Dec 2010 07:00:03 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=232</guid>
		<description><![CDATA[Don't retrieve a whole row just to get the primary key you had anyway. Don't iterate in the app; let the database server do the iteration for you. ]]></description>
			<content:encoded><![CDATA[<p><strong>tl;dr</strong>: Don&#8217;t retrieve a whole row just to get the primary key you had anyway. Don&#8217;t iterate in the app; let the database server do the iteration for you. </p>

<p><span id="more-232"></span></p>

<hr />

<p>There are a couple of bad habits I see a lot in Django code (including, sadly, my own), both of which abuse a ForeignKey field. Let&#8217;s take the classic example:</p>

<pre><code>from django.db.models import Model, TextField, ForeignKey

class Publisher(Model):
    # We accept the default 'id' column
    name = TextField()
    ...

class Book(Model):
    # Likewise
    title = TextField()
    topic = TextField()
    publisher = ForeignKey(Publisher)
        # Remember this creates a publisher_id column
</code></pre>

<p>Now, let&#8217;s say we have a book:</p>

<pre><code>b = Book.objects.get(title="Interior Landscapes")
</code></pre>

<p>And we want the ID of the publisher.</p>

<p><strong>Don&#8217;t do this:</strong></p>

<pre><code>pub_id = b.publisher.id
</code></pre>

<p>This works, but it&#8217;s absurd: It does a separate select to fetch the entire Publisher object, and then extracts the ID.  But, of course, <em>it already had the ID</em>, because that&#8217;s how it retrieved the publisher object.  Instead, just go straight to the created ID field:</p>

<pre><code>pub_id = b.publisher_id
</code></pre>

<p>Next, don&#8217;t use iteration to build lists if you can get the data directly out of the database. For example, suppose we want the list of publishers who publish books with topic &#8220;Surreal Architecture&#8221;. Far too often, I see this:</p>

<pre><code>surreal_books = Book.objects.filter(topic="Surreal Architecture")

surreal_publishers = set([book.publisher.id for book in surreal_books])
</code></pre>

<p>In this case, Django will send one query to get the list of books, and then do a separate query for each publisher to get the publisher id&#8230; even though the IDs are already in memory. At a minimum, read the <code>publisher_id</code> field instead:</p>

<pre><code>surreal_publishers = set([book.publisher_id for book in surreal_books])
</code></pre>

<p>This is better, since it doesn&#8217;t have to retrieve each publisher, but far better is to make the database do all the work:</p>

<pre><code>surreal_publishers_qs = Book.objects.filter(topic="Surreal Architecture").values('publisher_id').distinct()
</code></pre>

<p>The result set, in this case, is a bit of an odd duck: It&#8217;s a list of dictionaries, each dict being of the form <code>{ 'publisher_id': &lt;id value&gt; }</code>.  Of course, Python being Python, it&#8217;s not hard to transform that into a set:</p>

<pre><code>surreal_publishers = set([entry['publisher_id'] for entry in surreal_publishers_qs])
</code></pre>

<p>And we didn&#8217;t have to do any raw SQL!</p>
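
<p>To make the cost of each habit concrete, here is a rough sketch that counts queries. It does not use Django: <code>ToyBook</code>, <code>run()</code> and the table layout are hypothetical stand-ins, with an in-memory SQLite database playing the server, and the lazy <code>publisher</code> property only imitates the general idea of a lazily loaded relation.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE publisher (id INTEGER PRIMARY KEY, name TEXT)")
conn.execute(
    "CREATE TABLE book ("
    "id INTEGER PRIMARY KEY, title TEXT, topic TEXT, publisher_id INTEGER)"
)
conn.executemany("INSERT INTO publisher (id, name) VALUES (?, ?)", [(1, "A"), (2, "B")])
conn.executemany(
    "INSERT INTO book (title, topic, publisher_id) VALUES (?, ?, ?)",
    [
        ("One", "Surreal Architecture", 1),
        ("Two", "Surreal Architecture", 1),
        ("Three", "Surreal Architecture", 2),
        ("Four", "Cooking", 2),
    ],
)

counter = {"queries": 0}


def run(sql, params=()):
    """Execute a statement and count it, like watching the database log."""
    counter["queries"] += 1
    return conn.execute(sql, params).fetchall()


class ToyBook:
    """Hypothetical stand-in for a model instance with a lazy foreign key."""

    def __init__(self, title, publisher_id):
        self.title = title
        self.publisher_id = publisher_id  # arrived with the book row itself

    @property
    def publisher(self):
        # Lazy load: one extra SELECT whenever the related row is touched.
        return run("SELECT id, name FROM publisher WHERE id = ?", (self.publisher_id,))[0]


def surreal_books():
    rows = run(
        "SELECT title, publisher_id FROM book WHERE topic = ?",
        ("Surreal Architecture",),
    )
    return [ToyBook(*row) for row in rows]


# Habit 1: follow the relation for every book (one query plus one per book).
counter["queries"] = 0
slow = set(book.publisher[0] for book in surreal_books())
slow_queries = counter["queries"]

# Habit 2: read the column that is already in memory (one query).
counter["queries"] = 0
better = set(book.publisher_id for book in surreal_books())
better_queries = counter["queries"]

# Best: let the database de-duplicate (one query, no per-book objects).
counter["queries"] = 0
best = set(
    row[0]
    for row in run(
        "SELECT DISTINCT publisher_id FROM book WHERE topic = ?",
        ("Surreal Architecture",),
    )
)
best_queries = counter["queries"]

print(slow_queries, better_queries, best_queries)
```

<p>All three approaches produce the same set of IDs; only the number of round trips, and the amount of data shipped back, differs.</p>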
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/22/getting-the-id-of-related-objects-in-django/feed/</wfw:commentRss>
		<slash:comments>6</slash:comments>
		</item>
		<item>
		<title>Why I run qmail</title>
		<link>http://thebuild.com/blog/2010/12/17/why-i-run-qmail/</link>
		<comments>http://thebuild.com/blog/2010/12/17/why-i-run-qmail/#comments</comments>
		<pubDate>Fri, 17 Dec 2010 18:35:02 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Security]]></category>
		<category><![CDATA[Tools]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=228</guid>
		<description><![CDATA[exim root exploit. Fix now.]]></description>
			<content:encoded><![CDATA[<p>There&#8217;s a very nasty <a href="http://www.exim.org/lurker/message/20101207.215955.bb32d4f2.en.html">root exim exploit</a> in the wild.</p>

<p><strong>Updated:</strong> To be fair to the hard-working exim team, this <a href="http://www.exim.org/lurker/message/20101210.164935.385e04d0.en.html">bug was fixed some time ago</a>.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/17/why-i-run-qmail/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Comparing NULLs Considered Silly</title>
		<link>http://thebuild.com/blog/2010/12/17/comparing-nulls-considered-silly/</link>
		<comments>http://thebuild.com/blog/2010/12/17/comparing-nulls-considered-silly/#comments</comments>
		<pubDate>Fri, 17 Dec 2010 16:48:19 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=226</guid>
		<description><![CDATA[You can't compare NULLs. A nullable primary key is a contradiction in terms. You can't join on NULL, so a NULL foreign key refers to nothing, by definition. NULL doesn't do what you think it does, no matter what you think it does.]]></description>
			<content:encoded><![CDATA[<p><strong>tl;dr</strong>: You can&#8217;t compare NULLs. A nullable primary key is a contradiction in terms. You can&#8217;t join on NULL, so a NULL foreign key refers to nothing, by definition. NULL doesn&#8217;t do what you think it does, no matter what you think it does.</p>

<p><span id="more-226"></span></p>

<hr />

<p>NULL in SQL is annoyingly complex.</p>

<p>There&#8217;s really no conceptual model of NULL that will not end up surprising you in unpleasant ways. Jeff Davis, last year, wrote <a href="http://thoughts.j-davis.com/2009/08/02/what-is-the-deal-with-nulls/">a great blog post</a> that, if I could be so bold, could be paraphrased as &#8220;conceptual models of NULL considered harmful.&#8221;</p>

<p>Thus, it&#8217;s not surprising that some&#8230; well, surprising ideas about NULL sometimes pop up.</p>

<p>Recently, on the Django developers&#8217; list, the phrase &#8220;nullable primary key&#8221; caught my eye. This inspired me to write these thoughts about NULL, and in particular NULL being used as keys.</p>

<p>First:</p>

<blockquote>
  <p>It is meaningless to compare two NULL values.</p>
</blockquote>

<p>I&#8217;ve noticed application programmers often treat NULL as a magic value that any type can possess (I&#8217;ve been quite guilty of this, too). While this is somewhat true, it&#8217;s also a dangerous path to go down, because:</p>

<pre><code>NULL = NULL
</code></pre>

<p>&#8230; is NULL, not true. Whatever else you can say about NULL, a NULL value means you can make no claims about what value it is. Saying, &#8220;I have no idea what this value is, and I have no idea about what that value is, but are they equal?&#8221; is, I would hope, pretty self-evidently meaningless.</p>
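
<p>This is easy to check from Python. A small sketch, assuming SQLite (via the standard <code>sqlite3</code> module) as a convenient stand-in; the driver maps SQL NULL to Python&#8217;s <code>None</code>:</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Comparing NULL with anything, including NULL, yields NULL (None in Python).
equals = conn.execute("SELECT NULL = NULL").fetchone()[0]
not_equals = conn.execute("SELECT NULL != NULL").fetchone()[0]

# IS NULL is the test that actually answers the question.
is_null = conn.execute("SELECT NULL IS NULL").fetchone()[0]

# WHERE keeps a row only when the condition is true, so a NULL result drops it.
rows = conn.execute("SELECT 'kept' WHERE NULL = NULL").fetchall()

print(equals, not_equals, is_null, rows)
```

<p>Both comparisons come back as <code>None</code>, and the <code>WHERE</code> clause built on one of them returns no rows at all.</p>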

<p>Now, this immediately implies:</p>

<blockquote>
  <p>You cannot join on NULL.</p>
</blockquote>

<p>If a foreign key column is NULL, you can&#8217;t do an inner join on it to another table, <em>even if the key column(s) being referred to are NULL</em>. This follows directly from the fact that you can&#8217;t compare NULL values; joining is just built around comparison, after all.</p>

<p>Yes, you can do things like:</p>

<pre><code>SELECT a.*
    FROM a
    INNER JOIN b
        ON (b.col = a.col) OR ( (b.col IS NULL) AND (a.col IS NULL) )
</code></pre>

<p>Setting aside that you&#8217;ve pretty much committed yourself to a nested loop at this point (and thus a very expensive operation), the fact that you have to jump through this hoop should be an indication that the wrong path is being trod.</p>

<p>So, please remember: NULL in a foreign key field does not mean &#8220;This refers to rows in the other table that have a matching NULL,&#8221; because there&#8217;s no such thing as a &#8220;matching NULL.&#8221;</p>
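
<p>A quick way to convince yourself: the sketch below builds two throwaway tables, <code>a</code> and <code>b</code>, each with one real value and one NULL, and joins them both ways. SQLite is assumed here purely as a stand-in for whatever database you actually run.</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE a (col INTEGER)")
conn.execute("CREATE TABLE b (col INTEGER)")
conn.executemany("INSERT INTO a (col) VALUES (?)", [(1,), (None,)])
conn.executemany("INSERT INTO b (col) VALUES (?)", [(1,), (None,)])

# An ordinary equality join: the NULL rows never match each other.
plain_join = conn.execute(
    "SELECT a.col, b.col FROM a INNER JOIN b ON b.col = a.col"
).fetchall()

# The hoop-jumping version: NULLs are paired up only because we said so explicitly.
null_aware_join = conn.execute(
    "SELECT a.col, b.col FROM a INNER JOIN b "
    "ON (b.col = a.col) OR ((b.col IS NULL) AND (a.col IS NULL))"
).fetchall()

print(plain_join)
print(null_aware_join)
```

<p>The plain join returns only the row for <code>1</code>; the NULL pair appears only when the join condition spells out the <code>IS NULL</code> case.</p>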

<p>Moving on to primary keys:</p>

<blockquote>
  <p>A primary key is a combination of columns whose values, taken together, uniquely specify a row.</p>
</blockquote>

<p>Thus, a nullable primary key is equally meaningless, as the whole point of a primary key is for it to be compared to other values to determine uniqueness. (The SQL standard prohibits NULLs in primary key columns, so it&#8217;s not just a good idea, it&#8217;s the law, or at least the recommendation.)</p>

<hr />

<p>The SQL standard calls for NULL to be thrown up in places where it really should require an error.  For example:</p>

<pre><code>SELECT SUM(col) FROM t WHERE FALSE
</code></pre>

<p>&#8230; returns NULL, as the result of most aggregate functions over zero rows is NULL (<code>COUNT</code>, which returns 0, is the notable exception). But the sum of no numbers is 0, not &#8220;unspecified&#8221; (or whatever you want to call NULL).</p>

<p>Worse:</p>

<pre><code>SELECT AVG(col) FROM t WHERE FALSE
</code></pre>

<p>is NULL, while:</p>

<pre><code>SELECT 0/0
</code></pre>

<p>&#8230; much more rationally gives a division-by-zero error.</p>
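
<p>The aggregate behavior is simple to observe. A minimal sketch, assuming SQLite and a throwaway table <code>t</code>; <code>WHERE 0</code> is used in place of <code>WHERE FALSE</code> only because older SQLite builds lack the <code>FALSE</code> keyword. (The division example is left out on purpose, since not every database raises an error there.)</p>

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (col INTEGER)")
conn.executemany("INSERT INTO t (col) VALUES (?)", [(1,), (2,), (3,)])

# Aggregates over zero rows: SUM and AVG give NULL, COUNT gives 0.
total, average, count = conn.execute(
    "SELECT SUM(col), AVG(col), COUNT(col) FROM t WHERE 0"
).fetchone()

# COALESCE is the usual way to get the 0 you probably wanted from SUM.
safe_total = conn.execute(
    "SELECT COALESCE(SUM(col), 0) FROM t WHERE 0"
).fetchone()[0]

print(total, average, count, safe_total)
```

<p>If application code adds the result of <code>SUM</code> to something else, wrapping it in <code>COALESCE</code> avoids a surprise <code>None</code> on an empty match.</p>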

<p>My guess is that the SQL standards committee is loath to have the spec require errors for more-or-less common operations, and that&#8217;s where a lot of the stranger cases of NULL come from, as a way of having a normal-but-flagged return from an edge case.</p>

<p>It&#8217;s really a shame that NULL is so complex and counterintuitive, but there&#8217;s really no hope for it except to learn the rules, and not try to abuse NULL to do things it wasn&#8217;t designed for.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/17/comparing-nulls-considered-silly/feed/</wfw:commentRss>
		<slash:comments>3</slash:comments>
		</item>
		<item>
		<title>Using Server-Side PostgreSQL Cursors in Django</title>
		<link>http://thebuild.com/blog/2010/12/14/using-server-side-postgresql-cursors-in-django/</link>
		<comments>http://thebuild.com/blog/2010/12/14/using-server-side-postgresql-cursors-in-django/#comments</comments>
		<pubDate>Wed, 15 Dec 2010 00:02:22 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=221</guid>
		<description><![CDATA[A couple more interesting hacks for dealing with very large result sets in Django and PostgreSQL.]]></description>
			<content:encoded><![CDATA[<p>This is a follow-up to the <a href="http://thebuild.com/blog/2010/12/13/very-large-result-sets-in-django-using-postgresql/">previous post</a>, in which we talked about ways of handling huge result sets in Django.</p>

<p>Two commenters (thanks!) pointed out that psycopg2 has built-in support for server-side cursors, using the <code>name</code> option on the <a href="http://initd.org/psycopg/docs/connection.html#connection.cursor">.cursor() function</a>.</p>

<p>To use this in Django requires a couple of small gyrations.</p>

<p>First, Django wraps the actual database connection inside of the <code>django.db.connection</code> object, as property <code>connection</code>. So, to create a named cursor, you need:</p>

<pre><code>cursor = django.db.connection.connection.cursor(name='gigantic_cursor')
</code></pre>

<p>If this is the first call you are making against that connection wrapper object, it&#8217;ll fail; the underlying database connection is created lazily. As a rather hacky solution, you can do this:</p>

<pre><code>from django.db import connection

if connection.connection is None:
    cursor = connection.cursor()
       # This is required to populate the connection object properly

cursor = connection.connection.cursor(name='gigantic_cursor')
</code></pre>

<p>You can then iterate over the results using the standard iterator or <code>cursor.fetchmany()</code> method, and that will grab results from the server in the appropriate chunks.</p>
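
<p>The <code>fetchmany()</code> loop is the same for any DB-API cursor, so it can be wrapped in a small generator. This is only a sketch: <code>iter_rows</code> is a hypothetical helper name, and an in-memory SQLite table stands in for the real thing so that the example runs anywhere. With a psycopg2 named cursor, the intent is that each <code>fetchmany()</code> call pulls the next chunk from the server.</p>

```python
import sqlite3


def iter_rows(cursor, chunk_size=1000):
    """Yield rows one at a time, pulling them from the cursor in chunks."""
    while True:
        rows = cursor.fetchmany(chunk_size)
        if not rows:
            break  # an empty chunk means the result set is exhausted
        for row in rows:
            yield row


# Stand-in data: 2,500 rows in an in-memory table.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gigantic (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO gigantic (payload) VALUES (?)",
    [("row %d" % n,) for n in range(2500)],
)

cursor = conn.cursor()
cursor.execute("SELECT id, payload FROM gigantic ORDER BY id")

seen = sum(1 for _ in iter_rows(cursor, chunk_size=1000))
print(seen)
```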
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/14/using-server-side-postgresql-cursors-in-django/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Very Large Result Sets in Django using PostgreSQL</title>
		<link>http://thebuild.com/blog/2010/12/13/very-large-result-sets-in-django-using-postgresql/</link>
		<comments>http://thebuild.com/blog/2010/12/13/very-large-result-sets-in-django-using-postgresql/#comments</comments>
		<pubDate>Tue, 14 Dec 2010 03:10:17 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[Django]]></category>
		<category><![CDATA[PostgreSQL]]></category>
		<category><![CDATA[Python]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=207</guid>
		<description><![CDATA[Don't use Django to manage queries that have very large result sets. If you must, be sure you understand how to keep memory usage manageable.]]></description>
			<content:encoded><![CDATA[<p><strong>tl;dr:</strong> Don&#8217;t use Django to manage queries that have very large result sets. If you must, be sure you understand how to keep memory usage manageable.</p>

<p><span id="more-207"></span></p>

<hr />

<p>One of the great things about modern interpreted, garbage-collected languages is that most of the memory management happens behind the scenes for you.  Unfortunately, sometimes, the stage equipment comes crashing through the backdrop in the middle of the performance.</p>

<p>In Django, this frequently happens when manipulating tables that contain a very large number of rows.  Here are some tips on how to not end up with the &#8220;behind the scenes&#8221; machinery landing in the audience&#8217;s lap.</p>

<p>For purposes of this discussion, let&#8217;s define &#8220;very large&#8221; as being bigger than is comfortable to keep in memory for the application.</p>

<p>When Django executes a query and reads the results, memory is being taken up several places to hold the results of the query:</p>

<ol>
<li><p>On the database server, it needs to keep around structures holding the result of the query.  Most database servers are good about not keeping any more rows of the result in memory than they absolutely have to, and in any event, it&#8217;s pretty much out of Django&#8217;s control what the database server stores.  So, we shall trust the database to do the right thing, and move on.</p></li>
<li><p>Inside of the Django application, some set of the rows that will ultimately be the result of the query need to be stored while Django processes them.</p></li>
<li><p>The Django QuerySet object can (although it does not always) cache some of the results of the query as Model objects.</p></li>
<li><p>And, of course, the application might hang on to some of the objects that come back (for example, for display on the web page). Of course, this is directly under the control of the application author.  We&#8217;ll trust <em>you</em> to do the right thing, and move on.</p></li>
</ol>

<p>First, let&#8217;s talk about Django&#8217;s caching in the QuerySet object.</p>

<h3>Query Set Caching</h3>

<p>Django&#8217;s QuerySet objects serve two roles: They&#8217;re data structures representing an SQL query, and an API to access the results of the query.  There&#8217;s no explicit &#8220;do this query now, please&#8221; operation in Django (although some operations have that as, shall we say, a strong implication); by and large, Django waits until you try to get the results of a query before executing it.  So, until you get the first object out of the query set, Django won&#8217;t have even executed the query.</p>

<p>Django also has a caching mechanism built into QuerySet.  This cache stores the objects that are manufactured from the rows as they come back from the database, so that multiple accesses to the same object <em>from the same QuerySet</em> will return the cached object instead of a new copy.</p>

<p>Note, however, the emphasis on <em>from the same QuerySet</em>.  A surprisingly large number of operations clone the QuerySet before operating on it.  For example:</p>

<pre><code>qs = ExampleModel.objects.filter(name='Fred')
x = qs[2]
x = qs[2]
</code></pre>

<p>This will do two queries.  <code>qs[2]</code>, under the hood, clones the query set, applies a limit of <code>[2:3]</code> to it, executes the query, returns the resulting object, and throws the limited QuerySet away.  Slicing does exactly the same thing.</p>

<p>However, there is an exception.  If you do this:</p>

<pre><code>qs = ExampleModel.objects.filter(name='Fred')
list(qs)
x = qs[2]
x = qs[2]
</code></pre>

<p>&#8230; the access pattern will be very different.  <code>list(qs)</code> forces the evaluation of the query set, so Django will send the query to the database server, and populate the QuerySet (and its cache) with the result.  Then, the <code>qs[2]</code> operations don&#8217;t copy the QuerySet; they just hit the cache.</p>

<p>Note, though, that this came at the expense of retrieving every row that matched the query from the server, and creating objects for it.  If you force the QuerySet to be evaluated, Django creates objects for everything that matches the query.</p>

<p>When you iterate over a QuerySet, the behavior is slightly different.  The QuerySet cache is always built from the first object that matches the query on up; it&#8217;s not sparse (for example, you&#8217;ll never have the situation where qs[4] and qs[1000] are in the cache, but the objects between them aren&#8217;t).  As you iterate over a QuerySet, if the cache is not already populated, Django grabs the rows in chunks (currently hard-coded to be 100) and fills the cache ahead of the iterator.  This does mean that if you do a query, then only iterate over the first few elements, the cache doesn&#8217;t fill up with stuff you are never going to look at.</p>

<p>You can defeat the caching by using <code>.iterator()</code>.  For example:</p>

<pre><code>qs = ExampleModel.objects.filter(name='Fred')
for x in qs.iterator():
   do_something_wonderful(x)
</code></pre>

<p>This will execute the query, and return each resulting object back, but without filling the cache.  (It also won&#8217;t return cached objects if they already exist; <code>.iterator()</code> forces a reexecution of the query.)  As the Django documentation says, this can be handy if it is a huge result set.</p>
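
<p>The distinction between cached iteration and <code>.iterator()</code> can be modeled in a few lines. To be clear, this is a deliberately simplified toy and not Django&#8217;s implementation: <code>ToyQuerySet</code> and <code>fake_query</code> are hypothetical names, and the toy fills its cache in one go rather than in chunks.</p>

```python
class ToyQuerySet:
    """Toy model of a lazily evaluated, cached result set (not Django's code)."""

    def __init__(self, run_query):
        self._run_query = run_query  # callable standing in for "hit the database"
        self._cache = None           # nothing is fetched until someone asks
        self.executions = 0

    def _execute(self):
        self.executions += 1
        return self._run_query()

    def __iter__(self):
        # Normal iteration: run the query once, then serve from the cache.
        if self._cache is None:
            self._cache = list(self._execute())
        return iter(self._cache)

    def iterator(self):
        # Bypass the cache entirely: always re-execute, never store.
        return iter(self._execute())


def fake_query():
    return ["Fred #1", "Fred #2", "Fred #3"]


qs = ToyQuerySet(fake_query)
assert qs.executions == 0      # creating it did not run anything

list(qs)
list(qs)
cached_executions = qs.executions   # second pass came from the cache

fresh = ToyQuerySet(fake_query)
list(fresh.iterator())
list(fresh.iterator())
uncached_executions = fresh.executions
cache_after_iterator = fresh._cache

print(cached_executions, uncached_executions, cache_after_iterator)
```

<p>The trade-off is the one described above: the cached path pays in memory, the <code>.iterator()</code> path pays in repeated queries.</p>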

<p>So, let&#8217;s say you for some reason want to process 100 million rows.  You know for sure that you won&#8217;t be able to hold all 100 million Model objects in memory, so you dutifully do:</p>

<pre><code>qs = GiganticTableModel.objects.all()
for giant in qs.iterator():
    # And BANG, you get an out of memory exception right here.
    do_something_wonderful(giant)
</code></pre>

<p>But what happened?  Why did you run out of memory before you even saw a single object out of <code>.iterator()</code>?</p>

<h3>Daddy, Where Do Model Objects Come From?</h3>

<p>Let&#8217;s take a moment and trace down the code path that gets executed here:</p>

<ul>
<li><p>Creating the QuerySet doesn&#8217;t touch the database at all, as noted above.</p></li>
<li><p>After a certain amount of fussing around, <code>.iterator()</code> calls the underlying backend query machinery to perform the query.</p></li>
<li><p>The backend machinery executes the query, and creates an iterator over the resulting rows.  That iterator (in this case) grabs a chunk of rows at a time using <code>.fetchmany()</code>, and returns them one at a time.  (As it happens, that chunk is hardcoded at 100 rows.)</p></li>
<li><p>That iterator is called by the actual iterator returned by <code>.iterator()</code>, so the iteration (phew!) proceeds as: call to get a row (which refills the chunk if the last grab of 100 is exhausted), create an object from that row, and return it to the caller.</p></li>
</ul>

<p>So, why are we getting an out of memory condition?  Even though there are 100 million rows in the result, there should only be 100 in memory at any one time, right?</p>

<p>Sadly, wrong. At the moment that the backend machinery executes the query, <em>all 100 million rows are returned by the database server at once.</em></p>

<p>To quote the <a href="http://initd.org/psycopg/docs/usage.html#server-side-cursors">psycopg2 documentation</a>:</p>

<blockquote>
  <p>When a database query is executed, the Psycopg cursor usually fetches all the records returned by the backend, transferring them to the client process. If the query returned an huge amount of data, a proportionally large amount of memory will be allocated by the client.</p>
</blockquote>

<p>This is true even if you do a <code>.fetchone()</code> or <code>.fetchmany()</code>, not just a <code>.fetchall()</code>.  And there&#8217;s no way, while staying entirely within the standard Django QuerySet machinery, to change this behavior.</p>

<p>So, what do we do?</p>

<h3>&#8220;Doctor, It Hurts When I Do That.&#8221;</h3>

<p>&#8220;So, don&#8217;t do that.&#8221;</p>

<p>If at all possible, don&#8217;t process very large result sets directly in Django.  Even setting aside the memory consumption, it&#8217;s a horribly inefficient use of pretty much every part of the toolchain.  Much more appealing options include:</p>

<ol>
<li><p>Use <code>.update()</code> to push the execution into the server.</p></li>
<li><p>Use a stored procedure or raw SQL.</p></li>
</ol>

<p>Modern database servers are designed to crunch large result sets; leave the data on the server and do it there.</p>

<h3>Take Smaller Bites</h3>

<p>If there is a way of partitioning the data up into smaller chunks, do that.  (For example, processing by day, or ID range.)  Although I wouldn&#8217;t exactly call it &#8220;best practice,&#8221; you could iterate through the rows by using ranges of the primary key, assuming a standard Django serial integer PK:</p>

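<pre><code>from django.db.models import Max

# Find the upper bound up front: iterating over an empty QuerySet never
# raises DoesNotExist, so the loop needs an explicit stopping point.
max_pk = GiganticTableModel.objects.aggregate(Max('pk'))['pk__max']

i = 0
while max_pk is not None and i*1000 &lt;= max_pk:
    qs = GiganticTableModel.objects.filter(pk__gte=i*1000, pk__lt=(i+1)*1000)
    for giant in qs:
        do_something_wonderful(giant)

    i += 1
</code></pre>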

<p>There&#8217;s also <a href="http://www.mellowmorning.com/2010/03/03/django-query-set-iterator-for-really-large-querysets/">an example here</a> of constructing an iterator that does much the same thing.</p>
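
<p>A variation on the same idea is to resume after the last primary key seen, rather than walking fixed ID ranges; that way gaps in the sequence cost nothing and an empty chunk is an unambiguous stopping point. The following is a sketch in plain SQL through Python&#8217;s <code>sqlite3</code> module, not Django code: <code>iter_in_pk_chunks</code> and the <code>gigantic</code> table are hypothetical, and it assumes positive integer primary keys.</p>

```python
import sqlite3


def iter_in_pk_chunks(conn, chunk_size=1000):
    """Yield (id, payload) rows, fetching at most chunk_size rows per query."""
    last_pk = 0  # assumes positive integer primary keys
    while True:
        rows = conn.execute(
            "SELECT id, payload FROM gigantic WHERE id > ? ORDER BY id LIMIT ?",
            (last_pk, chunk_size),
        ).fetchall()
        if not rows:
            break  # nothing beyond last_pk: we are done
        for row in rows:
            yield row
        last_pk = rows[-1][0]  # resume after the last key we saw


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE gigantic (id INTEGER PRIMARY KEY, payload TEXT)")
conn.executemany(
    "INSERT INTO gigantic (id, payload) VALUES (?, ?)",
    [(n, "row %d" % n) for n in range(1, 2501)],
)
# Punch a large hole in the ID sequence, as deletes tend to do in real tables.
conn.execute("DELETE FROM gigantic WHERE id BETWEEN 500 AND 1999")

ids = [row[0] for row in iter_in_pk_chunks(conn, chunk_size=300)]
print(len(ids))
```

<p>Each query is bounded by <code>LIMIT</code>, so memory use stays proportional to the chunk size regardless of how the keys are distributed.</p>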

<h3>Use a Database-Side Cursor</h3>

<p>The way that databases really deal with this problem is <a href="http://www.postgresql.org/docs/9.0/interactive/sql-declare.html">cursors</a>.  Not the Python DB-API <code>cursor</code>, in this case; server-side cursors are a structure which holds the result of a query and allows the client to read portions of it at will without having the whole thing shipped across.</p>

<p>They&#8217;re wonderful, and Django should use them.  It doesn&#8217;t.  However, you can, using direct SQL.</p>

<p>To create a cursor in PostgreSQL in the server, first, we need to have a transaction open.  For the full details about Django transaction management, check out some of my earlier blog posts.  This is required because the type of cursors we&#8217;ll be using will only persist for the duration of the transaction.</p>

<p>Now, the SQL sequence looks something like this.  Instead of saying:</p>

<pre><code>SELECT * FROM app_gigantictablemodel;
</code></pre>

<p>We say:</p>

<pre><code>DECLARE gigantic_cursor BINARY CURSOR FOR SELECT * FROM app_gigantictablemodel;
</code></pre>

<p>(The <code>BINARY</code> keyword allows it to use the more-efficient binary protocol between the database server and the application.)</p>

<p>Then, to get results, we can just say:</p>

<pre><code>FETCH 1000 FROM gigantic_cursor;
</code></pre>

<p>&#8230; or however many rows we want to get.</p>

<p>And then, we can just iterate over them (of course, we&#8217;re getting the rows as rows, rather than objects):</p>

<pre><code>from django.db import connection

cursor = connection.cursor()
    # Remember that this 'cursor' is a different thing than the server-side cursor!
cursor.execute("""DECLARE gigantic_cursor BINARY CURSOR
                    FOR SELECT * FROM app_gigantictablemodel""")

while True:
    cursor.execute("FETCH 1000 FROM gigantic_cursor")
    rows = cursor.fetchall()

    if not rows:
        break

    for row in rows:
        ...
</code></pre>

<p>Now, there&#8217;s something that should work great, but doesn&#8217;t.  In 1.2, Django introduced raw SQL queries that return a RawQuerySet.  So, one could in theory do this:</p>

<pre><code>qs = GiganticTableModel.objects.raw("FETCH 1000 FROM gigantic_cursor")
</code></pre>

<p>Except we get an exception:</p>

<pre><code>Raw queries are limited to SELECT queries. Use connection.cursor directly for other types of queries.
</code></pre>

<p>Presumably to guard against weird errors, raw queries do a hardcoded check against the query string, making sure it starts with SELECT.  It would be nice if this were liberalized to allow FETCHes.</p>

<hr />

<p>So, if you must, you can process gigantic result sets in Django. But, ideally, you should design your application to make it unnecessary.  If you <em>must</em> process large result sets, do it on the database server; that&#8217;s what it&#8217;s there for.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/12/13/very-large-result-sets-in-django-using-postgresql/feed/</wfw:commentRss>
		<slash:comments>9</slash:comments>
		</item>
		<item>
		<title>Waiting for psycopg2.3: NamedTuples</title>
		<link>http://thebuild.com/blog/2010/11/06/waiting-for-psycopg2-3-namedtuples/</link>
		<comments>http://thebuild.com/blog/2010/11/06/waiting-for-psycopg2-3-namedtuples/#comments</comments>
		<pubDate>Sun, 07 Nov 2010 02:04:49 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=203</guid>
		<description><![CDATA[Christmas just came early for me. psycopg2.3, now in beta, includes named tuples as return values from queries.

If you are tired of writing result[4], and would much prefer to write result.column_name, you now can.

Yay!
]]></description>
			<content:encoded><![CDATA[<p><a href="http://initd.org/psycopg/docs/extras.html#namedtuple-cursor">Christmas just came early for me</a>. psycopg2.3, now in beta, includes named tuples as return values from queries.</p>

<p>If you are tired of writing result[4], and would much prefer to write result.column_name, you now can.</p>

<p>Yay!</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/11/06/waiting-for-psycopg2-3-namedtuples/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Small PostgreSQL Installations and 9.0 Replication</title>
		<link>http://thebuild.com/blog/2010/10/28/small-postgresql-installations-and-9-0-replication/</link>
		<comments>http://thebuild.com/blog/2010/10/28/small-postgresql-installations-and-9-0-replication/#comments</comments>
		<pubDate>Thu, 28 Oct 2010 18:24:30 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=198</guid>
		<description><![CDATA[<strong>tl;dr:</strong> PG 9.0's streaming replication will be widely adopted by smaller installations that use PG to manage business-critical data, specifically because it makes replication something a casual DBA can set up, which we've not had before with PG.]]></description>
			<content:encoded><![CDATA[<p>Yesterday, <a href="http://thebuild.com/blog/2010/10/27/users-want-functionality-not-features/">I commented on a post</a> about how widespread uptake of 9.0 replication will be. I disagreed with the assessment that &#8220;users&#8221; (by which we mean small installations of PostgreSQL, defined however you care to) will not be interested in 9.0&#8217;s hot standby/streaming replication.</p>

<p>Ultimately, of course, we&#8217;ll find out. But I strongly feel that 9.0&#8217;s streaming replication will be a big deal for small PostgreSQL installations&#8230; indeed, I think it will be a much bigger deal for them than big ones.</p>

<p>First, I&#8217;ll happily exclude hobbyist and developer installs of PostgreSQL. I don&#8217;t back up my development PG databases more often than once a day, and I certainly don&#8217;t have any kind of replication set up for them (unless that&#8217;s what I&#8217;m developing). The important part, the code, lives in a DVCS, and if I had to reconstruct the db from scratch, no big deal&#8230; indeed, I do it all the time.</p>

<p>I&#8217;m talking about small installations of PG that are used as authoritative records of business-critical information: Web site transactions, for example. The fact that, traditionally, these users of PG haven&#8217;t been all that into replication solutions has nothing to do with their actual <em>need</em> for replication; instead, it has to do with the solutions they had available.</p>

<ul>
<li>Small installations generally don&#8217;t have the time and expertise to search out third-party solutions, or the budget to pay an expert to do so. If it doesn&#8217;t come in the base RPM or tarball, they&#8217;re not interested in it.</li>
<li>The third-party solutions that are available are all complex and fiddly to set up. I&#8217;m certainly <em>not</em> bashing Slony, for example; it&#8217;s a great tool. But it is not something that a casual DBA wants to take on.</li>
</ul>

<p>So, they make do with <code>pg_dumpall</code> and hope for the best&#8230; and then call someone <a href="http://pgexperts.com">like us</a> if that doesn&#8217;t work.</p>

<p>But it is fallacious to conclude that because they are not using replication right now, they have no use for it. Ask a corner liquor store if they could afford to have an entire day&#8217;s worth of electronic transactions just vanish; I&#8217;ll bet a bottle of something cheap that they carry that the answer would be, &#8220;Of course not.&#8221; It might not be worth a $15,000 consulting engagement to set it up, but it&#8217;s worth something, possibly quite a bit.</p>

<p>Indeed, this is one of the things that&#8217;s driving adoption of &#8220;cloud computing&#8221;: The (sometimes erroneous) idea that the cloud provider is managing disaster recovery and high availability for you, included in the cost of your monthly service charge.</p>

<p><strong>tl;dr:</strong> PG 9.0&#8217;s streaming replication will be widely adopted by smaller installations that use PG to manage business-critical data, specifically because it makes replication something a casual DBA can set up, which we&#8217;ve not had before with PG.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/10/28/small-postgresql-installations-and-9-0-replication/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>Users Want Functionality, Not Features</title>
		<link>http://thebuild.com/blog/2010/10/27/users-want-functionality-not-features/</link>
		<comments>http://thebuild.com/blog/2010/10/27/users-want-functionality-not-features/#comments</comments>
		<pubDate>Thu, 28 Oct 2010 04:38:20 +0000</pubDate>
		<dc:creator>Xof</dc:creator>
				<category><![CDATA[PostgreSQL]]></category>

		<guid isPermaLink="false">http://thebuild.com/blog/?p=195</guid>
		<description><![CDATA[Over at the Command Prompt blog, Joshua Drake makes a (probably deliberately) provocative point about &#8220;users&#8221; not wanting replication, as opposed to &#8220;customers&#8221; who do. I&#8217;ll confess I&#8217;m not 100% sure about his distinction between &#8220;users&#8221; and &#8220;customers,&#8221; so I&#8217;ll just make something up: Users are the people sitting in front of the application, entering [...]]]></description>
			<content:encoded><![CDATA[<p>Over at the Command Prompt blog, <a href="http://www.commandprompt.com/blogs/joshua_drake/2010/10/users_versus_customers_-_you_dont_need_no_stinking_replication/">Joshua Drake makes a (probably deliberately) provocative point</a> about &#8220;users&#8221; not wanting replication, as opposed to &#8220;customers&#8221; who do. I&#8217;ll confess I&#8217;m not 100% sure about his distinction between &#8220;users&#8221; and &#8220;customers,&#8221; so I&#8217;ll just make something up: <em>Users</em> are the people sitting in front of the application, entering data, buying shoes, or doing whatever it is that the database enables; <em>customers</em> are the CIOs, CTOs, Directors of Engineering, and the other people who make purchasing decisions.</p>

<p>He writes:</p>

<blockquote>
  <p>Yes, <a href="http://www.commandprompt.com/">Command Prompt</a> customers want replication. Yes, <a href="http://pgexperts.com/">PostgreSQL Experts</a>, <a href="http://www.enterprisedb.com/">EnterpriseDB</a> and <a href="http://omniti.com/">OmniTI</a> customers want replication. However, customers are <em>not</em> users. At least not in the community sense and the users in the community, the far majority of them do not need or want replication. A daily backup is more than enough for them.</p>
</blockquote>

<p>Well, yes, as far as it goes, he&#8217;s absolutely right. Users don&#8217;t need or want replication. They don&#8217;t need or want PostgreSQL, for that matter; VSAM, flat files, or a magic hamster would be fine with them, too, as long as the data that comes out is the data that goes in.</p>

<p>But for how many users, really, is &#8220;It&#8217;s OK if you lose today&#8217;s data, gone, irretrievably, <em>pffft</em>, yes?&#8221; really an acceptable answer? Very few. Very very few, and getting fewer all the time. One of the strongest pushes behind moving services into the &#8220;cloud&#8221; (i.e., external hosting providers of various kinds) is that they provide near-constant recovery and fault-tolerance. Users don&#8217;t care if their data is protected by hardware-level solutions like SANs, or software-level solutions like replication, as long as it <em>is</em> protected.</p>

<p>Users who profess not to care about this are either not putting authoritative data into a database, or just haven&#8217;t had the inevitable data disaster happen to them yet.</p>

<p>For me, the biggest feature of PostgreSQL&#8217;s 9.0 replication is that it is much, much easier to set up than any previous solution. Slony is a heroic project, and has lots of happy customers using it extensively, but it is notoriously fiddly and complex to set up.</p>

<p>Like a lot of technologies, replication hasn&#8217;t been in demand for a lot of PostgreSQL implementations because the cost didn&#8217;t seem worth the payoff. 9.0 brings the implementation cost way, way down, and thus, we&#8217;ll start seeing a lot more interest in putting replication in.</p>

<p>Of course, do the daily backups, too.</p>
]]></content:encoded>
			<wfw:commentRss>http://thebuild.com/blog/2010/10/27/users-want-functionality-not-features/feed/</wfw:commentRss>
		<slash:comments>7</slash:comments>
		</item>
	</channel>
</rss>

