postgresql when it's not your job

13:34

ORMs and Their Discontents: PGXPUG Day OSCON 2010 Presentation

18 July 2010

Here are the slides for my talk at PGXPUG Day OSCON 2010.

Tweets that mention ORMs and Their Discontents: PGXPUG Day OSCON 2010 Presentation -- Topsy.com at 14:24, 18 July 2010:

[…] This post was mentioned on Twitter by Planet PostgreSQL and SQLManager, Zuissi. Zuissi said: PostgreSQL: Christophe Pettus: ORMs and Their Discontents: PGXPUG Day OSCON 2010 Presentation: Here are the slides… http://bit.ly/aXdR3H […]

Michael Schuerig at 15:10, 18 July 2010:

What, then, are the problems with ORMs? More specifically, what problems do you see with the concept of OR-mapping in general that can’t be overcome in principle?

From your slides, it seems that the discontent is mostly caused by performance issues resulting from naive use of ORMs. Well, performance (in the throughput sense) isn’t important for every application. Consider in-house apps, for instance. Also, ORMs don’t condemn you to poor performance. Is there any ORM that does not support eager loading of associated objects?

I’m most familiar with Rails’ ActiveRecord–arguably not the brightest of the bunch. Still, for most DB accesses, I would not write any better code, even if I did it by hand. To the contrary, writing SQL manually, I might even become negligent and write code vulnerable to SQL injections or I might just get tired of writing involved joins myself and do separate queries instead.

For cases where the ORM doesn’t do what I want, I just write the suitable SQL myself. No big deal, really. To use the metaphor from your slides: Yes, I’m quite happy that I can still walk, even though I have a (metaphorical) car. Do you see any problem here? I don’t and I don’t mind that the abstraction is leaky as long as it keeps me afloat.

One criticism I’ve read on occasion, though not on your slides, is that ORMs load too much data from the DB. Every column of the tables involved in a query are selected and retrieved. I see that this is less than optimal, but I don’t see how you’d work around it without an ORM. I take it for granted that ORM detractors want to keep a strong separation of model and view code, too. Given this, in order to be general, the model layer probably has to load more data than strictly necessary, even if no ORM is used.

In any case, if you have a table where some columns are only accessed rarely, you might have a good case for splitting the table beyond the call of normalization.

All in all, I’m really curious what a consistent architecture without the discontents would look like.

Laurence Rowe at 15:58, 18 July 2010:

“So, how do you solve the problem?”

I use SQLAlchemy because it seems to just get it right. It doesn’t just target the lowest common denominator, but does seem to degrade gracefully. And It lets you progressively customise behaviour, so no drive/walk dichotomy.

For painless transactions, I wrote “zope.sqlalchemy“ to integrate with “transaction“, which for wsgi means“repoze.tm2“. This gives you Zope’s ‘a request is a transaction’ model. For those after a framework, Turbogears 2 builds on top of these or you can use it with repoze.bfg.

To be honest, I really dislike the database server as app server model where you end up with a large number of stored procedures, using updatable views when abstraction is needed seems much cleaner to me. While I use Postgres by choice, there’s often a bunch of existing data you need to access spread across Oracle, MySQL or SQL Server. And sometimes you have to tactically port from one to another chasing resources or fitting in with whatever today’s corporate policy is.

RobW at 22:17, 18 July 2010:

One problem an ORM will never solve is when the *data* is of central importance and the requirement is to abstract away the applications and programming languages, not the other way around where programmers assume the programming application is sacred and the database is a glorified bit bucket for storing objects.

The most grievous sin in these sorts of discussions (but not necessarily this particular article) is often the implied assumption that everything ever done with computers must involve web apps. In those cases, it’s fine if you have the luxury to assume all interaction begins and ends with the application. But for many use cases where the data is what is sacred and it may be accessed by a smorgasbord of technologies and companies, it is best to let the database management system do what it does best: manage the data. Because ORM’s are not language neutral or equally appropriate for all use cases, it’s not a good idea to rely on them for the essentials of keeping the database consistent or CRUD. Other technologies and companies do not (and often cannot) use that ORM, so what you’re left with is an exposed database and/or inconsistent means of accessing it.

Robert Young at 08:56, 19 July 2010:

@RobW:
One problem an ORM will never solve is when the *data* is of central importance and the requirement is to abstract away the applications and programming languages, not the other way around where programmers assume the programming application is sacred and the database is a glorified bit bucket for storing objects.

And, just for grins, when is this *not* true?? In a Real World, that is, not just the fantasy world of “single application programmers”? Nobody cares about the code (except the parochial coders who made it), they care about the data.

The fundamental problem with ORM’s is the assertion that this thing called Impedance Mismatch is real. It isn’t; for any legitimate OO language (the only venue where IM is asserted to exist; think about that for a minute), object instantiation involves *only* the setting of instance level data, after the first instance. When the first instance is created, not only is its data written to the (usually) instance’s data heap, but the shared method text is written *for the one and only time* of the existence of all instances. In other words, to beat this dying nag, data and only data distinguishes one instance from another.

Given that fact, why use any other data model (and the relational model is the only one which is not merely a post-hoc definition) than that of the RDBMS? The fully normalized (BCNF) database offers the smallest byte footprint; data exists as one fact, one place, one time. Now, flatfile zealots carp about “normal data is too slow”, assuming that joins are the culprit. With the tidalwave of SSDs, both retail and enterprise, these days, along with multi-core/processor machines, there’s no excuse. BCNF data is smaller and faster on such machines.

For those who assert that IMS/xml/hierarchical data is “better”, well, just say NO. You’ve lost relations, and locked yourself into the structure against which you’ve coded. Any modification requires not only changing the data structure, but also all code which manipulates your xml. With intelligent SQL/Relational databases, any added data can be ignored if it’s it not germane to existing code. A fact which SELECT * from Foo destroys, naturally.

Time to do what Dr. Codd demonstrated. Technology has finally caught up with the maths.

fernan at 06:52, 21 July 2010:

Excellent presentation, and fun to read too. Would really love to hear/see the complete presentation …

ORMs and Their Discontents « Open Enterprise: The PostgreSQL Open Source Database Blog from EnterpriseDB at 15:15, 4 August 2010:

[…] a hot topic of discussion among database geeks. At OSCON’s PGDay this year, Christophe Pettus presented the clearest explanation I have ever heard of the complexities of ORMs and the basically […]

Zeno Davatz at 05:51, 20 October 2010:

Thank you for the slides, very interesting!