The Build

19:30

Of Pickups and Tractor-Trailers

25 April 2012

Pickup trucks are great.

No, really. They are great vehicles. You can use them for all sorts of really useful things: Bringing your tools out to a construction gig. Delivering refrigerators. Helping your friend move a sofa. Carting away a reasonable amount of construction debris.

But if you need to deliver 75,000 pounds of steel beams to a construction site, in a single run? A pickup truck will not do it. Not even a big pickup. Not even if you add a new engine. Not even if you are willing to get three pickups. You need equipment designed for that. (And, as a note, the equipment that could handle delivering the steel beams would be a terrible choice for helping a friend move their sofa.)

“But,” I hear you say, “I already know how to drive a pickup! And we have a parking space for it. Can’t we just use the pickup? You’re a truck expert; tell us how to get our pickup to pull that load!”

And I say, “Being a truck expert, I will tell you again: a pickup is the wrong kind of truck. There are other trucks that will handle that load with no trouble, but a pickup isn’t one of them. The fact that you have a pickup doesn’t make it the right truck.”

We have many clients that run PostgreSQL, happily, on Amazon Web Services.

Some clients, however, are not happy. They are attempting to haul tractor-trailer loads (such as high volume data warehouses) using pickup trucks (Amazon EC2 instances). They wish us to fix their problem, but are not willing to move off of Amazon in order to get the problem fixed.

I like AWS for a lot of things; it has many virtues, which I will discuss in detail soon. However, AWS is not the right solution for every problem. In particular, if you require a high read or write data rate in order to get the performance you need from your database, you will ultimately not be happy on AWS. AWS has a single block-device storage mechanism, Elastic Block Storage, which simply does not scale up to very high data rates.
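A quick way to sanity-check the sequential write rate of a volume is a synced `dd` run; the test path below is a placeholder for a file on the volume under test (an EBS mount, say), and a real benchmark tool such as `fio` will give far more detailed numbers:

```shell
#!/bin/sh
set -e
# Rough sequential-write check. TESTFILE is a hypothetical path on the
# volume being measured -- point it at the EBS-backed filesystem.
TESTFILE="${1:-./throughput_test.bin}"

# conv=fdatasync makes dd flush to the device before reporting a rate,
# so the number reflects the storage, not the OS page cache.
dd if=/dev/zero of="$TESTFILE" bs=1M count=64 conv=fdatasync
rm -f "$TESTFILE"
```

Run it a few times at different hours; on shared network storage the variance is often as telling as the average.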

That doesn’t mean that AWS is useless; it just means it isn’t the right tool for every job. The problem arises when AWS is treated as the fixed point, as the pickup was above. At some point, you have to decide:

  1. That being on AWS is so important (for whatever reason) that you are willing to sacrifice the performance you want; or,
  2. The performance you want is so important that you will need to move off of AWS.

Sadly, even the best of consultants do not have a magic engine in the back room that will cause EBS to perform as well as high-speed direct-attached storage.

More soon.

Randy at 23:24, 25 April 2012:

How much would switching from EBS to ephemeral store alleviate this issue?

This is, of course, assuming the client was okay with using continuous archiving to backup in the event the ephemeral store disappeared.
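For reference, continuous archiving of that sort comes down to a couple of postgresql.conf settings (PostgreSQL 9.x era; the archive destination below is purely illustrative):

```ini
# postgresql.conf -- minimal continuous-archiving setup.
# The destination path is illustrative; in practice you would push
# %p to durable storage outside the instance.
wal_level = archive
archive_mode = on
archive_command = 'cp %p /mnt/archive/%f'
```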

ju at 23:48, 25 April 2012:

Can you recommend any other cloud platform that can replace AWS and doesn’t have this problem?

Andrew Dunstan at 07:03, 26 April 2012:

@ju: at some stage you need to be able to get intimate with the hardware. When that happens no cloud provider will do, because the essence of the cloud is that the hardware has been abstracted away from you.

Xof at 08:56, 26 April 2012:

@Randy – Yes, running the database on ephemeral storage will be somewhat faster than running it on EBS. (I don’t have hard numbers, but those can be gotten.) Of course, as you note, you have to be prepared for a reboot to cost you your database.

@ju – This deserves a full blog post, but you have to be careful about what you mean by a “cloud platform.” For me, a “cloud provider” is one that provides virtual hosts with rapid provisioning. There’s an inherent tension between that and performance, since the more highly managed and virtualized the hosts are, the farther away they are from the underlying hardware. Amazon has their own special concerns in this regard, but please don’t take from my post that “Amazon is bad, and someone else is good.” All cloud hosting providers have trouble when high data rate I/O is required; it’s the current nature of the technology.

Robert Young at 12:11, 26 April 2012:

— the essence of the cloud is that the hardware has been abstracted away from you

When cloud was first promoted, the meme was that “your” cloud would/could be provisioned as you saw fit; it would be just what you’d put in your own glass room, but cheaper to build, and much cheaper and instantly available for expansion. Never believed that, for a host of reasons. You get what you pay for, sometimes.

When Seagate reported its quarterly, one of the bullet points was that cloud demand for cheap HDD was much of the reason. If you want your database on a 16 core Xeon with F-io SSDs, will any of these cloud providers do that? At a cost materially less than you could? And when you need more SSD, will they show up *right now*? And so on. IBM ran its Service Bureau decades ago; the original cloud. They finally dumped it.

Joe Van Dyk at 10:20, 27 April 2012:

Ephemeral storage is faster and more consistent than EBS. Ephemeral storage does stick around if you reboot the instance, but not if the instance dies or is terminated. But that’s what streaming replication is for.

I’d love to see some Amazon AMIs for setting up pg with replication and auto-failover. Maybe when PostgreSQL-XC comes out.
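For the curious, the standby side of 9.1-era streaming replication is just a small recovery.conf (hostname and paths here are hypothetical):

```ini
# recovery.conf on the standby (PostgreSQL 9.x; values illustrative).
standby_mode = 'on'
primary_conninfo = 'host=primary.example.com port=5432 user=replicator'
trigger_file = '/tmp/promote_me'   # touch this file to promote the standby
```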

Amit Kapila at 22:54, 29 April 2012:

How about Amazon’s S3 storage service? I have read it is integrated with SSDs and is very good for handling high-volume/high-rate requests for databases.

Xof at 14:16, 2 May 2012:

Joe – There’s really no reason you can’t build that now on Amazon’s existing APIs.

Amit – S3 isn’t a block-storage mechanism; it’s intended for the storage of large objects over file-based APIs.

Peter van Hardenberg at 11:48, 3 May 2012:

Could you define “high volume” based on your experience, Christophe?

I personally run a database with hundreds of millions of rows and many millions of transactions per day on Amazon, and our customers run many more besides.

Certainly I am aware that specifically selected native hardware has desirable performance characteristics, but I don’t believe it is correct to say AWS instances can’t handle large query volumes. I can certainly conceive of situations where improved per-node performance would be of benefit, but in my experience systems which are scaling non-linearly benefit very little overall from running on a machine with better performance. Inevitably the only long-term solution is to scale out to a multi-node architecture.

In fact, one could argue that the ease with which one can distribute load to additional systems on Amazon’s infrastructure makes it eminently more scalable than native hardware; it is only that as a system administrator you are simply required to use different skills and methodologies to achieve it.

That said, I appreciate that the techniques you have seen success with in the past may not be suitable in a cloud environment. All in all, I look forward to your upcoming talk and learning from the experiences you’ve had.

Xof at 12:06, 3 May 2012:

You’ll note I do not use the term “query volume” in my post, because that’s not my point. I can run a database with hundreds of millions of rows and millions of transactions a day on my laptop; indeed, I’m doing it right now. The performance of that is acceptable for what I need; it might not be for others.

“Scalability” is not a single monotone characteristic. AWS generally succeeds where the problem can be divided into pieces (or consists of a single piece) such that throwing more CPU at it will be of benefit. That’s a large class of problem! However, there are other classes of problem where the workload cannot be easily divided into pieces, or where the number of pieces required to overcome AWS’ limitations (in particular, in block storage I/O) makes it uneconomical to do so.

I understand that it would be convenient for every modern database problem to be recast as a nail for the benefit of the hammer vendors; however, some still stubbornly remain screws.

Peter van Hardenberg at 13:04, 3 May 2012:

My apologies for misrepresenting you, Christophe. My question was quite sincere. I would love to know at what point you would consider a workload to have a “high” read/write data rate, particularly at what point you would say it became unsuitable for running on AWS.

Our experience has been that you can generally go farther than one would expect, and that in the cases where a single node begins to fall over, users mostly have One Big Table that ultimately winds up being sharded across several nodes. This is what I think of as “the Instagram maneuver” and with dblink and FDWs doesn’t even require you to give up on querying within the database.

YMMV. :)

Xof at 13:27, 3 May 2012:

A workload becomes unsuitable for AWS when it reaches the limits of AWS’ performance. I realize that’s a bit of a tautology, but there are simply too many factors involved to say something like, “If you do more than 2k writes per second, move off of Amazon.”

Things that tend to indicate that Amazon is the right solution:

1. The application has a natural and convenient sharding point, without requiring a large number of cross-shard queries during normal operation.
2. The application tends to do a lot more reads than writes.
3. The database either fits entirely in memory, or can be comfortably sharded (see #1) to do so.
4. The overall cost of this configuration is manageable compared to dedicated hardware.

Things that tend to indicate that Amazon is the wrong solution:

1. The database is very large and does not comfortably fit in memory.
2. The database is very write-intensive.
3. The application’s algorithms are not easy to redo as a map-reduce problem.
4. The cost (especially for I/O) starts becoming excessive compared to dedicated hardware.

In a very broad brush, OLTP applications which can be cleanly sharded tend to do well on AWS; large data warehouses tend to do poorly, as do OLTP applications which cannot be cleanly sharded and which have high write rates. In one case, a client was paying enough each month to Amazon *just for I/O* that they could:

(a) Buy hardware with considerable headroom;
(b) Host it;
(c) Hire a full-time DBA to manage it.

It’s important to remember that as interesting as Instagram’s architecture is, not every use of a database is an Instagram.
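The “clean sharding” under discussion can be sketched in a few lines. This is a deliberately simplified illustration, not anyone’s production routing layer; the function name is hypothetical:

```python
import hashlib

def shard_for(key: str, num_shards: int) -> int:
    """Map a shard key (e.g. a user id) to a shard number.

    Uses a stable hash (md5) rather than Python's built-in hash(),
    which varies between processes and is unsuitable for routing.
    """
    digest = hashlib.md5(key.encode("utf-8")).hexdigest()
    return int(digest, 16) % num_shards

# Every node that routes queries agrees on the mapping:
# shard_for("user:42", 8) yields the same shard on every application server.
```

The hard part, of course, is not this function; it is the cross-shard queries and resharding that the “One Big Table” maneuver drags in behind it.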

Sara at 16:46, 28 May 2012:

I wonder if Amazon is using an upgraded version of EBS / ephemeral stores to provide higher IOPS for the RDS disk subsystem? Perhaps allocating higher throughput on the giant SAN so snapshots etc. all work seamlessly. I’d like to see something greater than the 72 IOPS they suggest EBS provides as an option. Amazon could allow selection of IOPS as an alternate parameter during EBS allocation. They could offer EBS stores that are 2x, 4x, 8x, Nx as fast by placing the volume on more spindles in the SAN environment, and thus giving the higher IOPS needed for high-write or very large data-warehouse-style databases. Alas, a boy can dream, can’t he?

Xof at 17:07, 28 May 2012:

Amazon could do more (I assume) to make performance more reliable, but as long as EBS is a gigabit-Ethernet-based SAN, we’re looking at a maximum throughput of 125MB/s. They’d need to move to a faster infrastructure in order to speed that up, and I don’t see signs of that happening soon.
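The 125MB/s ceiling is just the link arithmetic:

```python
# Gigabit Ethernet carries at most 10**9 bits per second; 8 bits per byte.
link_bps = 10**9
ceiling_mb_s = link_bps / 8 / 10**6
print(ceiling_mb_s)  # 125.0 -- and protocol overhead only pushes it lower
```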