Don’t Assume PostgreSQL is Slow

You can’t build a real-life system without caching.

That being said, it’s often the case that parts of the system you think are going to be slow aren’t. I’ve noticed a tendency to build out a huge stack of components (“we’ll have PostgreSQL, and Redis, and Celery, and Varnish, and…”) without actually measuring where the bottlenecks are.

Example: A counter.

Suppose you need a global counter for something. It needs to be super-fast, and available across all of the web front ends. It’s not transactional (you never “uncount” based on a rollback), but you do want it to be persistent.

Option 1: Drop Redis into the stack, use INCR to keep the counter, and have some other process that reads the counter and spills it into PostgreSQL, then have some other process that picks up the count when Redis starts and initializes it (or be smart enough to read from both places and add them when yo need it), and accept that there are windows in which you might use counts.

Option 2: Use SERIAL in PostgreSQL.

But option 2 is really really really slow compared to super-ultra-fast Redis, right?

Not really (test on an Amazon i2-2xlarge instance, client over local sockets, Python client):

Ten million INCRs in Redis: 824 seconds.
Ten million SELECT nextval('') in PostgreSQL: 892 seconds.

So: Slower. 6.8 microseconds per increment slower. And no elaborate Redis tooling.

So, build for operation, apply load, then decide what to cache. Human intuition about what might be slow is almost certainly wrong.

Comments

Pedro Padron · 2015-10-30

When building our email marketing app I chose PostgreSQL but also Redis to store event statistics because I thought "all those joins and counts would be terrible". I didn't do my homework estimating/testing the workload, after all "with Redis everything is in memory!!1!". I also underestimated the overhead of keeping data in sync between the two systems. Now, I just finished migrating most of what I kept in Redis back to PostgreSQL with plain old views and some not-that-complex queries. Materialized views are even better for data that doesn't need to be real-realtime. Redis is great, but my judgement was not. Thanks for the post!

Charles Feduke · 2015-10-30

This is a pretty clever use case. We've just implemented the Redis solution (where we build out initial values from stores like PGSQL and C*) after having a terrible experience with C* counter columns [in fairness, we were warned]. But because we don't need *a* global counter, instead millions of them, this solution won't scale. http://postgresql.nabble.com/Maximum-number-of-sequences-that-can-be-created-td5702746.html I'll keep it in mind for the places where it would work though.

wbsgrepit · 2015-10-30

>So: Slower. 6.8 microseconds per increment slower. And no elaborate Redis tooling. >So, build for operation, apply load, then decide what to cache. Human intuition about what might be slow is almost certainly wrong. So, benchmarking is complicated and done poorly can reaffirm Human intuition. Might I suggest doing the same type of benchmark on a production config (ha/replication) of both with a standard production load on pg and see where this falls to. Looking at both performance numbers and side effects with vacs + replication + read/write ratios etc. It seems irresponsible to position pg as a great hot write path vs a system that is designed to do so avoiding most all of the issues with doing it on databases like pg.

me · 2015-10-30

Can you please share code, configuration and DB versions for both tests

Xof · 2015-10-30

It seems irresponsible to position pg as a great hot write path vs a system that is designed to do so avoiding most all of the issues with doing it on databases like pg.

If you have contrary information, please do feel free to provide it. It seems to me you are saying that your intuition disagrees with my results.

Xof · 2015-10-30

Can you please share code, configuration and DB versions for both tests

It's PostgreSQL 9.4.4 in a pretty basic normal tuning, plus whatever version of Redis is the current sudo apt-get redis-server on Ubuntu. The code just did SELECT nextval('') and r.incr('') in a tight loop.

me · 2015-10-30

I am not saying I disagree or agree, when I usually see comparison benchmarks, these come with actual code and environment setup anybody can easily reproduce

Cristian Bica · 2015-10-30

Interesting. I've created a PoC incr/decr implementation in plpgsql if anyone want to try https://gist.github.com/cristianbica/37692bb7794a4a32056a

me · 2015-10-31

since you did not provide actual benchmark code, let me make it up the redis test... oh wait, the benchmark already exists 2014 Macbook Pro OSX 10.11.1, Redis 3.0.4 redis-benchmark -n 10000000 -c 50 -t INCR ====== INCR ====== 10000000 requests completed in 134.82 seconds 50 parallel clients 3 bytes payload keep alive: 1

Xof · 2015-10-31

Good to get that off your chest? I think there is some serious point-missing going on here. In an absolute sense, I'm sure Redis is faster. But it is faster enough to justify having another bit of complexity in your architecture? That's up to you, but it's unwise to automatically assume that PostgreSQL simply can't do the job.

me · 2015-10-31

there is no question that PG can do counters and is fast enough for many use cases, but simply coming to conclusion that pg as fast as redis for counters is in my opinion incomplete (not inaccurate) test at best

Xof · 2015-10-31

> pg as fast as redis for counters That was not my hypothesis.

Tom Harrison · 2015-11-03

I think comments have indeed missed the point. It's not really about pg vs redis, right? It's about choosing the simplest solution first, and extending as needed based on measured results. Complexity is the enemy, not just of performance but of stability, maintenance, cost, etc. Adding another component that seems just right for the job is warranted only when the need has been determined through measurement, and costs (internal and external) have been calculated.

Alexandre de Oliveira · 2015-11-06

Tom Harrison is right. It's not about pg vs redis, it's about "Make it work, make it right, make it fast". It's about the trade offs that need to be made early on. This happens a lot in startups that are optimizing for growth but haven't launched yet, therefore incurring in a lot of additional architectural complexity. This is not about tuning databases.

Related

Comments