08:18
JSON vs hstore: Which will get you into a cool bar in the Mission?
3 July 2013
After all, isn’t that the most important issue?
Craig Kerstiens has a good overview of what the two types are. I wanted to give a quick and dirty flowchart as to which one I would use in a green-field development situation.
First, hstore performs better than JSON in nearly every situation, but performance is (as I wrote on Twitter) just one color in a rainbow of annoying decision points you have to make when building a system.
So, what would I do?
- If I am accepting outside data as JSON and want to store it in its hierarchic form, it would require a very clear performance case for me to want to take the time to convert it to hstore.
- If I am generating the data internally (rather than accepting it in its near-complete form), and the hierarchy is not deep, I’d use hstore for its faster performance.
- If retaining the types of the values is essential, JSON is the choice, since hstore stores everything as a string.
- If fully-arbitrary querying on key/value pairs is essential, hstore’s current indexing situation is far superior to JSON’s.
- If you only query on a very small number (1-3) of keys at the top level, it’s probably better to build b-tree indexes over that key’s value than use GiST or GIN indexes; hstore does not have a significant advantage over JSON then.
As JSON support in PostgreSQL matures, this calculus will change, perhaps significantly. But that’s the 9.2 situation, in my view.
There are 8 comments.
denpanosekai at 08:17, 4 July 2013:
This is exactly how I choose between JSON and hstore myself. I use JSON for external application data that doesn’t need to be interpreted by say, an SQL function or a trigger. Basically storage-only data. hstore for everything else.
I must say however it took me a while to understand hstore indexes, in fact I’m still not quite sure I get it. An in-depth article (covering more use cases than the official Postgres documentation and demonstrated performance gain) would be appreciated.
Mike at 16:05, 4 July 2013:
I suspect you mean B* trees?
Xof at 23:12, 4 July 2013:
Well, that changes everything!
Aaron at 16:04, 24 September 2013:
Have any of the performance comparisons changed significantly with 9.3’s improved JSON support?
Xof at 19:03, 24 September 2013:
I haven’t redone the measurements, so I don’t have any new data yet.
Aaron at 10:47, 2 October 2013:
OK, thanks. After reading your (excellent) slides from http://thebuild.com/presentations/pg-as-nosql-pgday-fosdem-2013.pdf, the main thing I’m wondering is if 9.3’s built-in ->> operator would improve the full table scan numbers (slide 52) vs. the plv8 function that you used.
For data sets where you are mostly querying on json fields that are (expression) indexed, seems like json could actually be more performant than hstore. But I don’t have enough experience with either to put together a reliable test.
Thanks for the thought-provoking slides!
martin at 05:35, 15 November 2013:
Thanks for this article!
Anybody knows what’s the difference in used disk space?
And how is it with pasring speed when outputing?
(parsing in case of JSON as it’s saved as text, don’t know how hstore is implemented)
Let’s say I have 1M of records in hstore/JSON and I want to get just one key/value from all of them.
Xof at 01:25, 1 December 2013:
I believe the disk space is due to MongoDB’s more aggressive allocation of disk space; I suspect that as the data set size grows, the difference will become lower and lower.
I’m running new tests against 9.3.x and its nice collection of JSON operators, and doing that kind of extraction is one of the tests I’ll report back on.