Pivoting data is a useful technique in reporting, allowing you to present data in columns that is stored as rows. Assuming you’re using a relational database, you can construct such queries using the SQL Server PIVOT operator or Postgres crosstab function. However, these queries are limited in that all pivot columns must be explicitly defined in the query.
In the relational model, rows give you flexibility while columns are static. Ideally, we could use the data-driven flexibility of rows to define our pivot table dynamically. The static typing of SQL generally makes this challenging, but the JSON datatype and related functions available natively in PostgreSQL provide a compelling solution. We can use the strong declarative syntax of SQL to dynamically create JSON objects with properties derived from the rows in our database.
Consider the following schema for storing book sales:
Populate it with data for our queries, and we’re ready to play around with pivots.
First, let’s look at the traditional
crosstab method from PostgreSQL’s tablefunc module.
This query will return data that looks something like this:
The details of exactly what’s going on here are not intuitive. Essentially the
crosstab function is matching the 2nd column of the first query to the 1st column of the second and aggregating values into the appropriate ‘year’ and ‘month’ bins behind the scenes. It has the advantage of being fairly compact, but the logic is extremely opaque and it feels a bit like magic. What’s possible in a
crosstab query is limited to input queries formatted in a very specific way, and all output columns have to be expressly defined in advance. As the Postgres docs say:
The crosstab function is declared to return setof record, so the actual names and types of the output columns must be defined in the FROM clause of the calling SELECT statement.
Let’s see if we can improve this. PostgreSQL added a native JSON datatype back in version 9.2 and as of 9.5 has extended the support with enhanced functions, operators, and better performance. Using this functionality provides a great solution that is a bit more readable and also completely dynamic:
Let’s unpack this a bit. First, our month_total CTE looks very similar to the first query in the
crosstab example. Here we are aggregating the total sales for each month. This CTE returns a set with one row per month (e.g. 2014 January, 2014 February, etc) and the total sales for that month. We then use the
jsonb_object_agg function to dynamically build JSON objects for each year where the keys are months and the values are sales totals. The output should look like this:
Extending this Technique
Let’s explore how simple this makes it to adjust which slice of our little OLAP cube we’re viewing:
This query will return sales data grouped by state on the vertical axis and genre on the horizontal. From here you can imagine slicing the data any way you want with relatively minimal changes.
This same technique can also be used when properties of a particular entity are not explicitly defined as columns, but the data is stored as rows (imagine a key value store for item properties or a tagging system, etc). In these cases, the dynamic properties can define the fields returned in the output just as the reporting dimensions do in these examples.