I am a big proponent of using the right tool for the job. And as far as databases are concerned, I believe that in most cases a relational database is still the correct tool. However, one historical drawback of relational databases has been a difficulty in retrieving hierarchical data in a way that is easily compatible with most programming languages. Fortunately, support for modern data types such as JSON and advanced features including recursive common table expressions (CTEs) provide some excellent ways to tackle this problem.
My relational database of choice is PostgreSQL, which is open source and feature rich. It is fantastic, and you should definitely be using it. Postgres version 9.3 was recently released with even better support for the JSON data type among its new features. Here are some methods I routinely use to retrieve structured, hierarchical data using raw SQL in a single query.
Nested Objects
When working with data in your backend code, you often need quick access to the relationships within your data model. Including nested JSON objects in your result set is a great way to accomplish this naturally. This approach works particularly well if you’re interfacing with the database using node.js or another platform with fast JSON decoding.
Imagine the following hypothetical database of projects managed by employees:
CREATE TABLE employee (
employee_id INT PRIMARY KEY,
name TEXT NOT NULL
);
CREATE TABLE project (
project_id INT PRIMARY KEY,
employee_id INT NOT NULL REFERENCES employee(employee_id),
name text NOT NULL
);With some data:
employee
--------------------------------
employee_id | name
-------------+------------------
1 | Jon Snow
2 | Thoren Smallwood
3 | Samwell Tarley
project
-----------------------------------------------------------
project_id | employee_id | name
------------+-------------+--------------------------------
1 | 1 | Infiltrate Mance Rayder's Camp
2 | 3 | Research the WightsSuppose we wanted to return a list of projects and the employees responsible for them as an array of nested objects. PostgreSQL provides some useful JSON functions for manipulating data, and we will use the row_to_json function to nest objects directly in the query results. row_to_json provides the ability to turn a database row into a json object, which is the key. Consider the following query:
SELECT
p.*,
row_to_json(e.*) as employee
FROM project p
INNER JOIN employee e USING(employee_id)If we’re using the pg node.js library to interface with Postgres, this query will return the following directly to our Javascript:
[
{
"project_id": 1,
"employee_id": 1,
"name": "Infiltrate Mance Rayder's Camp",
"employee": {
"employee_id": 1,
"name": "Jon Snow"
}
},
{
"project_id": 2,
"employee_id": 3,
"name": "Research the Wights",
"employee": {
"employee_id": 3,
"name": "Samwell Tarley"
}
}
]Exactly what we’re looking for. It is very natural to work with this kind of data structure in Javascript or any other imperative language.
Augmenting Nested Objects
Sometimes it is necessary to return additional fields along with a given object that may not be directly included in the database table (due to normalization or other factors). For example, perhaps we are storing the date a project was assigned but would also like to return the age of that project in days. Let’s add to our previous data model:
ALTER TABLE project ADD COLUMN dateassigned DATE;
UPDATE project SET dateassigned = '2013/09/10' WHERE project_id = 1;
UPDATE project SET dateassigned = '2013/09/16' WHERE project_id = 2;
INSERT INTO project (project_id, employee_id, name, dateassigned)
VALUES (3, 3, 'Send a raven to Kings Landing', '2013/09/21');
INSERT INTO project (project_id, employee_id, name, dateassigned)
VALUES (4, 2, 'Scout wildling movement', '2013/09/01');This time we are going to reverse the desired results and query for a list of employees and their respective projects. But let’s add the age of each project in addition to the date assigned. How can we accomplish this?
One approach would be to use a subquery to create a new virtual table, and then use row_to_json to transform that virtual table row into JSON. We will do essentially the same thing, but use a common table expression to redefine the project table with an additional “age” column. Also note that since employee->project is a one-to-many relationship, we are going to use the json_agg aggregate function to return a JSON array of objects instead of a single object as we did previously with row_to_json.
WITH project AS (
SELECT
p.*,
date_part('day', age(now(), dateassigned::timestamp)) as age
FROM project p
)
SELECT
e.employee_id,
e.name,
json_agg(p.*) as projects
FROM employee e
INNER JOIN project p USING (employee_id)
WHERE employee_id = 3
GROUP BY e.employee_id, e.nameThis query returns the following JSON to our Javascript backend. Note it is filtered to a single employee to reduce the size of the result set:
[
{
"employee_id": 3,
"name": "Samwell Tarley",
"projects": [{
"project_id": 2,
"employee_id": 3,
"name": "Research the Wights",
"dateassigned": "2013-09-16",
"age": 6
},
{
"project_id": 3,
"employee_id": 2,
"name": "Send a raven to Kings Landing",
"dateassigned": "2013-09-21",
"age": 1
}]
}
]Recursive Common Table Expressions
Your dataset may include more complicated relationships in the form of tree-structured, hierarchical data as well. For example, tracking the management structure of our sample organization, where each “employee” reports to an immediate superior. This forms a tree of employees with the boss as the root node followed by subordinates. Probably the most common way to model such a tree is using the Adjacency List Model. In this representation, each employee record contains a reference to their immediate superior. If they are the head of the organization, this reference is null.
Let’s modify our existing schema once more to reflect this:
ALTER TABLE employee ADD COLUMN superior_id INT REFERENCES employee(employee_id);
INSERT INTO employee (employee_id, name, superior_id)
VALUES (4, 'Jeor Mormont', null);
UPDATE employee SET superior_id = 4 WHERE employee_id <> 4;
INSERT INTO employee (employee_id, name, superior_id)
VALUES (5, 'Ghost', 1);
INSERT INTO employee (employee_id, name, superior_id)
VALUES (6, 'Iron Emmett', 1);
INSERT INTO employee (employee_id, name, superior_id)
VALUES (7, 'Hareth', 6);We can now use a recursive CTE (common table expression) to return this tree of data in a single query along with the depth of each node. Recursive CTEs allow you to reference the virtual table within its own definition. They take the form of two queries joined by a union, where one query acts as the terminating condition of the recursion and the other joins to it. Technically they are implemented iteratively in the underlying engine, but it can be useful to think recursively when composing the queries.
WITH RECURSIVE employeetree AS (
SELECT e.*, 0 as depth
FROM employee e
WHERE e.employee_id = 1
UNION ALL
SELECT e.*, t.depth + 1 as depth
FROM employee e
INNER JOIN employeetree t
ON t.employee_id = e.superior_id
)
SELECT * FROM employeetreeThe above query will return Jon Snow, employee_id = 1, and all employees subordinate to him either directly or through the tree. In our small dataset, the result looks like this:
employee_id | name | superior_id | depth
-------------+-------------+-------------+-------
1 | Jon Snow | 4 | 0
5 | Ghost | 1 | 1
6 | Iron Emmett | 1 | 1
7 | Hareth | 6 | 2This is fairly straightforward in our example, but it can be a powerful tool on larger trees especially when extended and combined with other techniques.
Combining Everything
We can use recursive CTEs in conjunction with the JSON functions to produce some really useful results. The following query will return the record for “Hareth”, employee_id = 7, and a nested list of his superiors up to and including the root node. Each superior will be a complete employee object including their projects.
WITH RECURSIVE employeetree AS (
WITH employeeprojects AS (
SELECT
p.employee_id,
json_agg(p.*) as projects
FROM (
SELECT
p.*,
date_part('day', age(now(), dateassigned::timestamp)) as age
FROM project p
) AS p
GROUP BY p.employee_id
)
SELECT
e.*,
null::json as superior,
COALESCE(ep.projects, '[]') as projects
FROM employee e
LEFT JOIN employeeprojects ep
USING(employee_id)
WHERE superior_id IS NULL
UNION ALL
SELECT
e.*,
row_to_json(sup.*) as superior,
COALESCE(ep.projects, '[]') as projects
FROM employee e
INNER JOIN employeetree sup
ON sup.employee_id = e.superior_id
LEFT JOIN employeeprojects ep
ON ep.employee_id = e.employee_id
)
SELECT *
FROM employeetree
WHERE employee_id = 7This query returns the following nested objects:
{
"employee_id": 7,
"name": "Hareth",
"superior_id": 6,
"superior": {
"employee_id": 6,
"name": "Iron Emmett",
"superior_id": 1,
"superior": {
"employee_id": 1,
"name": "Jon Snow",
"superior_id": 4,
"superior": {
"employee_id": 4,
"name": "Jeor Mormont",
"superior_id": null,
"superior": null,
"projects": []
},
"projects":[
{
"project_id":1,
"employee_id":1,
"name":"Infiltrate Mance Rayder's Camp",
"dateassigned":"2013-09-10",
"age":12
}
]
},
"projects":[]
},
"projects": []
}This is really just the tip of the iceberg, but as you can see there is considerable power available natively from PostgreSQL. It is great to be able to work with data directly from a database query without having to reformat it, use a mapping layer, or run any additional queries.
Great article I use a lot tree structures and using this would help a lot, I didn't know that you could create columns like **employee id**.
Code sure balloons up in complexity. Are you aware of any contrib modules that would make writing queries such as these more pleasant?
That's a good question. Unfortunately I think much of what makes it longer and harder to read is related to the syntax for recursive CTEs since you are essentially defining the same query twice. You also aren't allowed to use aggregates in them, which makes the other CTE more complicated than it would otherwise have to be as well (compare to the earlier example). That is one downside, but I think the benefits are worth it.
Great post. Thanks for showing the power of PostgreSQL’s JSON support.
One note:
date_part('day', age(now(), dateassigned::timestamp)) as agewill give the wrong answer when the range is larger than a month. For whole days with a date field, you probably want somethingcurrent_date - dateassigned as age. If something like hours was required,date_part('epoch', …) / 3600works well.Thanks! Yeah good call on the age calculation. Those options are definitely better in a real situation.
Saved my life right here with `row_to_json` thank you so much for the awesome post!
Glad to hear it!
Hi.
Any pointers on how to combine in the same query row_to_json and json_agg. The idea is to include the parent (client) and the children (tasks) of the current item (project). For the query bellow i an error.
Thank you
---
ERROR: column "clients.*" must appear in the GROUP BY clause or be used in an aggregate function
LINE 10: row_to_json(clients.*) as client,
^
********** Error **********
ERROR: column "clients.*" must appear in the GROUP BY clause or be used in an aggregate function
SQL state: 42803
Character: 169
---
WITH clients AS (
SELECT id, name FROM clients
), tasks AS (
SELECT id, project_id, name FROM tasks
)
SELECT
projects.*,
--clients.*,
--tasks.*
row_to_json(clients.*) as client,
json_agg(tasks.*) as tasks
FROM projects
INNER JOIN clients ON projects.client_id = clients.id
INNER JOIN tasks ON projects.id = tasks.project_id
GROUP BY projects.id
Found a solution like this, not sure if it's the most optimal considering that it might be needed to fetch multiple parents and multiple types of children at the same time.
How to get recursive json with employee #4 in array at top ? - "select * from
employee where superior_id is null", so any element will have
"inferiors" array instead "superior" employee object - but how to do it ?
Can this be done using Mysql workbench? Please let me know soon.
Thanks,
Sri
Woah, it's been a while since I've been in SQL land. I'm super impressed that database queries can return recursive structures like this nowadays. Thanks for sharing!
--> Here I am getting
could not identify an equality operator for type json
LINE 46: GROUP BY MONTH_OR_YEAR,DETAILS
error.
please can you find the solution for this
SELECT array_to_json(array_agg(result))
FROM
(
SELECT * FROM
(
SELECT
(
SELECT MIN(to_char(eCPService_date_from , 'MON YYYY')) FROM tbl_eCPServiceline
WHERE eCPClaim_key = C1.eCPClaim_key
) AS MONTH_OR_YEAR ,
(
select row_to_json(row) AS DETAILS
from
(
SELECT CLAIM_TYPE,COUNT(CLAIM_COUNT) AS CLAIM_COUNT ,SUM(TOTAL_BILLABLE) AS TOTAL_BILLABLE
FROM (
SELECT
(
SELECT eCPClaim_Custom_Value1
FROM tbl_eCPClaim_Custom
WHERE eCPClaim_Key = C.eCPClaim_Key
AND eCPClaim_Custom_Field_Config_ID = 'dropdown2' LIMIT 1
) AS CLAIM_TYPE ,
COUNT(*) AS CLAIM_COUNT ,
SUM(eCPClaim_payer_billable_amount) AS TOTAL_BILLABLE
FROM tbl_eCPClaim C
WHERE C.ecpclaim_key=C1.ecpclaim_key
GROUP BY C.eCPClaim_Key
LIMIT 1
)AS V
GROUP BY CLAIM_TYPE
)row
)
FROM tbl_eCPClaim C1
WHERE C1.eCPClaim_Key IN (
SELECT RES.eCPClaim_Key FROM tbl_eCPRemit TR, tbl_eCPClaimResponseSummary RES
WHERE TR.eCPRemit_Key = RES.ecpclaimresponsesummary_response_key
AND TR.eCPResponseBatch_Key='0169334867'
AND RES.ecpclaimresponsesummary_response_type=205
AND TR.eCPRemit_Claim_Status_Code <> '4'
)
GROUP BY
C1.eCPClaim_key
)AS VV
GROUP BY MONTH_OR_YEAR,DETAILS
)result