Hierarchical Data in Relational Databases

If you ever had to store hierarchical data (for example, tree of categories) in the relational database (for example MySQL), very soon, you realized it is not easy and efficient. In fact, relational databases are not designed for storing hierarchical data.
In this tutorial you will learn how to store hierarchical data in efficient way. In the first part we will learn about theory and how storing the hierarchical data in the database really works without using 3rd party libraries. In the second part we will demonstrate how to use stofDoctrineExtensionsBundle (which among other things support hierarchical data) with Symfony 4 and Doctrine.

Example

This is the example of data that we will use in this tutorial. Category tree of the bicycle shop (not so different from www.bicycle-discounts.com).

The adjacency list model

The adjacency list model is the simplest and straightforward way (but not very efficient) to represent hierarchical data. Each node has parent id. Here is the above category tree in MySQL table:

CREATE TABLE `category` (

`id` int(11) NOT NULL AUTO_INCREMENT,

`parent_id` int(11) DEFAULT NULL,

`name` varchar(255) DEFAULT NULL,

PRIMARY KEY (`id`),

KEY `ix_parent_id` (`parent_id`),

CONSTRAINT `fk_parent_id` FOREIGN KEY (`parent_id`) REFERENCES `category` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (1,NULL,'Home');

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (2,1,'Bikes');

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (3,1,'Components');

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (4,1,'Wheels & Tyres');

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (5,2,'Mountain');

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (6,2,'Road & Time Trail');

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (7,4,'Rims');

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (8,4,'Hubs');

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (9,4,'Tyres');

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (10,5,'Enduro');

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (11,5,'XC');

INSERT INTO `category` (`id`,`parent_id`,`name`) VALUES (12,5,'Fat bike');

select * from category;

+----+-----------+-------------------+

| id | parent_id | name |

+----+-----------+-------------------+

| 1 | NULL | Home |

| 2 | 1 | Bikes |

| 3 | 1 | Components |

| 4 | 1 | Wheels & Tyres |

| 5 | 2 | Mountain |

| 6 | 2 | Road & Time Trail |

| 7 | 4 | Rims |

| 8 | 4 | Hubs |

| 9 | 4 | Tyres |

| 10 | 5 | Enduro |

| 11 | 5 | XC |

| 12 | 5 | Fat bike |

+----+-----------+-------------------+

12 rows in set (0,00 sec)

In order to build the tree, you need to retrieve the table rows and recursively build the tree in your client side code (PHP for example). It is impossible to build the tree using one SQL statement. Also if you want to fetch all the products that are in the category Bikes and all its subcategories – again it is impossible to do it in one SQL query. You would probably need one SQL statement to fetch the items, build the tree in your client side code, store the ids of the Bikes and its subcategories. And then again execute another SQL statement to fetch the products that belong to those categories.

So, for such a simple task we need to execute 2 SQL queries and have extra processing on the client side code. Far better solution is to use the nested set model.

The nested set model

Represent the categories as nested containers:

The term “Nested set model” was introduced by Joe Celko.

Hierarchy is still maintained. For example category Bikes still includes Mountain (and its subcategories) and Road & Time Trail. Also, observe that the category “Home “includes all categories between its left and right container border. The category “Mountain” includes all categories between its container left and right border. Even leafs (categories without children) includes everything between its container left and right border, true, they include only them self, but now we have the general rule. Category when represented as nested set include all subcategories between its left and right container border.

Lets show this on the diagram and enumerate containers borders. As you can see the borders are enumerated in a simple way, from left to right, and numbers are increased by one at every container border.

Every category now has its left and right values (as shown on the picture above). So all what is needed to fetch category and its subcategories is one simple query which will say “Give me all categories whose left and right values are between parent left and right values”. For example to fetch all subcategories of category Bike we will ask for all categories which have left value > 2 and right value < 13.

The nested set model as a tree

The nested set mode can be represented as a tree which you traverse preorder. It is called modified preorder tree traversal – MPTT, modified because each node is visited twice. On the first visit left value is set, on the second visit right value is set:

Nested sets are easier to understand – and we will use them when writing queries in this tutorial.

The category table now looks like this:

CREATE TABLE `nested_set_category` (

`id` int(11) NOT NULL AUTO_INCREMENT,

`parent_id` int(11) DEFAULT NULL,

`name` varchar(255) DEFAULT NULL,

`lft` int(11) DEFAULT NULL,

`rgt` int(11) DEFAULT NULL,

PRIMARY KEY (`id`),

KEY `ix_parent_id` (`parent_id`),

KEY `ix_lft` (`lft`),

KEY `ix_rgt` (`rgt`),

CONSTRAINT `fk_nsc_parent_id` FOREIGN KEY (`parent_id`) REFERENCES `nested_set_category` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (1,NULL,'Home',1,24);

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (2,1,'Bikes',2,13);

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (3,1,'Components',14,15);

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (4,1,'Wheels & Tyres',16,23);

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (5,2,'Mountain',3,10);

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (6,2,'Road & Time Trail',11,12);

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (7,4,'Rims',17,18);

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (8,4,'Hubs',19,20);

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (9,4,'Tyres',21,22);

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (10,5,'Enduro',4,5);

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (11,5,'XC',6,7);

INSERT INTO `nested_set_category` (`id`,`parent_id`,`name`,`lft`,`rgt`) VALUES (12,5,'Fat bike',8,9);

Please note that parent_id is not needed by the nested set model, but it is left for convenience. For simplicity we will assume that categories name are unique.

select * from nested_set_category;

+----+-----------+-------------------+------+------+

| id | parent_id | name | lft | rgt |

+----+-----------+-------------------+------+------+

| 1 | NULL | Home | 1 | 24 |

| 2 | 1 | Bikes | 2 | 13 |

| 3 | 1 | Components | 14 | 15 |

| 4 | 1 | Wheels & Tyres | 16 | 23 |

| 5 | 2 | Mountain | 3 | 10 |

| 6 | 2 | Road & Time Trail | 11 | 12 |

| 7 | 4 | Rims | 17 | 18 |

| 8 | 4 | Hubs | 19 | 20 |

| 9 | 4 | Tyres | 21 | 22 |

| 10 | 5 | Enduro | 4 | 5 |

| 11 | 5 | XC | 6 | 7 |

| 12 | 5 | Fat bike | 8 | 9 |

+----+-----------+-------------------+------+------+

12 rows in set (0,00 sec)

Retrieve the data

Retrieve the full tree (or full subtree)

Retrieve the “Bikes” category and all its subcategories.

SELECT

node.*

FROM

nested_set_category as node

INNER JOIN nested_set_category as parent ON

(node.lft >= parent.lft AND node.lft <= parent.rgt)

WHERE

parent.name = "Bikes"

ORDER BY node.lft;

+----+-----------+-------------------+------+------+

| id | parent_id | name | lft | rgt |

+----+-----------+-------------------+------+------+

| 2 | 1 | Bikes | 2 | 13 |

| 5 | 2 | Mountain | 3 | 10 |

| 10 | 5 | Enduro | 4 | 5 |

| 11 | 5 | XC | 6 | 7 |

| 12 | 5 | Fat bike | 8 | 9 |

| 6 | 2 | Road & Time Trail | 11 | 12 |

+----+-----------+-------------------+------+------+

6 rows in set (0,00 sec)

Checkout the nested set model image: query simple selects all categories whose left and right values are between parent’s container left and right values.

Retrieve all parents of the node (and the node itself)

Retrieve all the parents (path to the category) of the “Fat bike” category (and the “Fat bike” category itself):

SELECT

parent.*

FROM

nested_set_category as node

INNER JOIN nested_set_category as parent ON

( parent.lft <= node.lft AND parent.rgt >= node.lft )

WHERE

node.name = "Fat bike"

ORDER BY

node.lft;

+----+-----------+----------+------+------+

| id | parent_id | name | lft | rgt |

+----+-----------+----------+------+------+

| 1 | NULL | Home | 1 | 24 |

| 2 | 1 | Bikes | 2 | 13 |

| 5 | 2 | Mountain | 3 | 10 |

| 12 | 5 | Fat bike | 8 | 9 |

+----+-----------+----------+------+------+

4 rows in set (0,00 sec)

Take a look at the nested set model image – the query is very simple. First we find the node with the name “Fat bike” and then from the image it is clear that its parents will have left value smaller and the right value larger then the node in question.

Retrieve leaf nodes (nodes without children)

SELECT

node.*

FROM

nested_set_category AS node

WHERE

node.rgt = node.lft + 1

ORDER BY

node.lft;

+----+-----------+-------------------+------+------+

| id | parent_id | name | lft | rgt |

+----+-----------+-------------------+------+------+

| 10 | 5 | Enduro | 4 | 5 |

| 11 | 5 | XC | 6 | 7 |

| 12 | 5 | Fat bike | 8 | 9 |

| 6 | 2 | Road & Time Trail | 11 | 12 |

| 3 | 1 | Components | 14 | 15 |

| 7 | 4 | Rims | 17 | 18 |

| 8 | 4 | Hubs | 19 | 20 |

| 9 | 4 | Tyres | 21 | 22 |

+----+-----------+-------------------+------+------+

8 rows in set (0,00 sec)

If you take another look at the nested set model image you will notice that difference between left and right values of leaf nodes is always 1. 🙂

Retrieve the depth of the nodes

This one might be a little bit tricky, so lets go through it step by step. The goal is to calculate the depth for each node, and depth of the node is related to the number of parents. Plan is to use previous query that returns parents of the node and count them. 🙂
So lets start with the query which will return all the parents of the node:

SELECT

parent.*

FROM

nested_set_category AS parent

INNER JOIN nested_set_category AS node ON

( node.lft >= parent.lft AND node.lft <= parent.rgt )

WHERE

node.name = "Fat bike"

ORDER BY

node.lft;

As you remember from previous examples this will return:

+----+-----------+----------+------+------+

| id | parent_id | name | lft | rgt |

+----+-----------+----------+------+------+

| 1 | NULL | Home | 1 | 24 |

| 2 | 1 | Bikes | 2 | 13 |

| 5 | 2 | Mountain | 3 | 10 |

| 12 | 5 | Fat bike | 8 | 9 |

+----+-----------+----------+------+------+

“Fat bike” has 3 parents and depth is 4 (if we count from 1, which usually we don’t 🙂 ).

Lets change so it prints only the node name and the parent name:

SELECT

node.name as node_name,

parent.name

FROM

nested_set_category as node

INNER JOIN nested_set_category as parent ON

( parent.lft <= node.lft AND parent.rgt >= node.lft )

WHERE

node.name = "Fat bike"

ORDER BY

node.lft;

+-----------+----------+

| node_name | name |

+-----------+----------+

| Fat bike | Home |

| Fat bike | Bikes |

| Fat bike | Mountain |

| Fat bike | Fat bike |

+-----------+----------+

4 rows in set (0,00 sec)

And now lets count the number of parent.name . 1 is subtracted because we want depth do be calculated from 0.

SELECT

node.name as node_name,

count(parent.name) - 1

FROM

nested_set_category as node

INNER JOIN nested_set_category as parent ON

( parent.lft <= node.lft AND parent.rgt >= node.lft )

WHERE

node.name = "Fat bike"

ORDER BY

node.lft;

+-----------+------------------------+

| node_name | count(parent.name) - 1 |

+-----------+------------------------+

| Fat bike | 3 |

+-----------+------------------------+

1 row in set (0,00 sec)

To get the depths for all nodes, we need to group by node.id or node.name and remove the WHERE condition.

Final SQL query which returns depth of the nodes:

SELECT

node.*,

count(parent.name) - 1 as depth

FROM

nested_set_category as node

INNER JOIN nested_set_category as parent ON

( parent.lft <= node.lft AND parent.rgt >= node.lft )

GROUP BY

node.id

order by

depth, node.lft;

+----+-----------+-------------------+------+------+-------+

| id | parent_id | name | lft | rgt | depth |

+----+-----------+-------------------+------+------+-------+

| 1 | NULL | Home | 1 | 24 | 0 |

| 2 | 1 | Bikes | 2 | 13 | 1 |

| 3 | 1 | Components | 14 | 15 | 1 |

| 4 | 1 | Wheels & Tyres | 16 | 23 | 1 |

| 5 | 2 | Mountain | 3 | 10 | 2 |

| 6 | 2 | Road & Time Trail | 11 | 12 | 2 |

| 7 | 4 | Rims | 17 | 18 | 2 |

| 8 | 4 | Hubs | 19 | 20 | 2 |

| 9 | 4 | Tyres | 21 | 22 | 2 |

| 10 | 5 | Enduro | 4 | 5 | 3 |

| 11 | 5 | XC | 6 | 7 | 3 |

| 12 | 5 | Fat bike | 8 | 9 | 3 |

+----+-----------+-------------------+------+------+-------+

Indented tree

Using MySQL string functions it is easy to retrieve indented tree (this query is based on query that returns depth of the nodes):

SELECT

CONCAT(REPEAT(' ', count(parent.name) - 1), node.name)

FROM

nested_set_category as node

INNER JOIN nested_set_category as parent ON

( parent.lft <= node.lft AND parent.rgt >= node.lft )

GROUP BY

node.id

order by

node.lft;

+------------------------------------------------------------+

| CONCAT(REPEAT(' ', count(parent.name) - 1), node.name) |

+------------------------------------------------------------+

| Home |

| Bikes |

| Mountain |

| Enduro |

| XC |

| Fat bike |

| Road & Time Trail |

| Components |

| Wheels & Tyres |

| Rims |

| Hubs |

| Tyres |

+------------------------------------------------------------+

12 rows in set (0,00 sec)

Joins between the nested set model hierarchical data and its items

Or, in other words, how to work with categories and products that belong to that categories. Sample products table and data:

CREATE TABLE `product` (

`id` int(11) NOT NULL AUTO_INCREMENT,

`name` varchar(255) DEFAULT NULL,

`category_id` int(11) DEFAULT NULL,

PRIMARY KEY (`id`),

KEY `ix_category_id` (`category_id`),

CONSTRAINT `fk_category_id` FOREIGN KEY (`category_id`) REFERENCES `nested_set_category` (`id`) ON DELETE NO ACTION ON UPDATE NO ACTION

) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4;

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (1,'Enduro Comp 29/6 Fattie Enduro Mountain Bike ',10);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (2,'Kona Process 167 Mountain Bike ',10);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (3,'Lapierre XR 729 Suspension Bike',11);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (4,'Kona Hei Hei Race Mountain Bike',11);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (5,'Kona Wo Fat Bike',12);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (6,'Charge Cooker Midi 5 Mountain Bike',12);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (7,'Eddy Merckx Milano 72 Womens Road Bike',6);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (8,'GT Grade Carbon Adventure Road Bike ',6);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (9,'Lapierre Aircode SL 500 MC Road Bike',6);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (10,'e.thirteen Cassette Range Expander Cog ',3);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (11,'Shimano XTR Trail M980 10 Speed Double Chainset',3);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (12,'SRAM PowerLink PowerLock Chain Connector ',3);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (13,'Easton EC90 SLX3 Pro Ergo Road Bar',3);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (14,'WTB ST i19 MTB Rim',7);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (15,'DT Swiss RR 521 Disc Brake 20mm Road Rim',7);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (16,'Hope Tech XC MTB Rim',7);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (17,'Hope RS4 Centre Lock Rear Hub',8);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (18,'Shimano Saint Hub Rear M820 Black 32H - 135 - 10mm',8);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (19,'Schwalbe Racing Ralph Performance CX Tyre',9);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (20,'Maxxis Minion DHR II Folding MTB Tyre ',9);

INSERT INTO `product` (`id`,`name`,`category_id`) VALUES (21,'WTB Breakout 27.5\" x 2.5 TCS Tough High Grip Tyre',9);

Number of products in each category

Goal: retrieve the number of products from each category, parent categories should display the sum of products from all its subcategories.
This query might not be so straightforward, so lets write it step by step. The plan is to LEFT JOIN table product and display a product each time for its own category and its all parents categories and then group and count.

Retrieve all the combinations between subcategory and all its parent categories.

This query will return all parents for each node.

SELECT

node.name as node_name,

parent.name as parent_name

FROM

nested_set_category as node

INNER JOIN nested_set_category as parent ON

( parent.lft <= node.lft AND parent.rgt >= node.lft );

+-------------------+-------------------+

| node_name | parent_name |

+-------------------+-------------------+

| Home | Home |

| Bikes | Home |

| Bikes | Bikes |

| Components | Home |

| Components | Components |

| Wheels & Tyres | Home |

| Wheels & Tyres | Wheels & Tyres |

| Mountain | Home |

| Mountain | Bikes |

| Mountain | Mountain |

| Road & Time Trail | Home |

| Road & Time Trail | Bikes |

| Road & Time Trail | Road & Time Trail |

| Rims | Home |

| Rims | Wheels & Tyres |

| Rims | Rims |

| Hubs | Home |

| Hubs | Wheels & Tyres |

| Hubs | Hubs |

| Tyres | Tyres |

| Tyres | Wheels & Tyres |

| Tyres | Home |

| Enduro | Home |

| Enduro | Bikes |

| Enduro | Mountain |

| Enduro | Enduro |

| XC | Home |

| XC | Bikes |

| XC | Mountain |

| XC | XC |

| Fat bike | Home |

| Fat bike | Bikes |

| Fat bike | Mountain |

| Fat bike | Fat bike |

+-------------------+-------------------+

34 rows in set (0,01 sec)

For example, category “Fat bike” is displayed 4 times – one for itself and another 3 times for each of its parent category (“Mountain”, “Bikes”, “Home”).

Now LEFT JOIN the product table:

SELECT

node.name as node_name,

parent.name as parent_name,

product.name as product_name

FROM

nested_set_category as node

INNER JOIN nested_set_category as parent ON

( parent.lft <= node.lft AND parent.rgt >= node.lft )

LEFT JOIN product ON (product.category_id = node.id)

;

+-------------------+-------------------+----------------------------------------------------+

| node_name | parent_name | product_name |

+-------------------+-------------------+----------------------------------------------------+

| Home | Home | NULL |

| Bikes | Home | NULL |

| Bikes | Bikes | NULL |

| Components | Home | e.thirteen Cassette Range Expander Cog |

| Components | Home | Shimano XTR Trail M980 10 Speed Double Chainset |

| Components | Home | SRAM PowerLink PowerLock Chain Connector |

| Components | Home | Easton EC90 SLX3 Pro Ergo Road Bar |

| Components | Components | e.thirteen Cassette Range Expander Cog |

| Components | Components | Shimano XTR Trail M980 10 Speed Double Chainset |

| Components | Components | SRAM PowerLink PowerLock Chain Connector |

| Components | Components | Easton EC90 SLX3 Pro Ergo Road Bar |

| Wheels & Tyres | Home | NULL |

| Wheels & Tyres | Wheels & Tyres | NULL |

| Mountain | Home | NULL |

| Mountain | Bikes | NULL |

| Mountain | Mountain | NULL |

| Road & Time Trail | Home | Eddy Merckx Milano 72 Womens Road Bike |

| Road & Time Trail | Home | GT Grade Carbon Adventure Road Bike |

| Road & Time Trail | Home | Lapierre Aircode SL 500 MC Road Bike |

| Road & Time Trail | Bikes | Eddy Merckx Milano 72 Womens Road Bike |

| Road & Time Trail | Bikes | GT Grade Carbon Adventure Road Bike |

| Road & Time Trail | Bikes | Lapierre Aircode SL 500 MC Road Bike |

| Road & Time Trail | Road & Time Trail | Eddy Merckx Milano 72 Womens Road Bike |

| Road & Time Trail | Road & Time Trail | GT Grade Carbon Adventure Road Bike |

| Road & Time Trail | Road & Time Trail | Lapierre Aircode SL 500 MC Road Bike |

| Rims | Home | WTB ST i19 MTB Rim |

| Rims | Home | DT Swiss RR 521 Disc Brake 20mm Road Rim |

| Rims | Home | Hope Tech XC MTB Rim |

| Rims | Wheels & Tyres | WTB ST i19 MTB Rim |

| Rims | Wheels & Tyres | DT Swiss RR 521 Disc Brake 20mm Road Rim |

| Rims | Wheels & Tyres | Hope Tech XC MTB Rim |

| Rims | Rims | WTB ST i19 MTB Rim |

| Rims | Rims | DT Swiss RR 521 Disc Brake 20mm Road Rim |

| Rims | Rims | Hope Tech XC MTB Rim |

| Hubs | Home | Hope RS4 Centre Lock Rear Hub |

| Hubs | Home | Shimano Saint Hub Rear M820 Black 32H - 135 - 10mm |

| Hubs | Wheels & Tyres | Hope RS4 Centre Lock Rear Hub |

| Hubs | Wheels & Tyres | Shimano Saint Hub Rear M820 Black 32H - 135 - 10mm |

| Hubs | Hubs | Hope RS4 Centre Lock Rear Hub |

| Hubs | Hubs | Shimano Saint Hub Rear M820 Black 32H - 135 - 10mm |

| Tyres | Tyres | Schwalbe Racing Ralph Performance CX Tyre |

| Tyres | Tyres | Maxxis Minion DHR II Folding MTB Tyre |

| Tyres | Tyres | WTB Breakout 27.5" x 2.5 TCS Tough High Grip Tyre |

| Tyres | Wheels & Tyres | Schwalbe Racing Ralph Performance CX Tyre |

| Tyres | Wheels & Tyres | Maxxis Minion DHR II Folding MTB Tyre |

| Tyres | Wheels & Tyres | WTB Breakout 27.5" x 2.5 TCS Tough High Grip Tyre |

| Tyres | Home | Schwalbe Racing Ralph Performance CX Tyre |

| Tyres | Home | Maxxis Minion DHR II Folding MTB Tyre |

| Tyres | Home | WTB Breakout 27.5" x 2.5 TCS Tough High Grip Tyre |

| Enduro | Home | Enduro Comp 29/6 Fattie Enduro Mountain Bike |

| Enduro | Home | Kona Process 167 Mountain Bike |

| Enduro | Bikes | Enduro Comp 29/6 Fattie Enduro Mountain Bike |

| Enduro | Bikes | Kona Process 167 Mountain Bike |

| Enduro | Mountain | Enduro Comp 29/6 Fattie Enduro Mountain Bike |

| Enduro | Mountain | Kona Process 167 Mountain Bike |

| Enduro | Enduro | Enduro Comp 29/6 Fattie Enduro Mountain Bike |

| Enduro | Enduro | Kona Process 167 Mountain Bike |

| XC | Home | Lapierre XR 729 Suspension Bike |

| XC | Home | Kona Hei Hei Race Mountain Bike |

| XC | Bikes | Lapierre XR 729 Suspension Bike |

| XC | Bikes | Kona Hei Hei Race Mountain Bike |

| XC | Mountain | Lapierre XR 729 Suspension Bike |

| XC | Mountain | Kona Hei Hei Race Mountain Bike |

| XC | XC | Lapierre XR 729 Suspension Bike |

| XC | XC | Kona Hei Hei Race Mountain Bike |

| Fat bike | Home | Kona Wo Fat Bike |

| Fat bike | Home | Charge Cooker Midi 5 Mountain Bike |

| Fat bike | Bikes | Kona Wo Fat Bike |

| Fat bike | Bikes | Charge Cooker Midi 5 Mountain Bike |

| Fat bike | Mountain | Kona Wo Fat Bike |

| Fat bike | Mountain | Charge Cooker Midi 5 Mountain Bike |

| Fat bike | Fat bike | Kona Wo Fat Bike |

| Fat bike | Fat bike | Charge Cooker Midi 5 Mountain Bike |

+-------------------+-------------------+----------------------------------------------------+

Each product is retrieved several time – its own category, and for every parent category. For example “Kona Wo Fat Bike” from “Fat bike” is also displayed in parents categories (“Mountain”, “Bikes” and “Home”). This is exactly what we want – because each product should be counted in its own category and in each of its parent categories.

Final query

Changes: Remove node from SELECT, GROUP BY parent.id and COUNT products and sort categories:

SELECT

parent.name as parent_name,

count(product.id)

FROM

nested_set_category as node

INNER JOIN nested_set_category as parent ON

( parent.lft <= node.lft AND parent.rgt >= node.lft )

LEFT JOIN product ON (product.category_id = node.id)

GROUP BY

parent.id

ORDER BY

parent.lft;

+-------------------+---------------------+

| parent_name | count(product.name) |

+-------------------+---------------------+

| Home | 21 |

| Bikes | 9 |

| Mountain | 6 |

| Enduro | 2 |

| XC | 2 |

| Fat bike | 2 |

| Road & Time Trail | 3 |

| Components | 4 |

| Wheels & Tyres | 8 |

| Rims | 3 |

| Hubs | 2 |

| Tyres | 3 |

+-------------------+---------------------+

12 rows in set (0,00 sec)

Quick check:

“Mountain”, “Enduro” and “XC” each has 2 products – so its parent “Mountain” has 6.
“Road & Time Trail” has 3 products. So “Bikes” has 9 products.
“Components” has 4 products.
“Wheels & Tyres” has 2 products from “Hubs” and 3 from “Rims” and “Tyres”. Total of 8 products.
So “Home” has 9+4+8 = 21 products.

Insert a new node

Lets add a new node “Clothing & Footwear” between “Bikes” and “Components”:

Left/Right values that will not be changed are displayed in black squares
Old values are displayed in red circles
New values are displayed in green squares

Since the “Components” category has right value 15, a new category “Clothing & Footwear” will have values 16 for left and 17 for right.
All categories on the right side of “Components” will have to be adjusted – their left and right values will have to be increased by two.
“Components” and all other categories on the left will remain unchanged.

SQL statements for node insert:

LOCK TABLE nested_set_category WRITE;

SELECT rgt, parent_id INTO @insertAfterRight, @parentId FROM nested_set_category WHERE name = "Components";

UPDATE nested_set_category SET rgt = rgt + 2 where rgt > @insertAfterRight;

UPDATE nested_set_category SET lft = lft + 2 where lft > @insertAfterRight;

INSERT INTO nested_set_category(parent_id, name, lft, rgt)

VALUES(@parentId, "Clothing & Footware", @insertAfterRight+1, @insertAfterRight+2);

UNLOCK TABLES;

Compare data in the table with latest nested set model image:

SELECT * FROM nested_set_category ORDER BY lft;

+----+-----------+---------------------+------+------+

| id | parent_id | name | lft | rgt |

+----+-----------+---------------------+------+------+

| 1 | NULL | Home | 1 | 26 |

| 2 | 1 | Bikes | 2 | 13 |

| 5 | 2 | Mountain | 3 | 10 |

| 10 | 5 | Enduro | 4 | 5 |

| 11 | 5 | XC | 6 | 7 |

| 12 | 5 | Fat bike | 8 | 9 |

| 6 | 2 | Road & Time Trail | 11 | 12 |

| 3 | 1 | Components | 14 | 15 |

| 14 | 1 | Clothing & Footware | 16 | 17 |

| 4 | 1 | Wheels & Tyres | 18 | 25 |

| 7 | 4 | Rims | 19 | 20 |

| 8 | 4 | Hubs | 21 | 22 |

| 9 | 4 | Tyres | 23 | 24 |

+----+-----------+---------------------+------+------+

Table lock is necessary to avoid race condition if more categories are being edited at the same time. Also using transaction would be a good idea in case one of the statements fails – so we could rollback to the previous state.

Delete a node

Deleting a node is a little bit more complicated, we need to delete the all children of the that node and update the remaining nodes. This are our categories after “Clothing & Footwear” was added:

Lets delete category “Mountain”. Updated nested set model image (remove “Mountain” category), so it would be clear what needs to be done:

Left/Right values that will not be changed are displayed in black squares
Old values are displayed in red circles
New values are displayed in green squares

As “Mountain” category and its children are deleted – nodes values on the right side needs to be recalculated. Their values will be changed by width of “Mountain” container. Width is number of borders of “Mountain” container and its children containers.
There are 8 borders (count them by hand if you want), or 10 (right value of Mountain node) – 3 (left value of Mountain node) + 1 = 8.

With this in mind, it lets write SQL statements to delete the node.

LOCK TABLES product WRITE, nested_set_category WRITE;

SELECT

lft, rgt, rgt-lft+1 INTO @delLft, @delRgt, @delWidth

FROM

nested_set_category

WHERE

name="Mountain";

DELETE

product

FROM

product

INNER JOIN nested_set_category ON ( product.category_id = nested_set_category.id )

WHERE

nested_set_category.lft>=@delLft AND

nested_set_category.rgt<=@delRgt;

DELETE FROM

nested_set_category

WHERE

lft>=@delLft AND rgt<=@delRgt

ORDER BY

lft DESC;

UPDATE

nested_set_category

SET

rgt = rgt - @delWidth

WHERE

rgt > @delRgt;

UPDATE

nested_set_category

SET

lft = lft - @delWidth

WHERE

lft > @delRgt;

UNLOCK TABLES;

Since I am using FOREIGN key with “NO ACTION” on update and delete in product table – I first need to delete products which belong to “Mountain” category and its subcategories. Also, there is similar situation with nested_set_category.product_id. I have to sort nodes to be deleted by “lft DESC” in order to delete childrens first and then parents so that foreign key error is avoided.

Continue to 2nd part: Hierarchical Data in Relational Databases with Symfony 4 and Doctrine.

4 Comments

aelores 5 years ago /Reply

This is awesome. You know what you’re talkin’ about.
Thank’s
- drib 5 years ago /Reply
  
  Thank you for the feedback! <3
Simon Ward 4 years ago /Reply

I was looking for tips on representing and querying hierarchical data, and especially found your description of the nested sets model useful.

For querying hierarchical data in the adjacency list model I suggest looking up recursive common table expressions (CTEs). These are standard (ISO/IEC) SQL, although, depending on the DBMS, the implementation may not be complete or was only added relatively recently — they appeared in MySQL in version 8.0, the documentation is at https://dev.mysql.com/doc/refman/8.0/en/with.html

Modern SQL indicates support for various DBMS offerings: https://modern-sql.com/caniuse/with_recursive_(top_level)

I also use Oracle Database, which has had support for hierarchical or linked queries in the form of the non-standard CONNECT BY for quite some time.
- drib 4 years ago /Reply
  
  Thank you for the comment and sharing the links! 🙂

drib

Hierarchical Data in Relational Databases Part 1

Example

The adjacency list model

The nested set model

The nested set model as a tree

Retrieve the data

Retrieve the full tree (or full subtree)

Retrieve all parents of the node (and the node itself)

Retrieve leaf nodes (nodes without children)

Retrieve the depth of the nodes

Final SQL query which returns depth of the nodes:

Indented tree

Joins between the nested set model hierarchical data and its items

Number of products in each category

Retrieve all the combinations between subcategory and all its parent categories.

Final query

Insert a new node

Delete a node

Links

4 Comments

Submit a comment Cancel reply