Notes
When the table has foreign keys referencing two tables, it’s called an intersection table
The naive solution commonly shown in books and articles is to add a column parent_id. This column references another comment in the same table, and you can create a foreign key constraint on this relationship. This design is called Adjacency List.
I never knew this was called an “adjacency list”.
One weakness of the adjacency list is that it’s expensive to retrieve ancestors of a given node in the tree. In Path Enumeration, that is solved by storing in the string of ancestors as an attribute of each node.
You can see a form of Path Enumeration in a directory hierarchies. A UNIX path like /usr/local/lib
is a Path Enumeration of the filesystem, where usr
is the parent of the local
, which in turn is the parent of the lib.
The Nested Sets solution stores information with each node that pertains to the set of its descendants, rather than the node’s immediate parent. This information can be represented by encoding each node in the tree with two numbers, which you can call nsleft
and nsright
.
The Closure Table solution is a simple and elegant way of storing hierarchies. It involves storing all paths through the tree, not just those with a direct parent-child relationship.
In addition to a plain Comments
table, create another table TreePaths
, with two columns, each of which is a foreign key to the Comments
table.
Which Design Should You Use?
Each of the designs has its own strengths and weaknesses. Choose the design depending on which operations you need to be most efficient.
In the table shown, some operations are marked as easy or hard each respective tree design
Adjacency List |
1 |
Easy |
Hard |
Easy |
Easy |
Yes |
Recursive Query |
1 |
Easy |
Easy |
Easy |
Easy |
Yes |
Path Enumeration |
1 |
Easy |
Easy |
Easy |
Easy |
No |
Nested Sets |
1 |
Hard |
Easy |
Hard |
Hard |
No |
Closure Table |
2 |
Easy |
Easy |
Easy |
Easy |
Yes |
Adjcaency List is the most conventional design, and many software developers recognize it. It has the advantage over the other designs that it’s normalized. In other words, it has no redundancies, and it’s not possible to create conflicting data. Recursive queries using WITH
or CONNECT BY PRIOR
make it more efficient to use the Adjacency List design, provided you use a version of SQL database that supports the syntax.
Path Enumeration is good for breadcrumbs in user interfaces, but it’s fragile because it fails to enforce referential integrity and stores information redundantly.
Nested Sets is a clever solution – maybe too clever. It also fails to support referential integrity. It’s best used when you need to query a tree more than you need to modify the tree.
Closure Table is the most versatile of the alternative designs, and the only design in this chapter that allows node to belong to multiple trees. It requires an additional table to store the relationships. This design also uses a lot of rows when encoding deep hierarchies, increasing space consumption as a trade-off for reducing computing. Like many denormalized solutions, it gives good performance for certain query cases.
This page in the book is probably my favorite because at work I was just dealing with this problem at work on the best way to store hierarchal data and it was nice to learn about these different data modeling patterns to address this case.
- Compound key consists of multiple columns vs a surrogate key is artificial value with no meaning to be used as PK. I always get this terms mixed up
The way you can declare the ON UPDATE
or ON DELETE
clauses in the foreign key constraint allow you to control the result of a cascading operation.
TIL that you can cascade updates, I thought this was only for deletes
This design is called Entity-Attribute-Value, or EAV for short.
I wish my eyes have never seen this data modeling pattern.
Review
I have been writing SQL for 4.5 years and would say my SQL skills are advanced and still, I came away with some helpful tips after reading this book. Granted most of the SQL I write is for OLAP workloads whereas this book was more OLTP workloads. Nonetheless, I still found this book informative and it was nice to learn the proper terminology for things that I come across on a daily basis. It was a very easy read and would recommend to others looking to strengthening their SQL skills.