SQL self-join to label nodes on binary search tree

Question

Would like feedback on my SQL (1) efficiency and (2) readability, and if the conditions I featured in my CASE statement could be transferred to a join, etc.

Here's the problem I'm solving:

Say you have a table bst with a column of nodes and column or corresponding parent nodes:
node   Parent
1       2
2       5
3       5
4       3
5       NULL
We want to write SQL such that we label each node as a “leaf”, “inner” or “Root” node, such that for the nodes above we get:
Node    Label
1       Leaf
2       Inner
3       Inner
4       Leaf
5       Root

Here's my solution:

WITH join_table AS 
(
SELECT 
    a.node a_node,
    a.parent a_parent,
    b.node b_node, 
    b.parent b_parent
 FROM
    bst a 
 LEFT JOIN 
    bst b ON a.parent = b.node 
 )

 SELECT 
    a_node as Node, 
    CASE 
        WHEN b_node IS NOT NULL and b_parent IS NOT NULL THEN 'Leaf'
        WHEN b_node IS NOT NULL and b_parent IS NULL THEN 'Inner'
        WHEN b_node IS NULL and b_parent IS NULL THEN 'Root'
        ELSE 'Error' 
    END AS Label 
 FROM 
    join_table

What dialect of SQL are you targeting? (Apart from that, this is a really well-asked SQL question, so well done!) — Toby Speight
– Toby Speight, Commented Jan 30, 2019 at 16:58
@TobySpeight Let's just say Hive, but Postgres or MySQL would work too -- more focused on the underlying logic — Mysterious Otter
– Mysterious Otter, Commented Jan 30, 2019 at 17:14
It is possible to rewrite your query without CASE, however, it will not help the performance since you are reading the whole tables anyway. The sequential scan is inevitable in this case. — Radim Bača
– Radim Bača, Commented Jan 30, 2019 at 19:17
@zthomas.nc it would be a solution using UNION ALL which would lead to three sequential scans instead of one. — Radim Bača
– Radim Bača, Commented Jan 31, 2019 at 18:19

Sᴀᴍ Onᴇᴌᴀ · Accepted Answer · 2023-08-07 06:30:07Z

This solution will work only on the dataset that you've mentioned. If you apply this code to another dataset it will throw up erroneous results. Here is why:

Condition b_node IS NULL and b_Parent is NULL will always give correct result i.e. 'Root'
The conditions for 'Leaf' and 'Inner' will change depending on the dataset. let's assume 5 is not the parent root, instead it is an inner node and there is a parent node 6 above it and 6 also has a child node 7

So the table becomes

 1. a_node  a_parent b_node b_parent




 1. 1        2       2       5
 2. 2        5       5       6
 3. 3        5       5       6
 4. 4        3       3       5
 5. 5        6       6       NULL
 6. 6        NULL    NULL    NULL
 7. 7        6       6       NULL

here 2 and 3 are not nulls and will be marked as 'leaf' erroneously.

A better solution is this case is to use CASE to identify those nodes in a_node which can never be a parent in a_parent since they are leaf nodes.

Stack Exchange Network

SQL self-join to label nodes on binary search tree

1 Answer 1

You must log in to answer this question.

Hot Network Questions

SQL self-join to label nodes on binary search tree

1 Answer 1

You must log in to answer this question.

Related

Hot Network Questions