Scapegoat Trees in C++

22 Mar 2025 | 9 min read

Scapegoat trees are self-balancing binary search trees (BSTs) that keep insertion, deletion, and search efficient by rebuilding subtrees when they become unbalanced. Unlike AVL or Red-Black trees, which restore balance with rotations as part of each insertion or deletion, Scapegoat Trees use a lazier strategy: they tolerate some imbalance and do nothing until it crosses a threshold.

In other words, an insertion or deletion does not rebalance the tree immediately. Imbalance is allowed to accumulate, and a subtree is rebuilt only once the imbalance exceeds a threshold. The threshold comes from the α-balance criterion, which keeps the height of the tree logarithmic in the number of nodes. As a result, search takes O(log n) time in the worst case, and insertion and deletion take O(log n) amortized time, even on dynamic and unpredictable data sets.

Key Characteristics:

Several key characteristics of Scapegoat Trees in C++ are as follows:

Binary Search Tree (BST): A Scapegoat Tree follows standard BST properties. For any node, the left subtree contains values smaller than the node's value, and the right subtree contains larger values. This structure allows efficient searching, whose time complexity is proportional to the height of the tree.
Balance Criterion: Scapegoat trees maintain balance according to a criterion controlled by a parameter α, with 0.5 < α < 1. The criterion ensures that in each subtree, neither side becomes much larger than the other. When the condition is violated, corrective action is taken on the affected subtree. Instead of rotations, as in AVL or Red-Black trees, the tree rebuilds the entire subtree from scratch. This keeps the height of the tree logarithmic in the number of nodes.
α-Balance: A node is unbalanced if one of its child subtrees contains more than α times the number of nodes in the subtree rooted at that node, that is, if size(left) > α · size(node) or size(right) > α · size(node). When an imbalance is detected, the tree finds a "scapegoat" node (the deepest unbalanced node on the path from the new node to the root) and rebuilds the subtree rooted at that node.
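
The α-balance test above can be sketched in a few lines. This is only an illustration: the Node layout, the helper subtreeSize(), and the function name isAlphaBalanced() are assumptions made for the sketch, and α is assumed to lie strictly between 0.5 and 1.

```cpp
#include <cstddef>

// Hypothetical node layout used for illustration only.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    std::size_t size = 1;  // number of nodes in the subtree rooted here
    explicit Node(int k) : key(k) {}
};

// Size of a possibly empty subtree.
std::size_t subtreeSize(const Node* n) { return n ? n->size : 0; }

// A node is alpha-balanced when neither child subtree holds more than
// alpha * size(node) nodes. An empty subtree counts as balanced.
bool isAlphaBalanced(const Node* n, double alpha) {
    if (!n) return true;
    const double limit = alpha * static_cast<double>(n->size);
    return static_cast<double>(subtreeSize(n->left)) <= limit &&
           static_cast<double>(subtreeSize(n->right)) <= limit;
}
```

With α = 0.7, a node whose subtree has 4 nodes tolerates at most 2 nodes on one side (the limit is 2.8), so a left-leaning chain of 4 nodes is reported as unbalanced at its root.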

Key Operations:

Several key operations of Scapegoat Trees in C++ are as follows:

1. Insertion:

A node is first inserted into a Scapegoat Tree exactly as in an ordinary BST: it is placed in its correct position by comparing its key with existing nodes. After insertion, the tree compares the depth of the new node with a limit proportional to log(n), where n is the number of nodes (the usual limit is the logarithm of n to base 1/α). If the depth exceeds this limit, some ancestor of the new node must be unbalanced, so the tree finds the deepest unbalanced ancestor (the scapegoat) and rebuilds the subtree rooted at that node to restore balance.
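
As a rough sketch of the depth check, the functions below compute the limit as floor(log base 1/α of n) and compare a depth against it. The function names are hypothetical, the root is assumed to be at depth 0, n is assumed to be at least 1, and α is assumed to lie in (0.5, 1).

```cpp
#include <cmath>
#include <cstddef>

// Largest depth tolerated in a tree of n nodes: floor(log_{1/alpha}(n)).
// Assumes n >= 1 and 0.5 < alpha < 1.
int maxAllowedDepth(std::size_t n, double alpha) {
    return static_cast<int>(std::floor(std::log(static_cast<double>(n)) /
                                       std::log(1.0 / alpha)));
}

// True when a freshly inserted node at `depth` (root = depth 0) is too deep,
// which signals that some ancestor of the node is unbalanced.
bool needsRebuild(int depth, std::size_t n, double alpha) {
    return depth > maxAllowedDepth(n, alpha);
}
```

For example, with α = 2/3 and n = 8 the limit is floor(log base 1.5 of 8) = 5, so a new node at depth 6 would trigger the scapegoat search while one at depth 5 would not.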

2. Deletion:

Deletion in a Scapegoat Tree is similar to deletion in an ordinary BST. The node is removed, and the tree is restructured locally to preserve the BST properties. Unlike insertions, deletions do not trigger a subtree rebuild. Instead, the entire tree is rebuilt only when the number of nodes falls below a threshold, defined as a fraction (α) of the maximum size the tree has reached. This keeps the height within logarithmic bounds even after many deletions.

3. Find Scapegoat:

A scapegoat is the node where an imbalance is detected. If the depth of a newly inserted node exceeds the logarithmic bound, the tree walks back from that node toward the root and checks the sizes of the subtrees at each node on the way. The first node that violates the α-balance condition is the scapegoat. It becomes the root of the subtree that will be rebuilt.
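
The walk toward the root might look like the sketch below. It assumes the insertion recorded the nodes it visited in a vector (root first, new node last), since the hypothetical Node used here has no parent pointer. The name findScapegoat() and the path representation are choices made for this sketch.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node layout used for illustration only.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    std::size_t size = 1;  // number of nodes in the subtree rooted here
    explicit Node(int k) : key(k) {}
};

std::size_t subtreeSize(const Node* n) { return n ? n->size : 0; }

// `path` lists the nodes visited during insertion, root first, new node last.
// Returns the first node, scanning from the new node upward, whose left or
// right subtree exceeds alpha * size(node); nullptr if every node is balanced.
Node* findScapegoat(const std::vector<Node*>& path, double alpha) {
    for (std::size_t i = path.size(); i-- > 0;) {
        Node* n = path[i];
        const double limit = alpha * static_cast<double>(n->size);
        if (static_cast<double>(subtreeSize(n->left)) > limit ||
            static_cast<double>(subtreeSize(n->right)) > limit) {
            return n;
        }
    }
    return nullptr;
}
```

Scanning from the back of the path means the node returned is the deepest unbalanced one, so the rebuild touches the smallest subtree that needs it.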

4. Rebuild:

Once a scapegoat node is found, the subtree rooted at that node is rebuilt as an optimally balanced subtree. Typically, the nodes of the subtree are collected with an in-order traversal, which already yields them in sorted order, and a new balanced tree is built from that sequence by repeatedly choosing the middle element as the root. The rebuild keeps the height of the whole tree at most proportional to log(n), so a few unbalanced nodes cannot cause significant performance degradation.
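
A minimal sketch of the rebuild step follows. It reuses the existing nodes rather than allocating new ones. The helper names flatten(), buildBalanced(), and rebuild() and the Node layout are assumptions for illustration.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical node layout used for illustration only.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    std::size_t size = 1;  // number of nodes in the subtree rooted here
    explicit Node(int k) : key(k) {}
};

// In-order traversal: appends the subtree's nodes in ascending key order.
void flatten(Node* n, std::vector<Node*>& out) {
    if (!n) return;
    flatten(n->left, out);
    out.push_back(n);
    flatten(n->right, out);
}

// Relinks nodes[lo, hi) into a balanced subtree with the middle node as root
// and recomputes the size field of every node it touches.
Node* buildBalanced(std::vector<Node*>& nodes, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return nullptr;
    const std::size_t mid = lo + (hi - lo) / 2;
    Node* root = nodes[mid];
    root->left = buildBalanced(nodes, lo, mid);
    root->right = buildBalanced(nodes, mid + 1, hi);
    root->size = hi - lo;
    return root;
}

// Rebuilds the subtree rooted at `scapegoat` and returns its new root.
// The caller is responsible for storing the result in the parent's link.
Node* rebuild(Node* scapegoat) {
    std::vector<Node*> nodes;
    flatten(scapegoat, nodes);
    return buildBalanced(nodes, 0, nodes.size());
}
```

Both phases visit each node once, so rebuilding a subtree of k nodes takes O(k) time and O(k) temporary space for the vector.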

Example:

Let us take an example to illustrate the Scapegoat Trees in C++.

Output:

 
Tree after insertions: 30 40 50 60 70 
Searching for 30: Found
Searching for 100: Not Found
Tree after deleting 30: 40 50 60 70

Explanation:

This Scapegoat Tree implementation in C++ is a self-balancing Binary Search Tree (BST) that maintains balance by occasionally rebuilding unbalanced subtrees. The code consists of key components such as the node structure, insertion, search, deletion, and subtree rebuilding.

Node Structure:

The Node struct represents each tree node. It holds a key (key), pointers to the left and right children (left and right), and an integer size that records the number of nodes in the subtree rooted at that node. The size field is crucial because the α-balance check compares the sizes of a node's child subtrees with the size of the node's own subtree. Detecting imbalance this way and rebuilding unbalanced subtrees keeps the height of the tree logarithmic, so search, insertion, and deletion stay efficient even on dynamic and unpredictable data sets.

Tree Structure:

The main class ScapegoatTree contains the root node (root), the maximum size (maxSize) the tree has reached, and the balance factor (alpha), typically set to 2/3. The balance factor controls when a subtree is considered unbalanced. The root starts as nullptr in an empty tree, and maxSize is initialized to zero.
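
The two structures described above could be declared roughly as follows. The member names key, left, right, size, root, maxSize, and alpha come from the description. The accessor functions are hypothetical additions so that the skeleton can be inspected on its own.

```cpp
#include <cstddef>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    std::size_t size = 1;  // number of nodes in the subtree rooted here
    explicit Node(int k) : key(k) {}
};

class ScapegoatTree {
public:
    // alpha defaults to 2/3, the value mentioned in the text.
    explicit ScapegoatTree(double a = 2.0 / 3.0) : alpha(a) {}

    // Hypothetical accessors, present only to make the skeleton observable.
    bool empty() const { return root == nullptr; }
    std::size_t maxSizeSeen() const { return maxSize; }
    double balanceFactor() const { return alpha; }

private:
    Node* root = nullptr;     // empty tree
    std::size_t maxSize = 0;  // largest node count reached so far
    double alpha;             // balance factor, expected in (0.5, 1)
};
```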

Insertion Operation:

The insert() function first performs a typical BST insertion. After inserting the new node, the tree checks whether any node on the path from the newly inserted node to the root has become unbalanced using the isBalanced() function. A node is unbalanced if its left or right subtree holds more than a fraction (controlled by alpha) of the node's total size.

If an imbalance is found, the deepest unbalanced node (the scapegoat) is identified and the subtree rooted at it is rebuilt. Insertion has an amortized time complexity of O(log n) because rebuilds happen only occasionally.
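
Putting these steps together, an insert-only version of the class might look like the sketch below. It is not the original listing: the path of links, the depth limit floor(log base 1/alpha of n), and the helpers sz(), build(), heightOf(), and inorderOf() are assumptions made here, and the maxSize bookkeeping used by deletion is left out for brevity.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    std::size_t size = 1;  // nodes in the subtree rooted here
    explicit Node(int k) : key(k) {}
};

class ScapegoatTree {
public:
    explicit ScapegoatTree(double a = 2.0 / 3.0) : alpha(a) {}
    ~ScapegoatTree() { destroy(root); }
    ScapegoatTree(const ScapegoatTree&) = delete;
    ScapegoatTree& operator=(const ScapegoatTree&) = delete;

    // Inserts key; returns false when the key is already present.
    bool insert(int key) {
        std::vector<Node**> path;  // links followed on the way down
        Node** link = &root;
        while (*link) {
            if (key == (*link)->key) return false;
            path.push_back(link);
            link = key < (*link)->key ? &(*link)->left : &(*link)->right;
        }
        *link = new Node(key);
        for (Node** p : path) ++(*p)->size;  // every ancestor gained one node

        // Depth of the new node = number of ancestors on the path.
        const double limit =
            std::floor(std::log(double(root->size)) / std::log(1.0 / alpha));
        if (double(path.size()) > limit) {
            // Walk back toward the root; rebuild at the first unbalanced node.
            for (std::size_t i = path.size(); i-- > 0;) {
                if (!isBalanced(*path[i])) {
                    *path[i] = rebuild(*path[i]);
                    break;
                }
            }
        }
        return true;
    }

    std::size_t size() const { return sz(root); }
    int height() const { return heightOf(root); }  // edges on longest path, -1 if empty
    void inorder(std::vector<int>& out) const { inorderOf(root, out); }

private:
    Node* root = nullptr;
    double alpha;

    static std::size_t sz(const Node* n) { return n ? n->size : 0; }
    bool isBalanced(const Node* n) const {
        const double limit = alpha * double(n->size);
        return double(sz(n->left)) <= limit && double(sz(n->right)) <= limit;
    }
    static void flatten(Node* n, std::vector<Node*>& out) {  // in-order collect
        if (!n) return;
        flatten(n->left, out);
        out.push_back(n);
        flatten(n->right, out);
    }
    static Node* build(std::vector<Node*>& v, std::size_t lo, std::size_t hi) {
        if (lo >= hi) return nullptr;
        const std::size_t mid = lo + (hi - lo) / 2;  // median becomes the root
        v[mid]->left = build(v, lo, mid);
        v[mid]->right = build(v, mid + 1, hi);
        v[mid]->size = hi - lo;
        return v[mid];
    }
    static Node* rebuild(Node* n) {
        std::vector<Node*> v;
        flatten(n, v);
        return build(v, 0, v.size());
    }
    static int heightOf(const Node* n) {
        return n ? 1 + std::max(heightOf(n->left), heightOf(n->right)) : -1;
    }
    static void inorderOf(const Node* n, std::vector<int>& out) {
        if (!n) return;
        inorderOf(n->left, out);
        out.push_back(n->key);
        inorderOf(n->right, out);
    }
    static void destroy(Node* n) {
        if (!n) return;
        destroy(n->left);
        destroy(n->right);
        delete n;
    }
};
```

Storing pointers to the links (Node**) rather than to the nodes lets the rebuilt subtree be attached to its parent, or to root, with a single assignment. Inserting keys in ascending order, which would turn a plain BST into a linked list, leaves this tree with logarithmic height.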

Search Operation:

The search() function is straightforward. It traverses the tree like a regular BST to find the node containing the desired key. If the key is found, it returns the node; otherwise, it returns nullptr.
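
Because search does not depend on the balancing logic, it can be sketched as a plain BST lookup. The free-function form and the Node layout here are assumptions for illustration.

```cpp
#include <cstddef>

// Hypothetical node layout used for illustration only.
struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    std::size_t size = 1;
    explicit Node(int k) : key(k) {}
};

// Standard BST lookup: returns the node holding `key`, or nullptr if absent.
const Node* search(const Node* root, int key) {
    while (root && root->key != key) {
        root = key < root->key ? root->left : root->right;
    }
    return root;
}
```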

Deletion Operation:

The remove() function follows standard BST deletion rules: a node with two children is replaced by its in-order predecessor or successor, and a node with at most one child is deleted and its child, if any, is reconnected to the parent. Unlike insertion, deletion doesn't immediately trigger a rebuild. Instead, the size of the tree is checked, and if it has become too small relative to the maximum size (less than alpha * maxSize), the entire tree is rebuilt to ensure balance.
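
The deletion rule can be sketched as follows, under a few assumptions: a small hypothetical Tree struct stands in for the class, nodes are assumed to be allocated with new, the two-children case uses the in-order successor, and maxSize is reset to the current size after a full rebuild.

```cpp
#include <cstddef>
#include <vector>

struct Node {
    int key;
    Node* left = nullptr;
    Node* right = nullptr;
    std::size_t size = 1;  // nodes in the subtree rooted here
    explicit Node(int k) : key(k) {}
};

struct Tree {  // hypothetical minimal container for this sketch
    Node* root = nullptr;
    std::size_t maxSize = 0;
    double alpha = 2.0 / 3.0;
};

std::size_t sz(const Node* n) { return n ? n->size : 0; }

// Removes `key` from the subtree at n and returns the new subtree root.
// Sets `removed` to true when a node was actually deleted.
Node* removeNode(Node* n, int key, bool& removed) {
    if (!n) return nullptr;
    if (key < n->key) {
        n->left = removeNode(n->left, key, removed);
    } else if (key > n->key) {
        n->right = removeNode(n->right, key, removed);
    } else if (n->left && n->right) {
        // Two children: copy the in-order successor's key, then delete it.
        Node* succ = n->right;
        while (succ->left) succ = succ->left;
        n->key = succ->key;
        n->right = removeNode(n->right, succ->key, removed);
    } else {
        // Zero or one child: splice the node out.
        Node* child = n->left ? n->left : n->right;
        delete n;
        removed = true;
        return child;
    }
    n->size = 1 + sz(n->left) + sz(n->right);
    return n;
}

void flatten(Node* n, std::vector<Node*>& out) {  // in-order collect
    if (!n) return;
    flatten(n->left, out);
    out.push_back(n);
    flatten(n->right, out);
}

Node* build(std::vector<Node*>& v, std::size_t lo, std::size_t hi) {
    if (lo >= hi) return nullptr;
    const std::size_t mid = lo + (hi - lo) / 2;
    v[mid]->left = build(v, lo, mid);
    v[mid]->right = build(v, mid + 1, hi);
    v[mid]->size = hi - lo;
    return v[mid];
}

// Deletes `key`; rebuilds the whole tree once size < alpha * maxSize.
bool remove(Tree& t, int key) {
    bool removed = false;
    t.root = removeNode(t.root, key, removed);
    if (removed && double(sz(t.root)) < t.alpha * double(t.maxSize)) {
        std::vector<Node*> nodes;
        flatten(t.root, nodes);
        t.root = build(nodes, 0, nodes.size());
        t.maxSize = sz(t.root);  // assumption: reset after a full rebuild
    }
    return removed;
}
```

For instance, in a tree whose maxSize is 7 and alpha is 2/3, the threshold is about 4.67, so the first two deletions leave the shape alone and the third, which brings the size to 4, triggers the full rebuild.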

Subtree Rebuilding:

The subtree rebuilding process plays a crucial role in maintaining the balance of the Scapegoat Tree. It involves collecting all the nodes in the subtree (using the flatten() function) and reconstructing them into a perfectly balanced tree. It ensures that unbalanced subtrees are corrected with minimal disruption to the rest of the tree.

Complexity Analysis:

Time Complexity:

Insertion:

Insertion in a Scapegoat Tree begins like insertion in an ordinary BST. Because the height of the tree is kept at O(log n), the descent takes O(log n) time. After insertion, the new node may lie too deep, which triggers scapegoat detection and a subtree rebuild.
Finding the scapegoat involves walking the path from the inserted node back toward the root, which takes O(log n) time when subtree sizes are stored in the nodes.
Amortized complexity: Rebuilds are infrequent. Rebuilding a subtree of k nodes takes O(k) time, but a subtree of that size becomes unbalanced only after a number of insertions proportional to k, so the rebuild cost is spread across those operations and the amortized cost of insertion remains O(log n).

Deletion:

Deletion follows the standard BST deletion procedure, which takes O(log n) time because the height of the tree is O(log n).
Unlike insertion, deletion doesn't trigger an immediate rebuild. Instead, a rebuild is triggered only when the tree's size falls below a threshold relative to the maximum size reached. If triggered, rebuilding the entire tree takes O(n) time, but this is infrequent.
Amortized complexity: Like insertion, deletion has an amortized time complexity of O(log n).

Search:

Searching in a Scapegoat Tree is identical to a standard BST search. Because the height of the tree is kept logarithmic, it takes O(log n) time in the worst case.

Space Complexity:

The space complexity is O(n) because each node in the tree takes constant space, and there are n nodes in the tree.
Additionally, rebuilding uses a temporary array to hold the k nodes of the subtree being rebuilt, which takes O(k) space with k ≤ n. Thus, the overall space complexity is O(n).
