2018-11-13 created by AD1024

Data Structure

Template

Java

- Tree
- Heap
- BST
- Java
- AVL

**Treap** is an implementation of AVL. By looking at its name, it is a combination of *Tree* and *Heap*. As we know, the property of a BST(a.k.a Binary Search Tree) is that for a node which is not a leaf in the tree, the value of its left child is less than its own value and the value of its right child is greater than its own value. Then we have a balancing problem: if we create a BST by using the sequence $[1, 2, 3, 4, 5, \cdots]$ (non-decreasing sequence), the structure of the tree will then becomes a linear structure(like a linked list as shown in the graph), because we construct the tree by getting the elements in the sequence one by one.

It maintains the property of a BST but because of its linear structure, the time efficiency becomes pretty low when you want to insert a new node with the largest value, and when you want to perform some queries on the tree.

So we need to come up with some ways to avoid the tree retrogrades to a linear structure so that the tree will maintain a height of $\log N$ where $N$ is the number of nodes and we can keep the time complexity of operations on the tree low.

Then we got a data structure called **Treap**. Not just storing the value in the node, Treap will randomly generate a weight when creating a new node. In order to make the tree “balanced”(not skewed to one direction), in addition to maintain the value so that it follows the property of a BST, it also maintains randomly generated weights of the nodes and makes the weights follow the property of a **Heap**(a min root heap).

In the following sections, I will cover through the ideas of how Treap maintains properties of BST and Heap concurrently and several operations we can do by using Treap. The code will be Java, because I assume most of the reader of the article are familiar with this language(possibly the mostly used language in CS intro class?)

In order to fully utilize the property of BST, we maintain these fields:

- Values of nodes
- Weight (key) of nodes
- Number of same values in a node
- Size of the tree
- Number of nodes in a tree
- Two children of a node

```
public class Node {
public int v, rnd, size, cnt, node_cnt;
public Node[] ch;
public Node(int data, int w) {
v = data; // Actual data
rnd = w; // Weight (Key)
size = 1; // size of the tree rooted at this node
cnt = 1; // number of same value of v
ch = new Node[2]; // ch[0] -> left | ch[1] -> right
node_cnt = 1; // number of nodes in the tree
}
}
```

- Insert: $O(\log N)$
- Rotate: $O(1)$
- Delete: $O(\log N)$
- Update: $O(\log N)$
- Query: $O(\log N)$

**Query operation includes existence, ranking, etc.**

**If you are only interested in how the structure is implemented, you can generally skip this part**

Since Treap depends on randomization, we can calculate the expected depth of a node to prove the complexity.

The depth of a node $i$ depends on how many ancestors it has. So we introduce a notation $A_{j\to i}$ denote that $j$ is an ancestor of $i$, and then the expected depth of node $i$ is:

$$E[depth_i]=\sum_{j=1}^{n} E[A_{j \to i}]$$

To analyze $A_{j\to i}$, we can break up three conditions between the relationship of node $j$ and node $i$ according to the keys in nodes.

(Let $w_x$ denotes the weight(key) of node $x$, $\mathbb{A_i^j}$ denotes nodes in the link $i\to j$ excluding $i$ and $j$)

- $\forall w_x \in \mathbb{T_w}, w_i \lt w_x$
- $\exists w_k \in \mathbb{A_i^j}$ such that $\forall w_x \in \mathbb{T_w}, w_k\lt w_x$
- $\forall w_x \in \mathbb{T_w}, w_j \lt w_x$

For the first case, $w_i$ is the global minimum key in Treap; therefore, there must be no ancestor for node $i$ (a node $i$ has ancestor(s) if and only if there is such a node $j$ that $w_j \lt w_i$).

For the second case, it is obvious that it is impossible since if there is a node $k$ in the link $i\to j$ has the global minimum, and node $j$ is an ancestor of node $k$ and node $i$, node $j$ would have a smaller key than that of node $k$. However, this is contradictory.

For the last one, since $j$ has the global minimum(that is, the highest priority because we are maintaining a min-root heap), $j$ can be an ancestor of $i$. In this case, we say $A_{j\to i}=1$. And because the key of nodes are randomly assigned, during a round of insertion, the probability of node $j$ is the ancestor of $i$ is $\frac{1}{|j-i|+1}$ (because of the randomized assignment of keys, any single node in the link $i\to j$ may be rotated to be the root that contains the global minimum). Then we can get: $E[A_{j\to i}]=\frac{1}{|j-i|+1}$. We sum up all nodes to calculate the expected depth:

$$ \begin{matrix} E[depth_i] & = & \sum_{j=1, j\neq i}^{n} E[A_{j\to i}] \\\\ & = & \sum_{j=1}^{i-1}\frac{1}{i-j+1} + \sum_{j=i+1}^{n}\frac{1}{j-i+1} \\\\ & \lt 2\ln n \\\\ & = O(\log n) \end{matrix} $$

(Harmonic number: $a_i=\sum_i^n \frac{1}{i}$ has a bound of $\ln n \lt a_i \lt ln+1$)

And therefore, the expected depth of a node $i$ is $\log n$ and the tree depth can always be maintained such that its maximum depth does not exceed $\log n$.

Since the height of the tree does not exceed $\log n$, inserting and deleting a node is $\log n$ operation. (Recall that the time complexity of rotating is $O(1)$ and there are at most $\log n$ rotations each time of an operation).

The only thing we have to deal with when we modify the tree(which is basically adding and deleting nodes) is the size of the subtrees. So we only care about $size$ and $cnt$, since these two fields relate to the size of a (sub)tree. While maintaining, we simply add the information about the size of left subtree and right subtree to current node we are maintaining.

```
public Node maintain(Node root) {
root.size = root.cnt;
root.node_cnt = 1;
if (root.ch[0] != null) {
root.node_cnt += root.ch[0].node_cnt;
root.size += root.ch[0].size;
}
if (root.ch[1] != null) {
root.node_cnt += root.ch[1].node_cnt;
root.size += root.ch[1].size;
}
return root;
}
```

In order to maintain both properties of values(for BST) and weights(for Heap), we need to adjust positions of nodes dynamically when constructing / inserting nodes into the tree. Since a node has two children at most, we only have two operations to switch positions between nodes: **Left rotation** and **Right rotation**.

**Definition: when I say ‘left rotate a node’, it means to rotate its left child upward, and ‘right rotate’ means to rotate its right child upward.**(See the graph below)

Let’s take a look at **Left rotate**.

By analyzing the graph with the property of BST, we can get

$$ c>a>e>b>d $$

We know that $ \forall v \in \mathbb{T}_L,\ v < T_v $ . Therefore, during a round of left rotation, the root node($a$) will become the right child of its left child($b$). And the right child of $b$ will become the left child of the root node, and after rotation, the property of BST is maintained: the inequality above is still true.

Right rotation does exactly the same thing in the reversed direction.

So it can be proved that the property of BST is maintained after node rotation.

Since this operation does not affect anything(permutation / property of BST) related to values in nodes, we can use it to maintain keys of nodes. We need to make sure that by just looking at randomly generated keys of the nodes, they form a min-root heap.

You will find that the sequence of traverse of the tree before & after rotation is the same.

For instance, given a tree shown in the graph below:

All key values are labeled.

Suppose we inserted the node filled with black color, whose key value is 4. Apparently, current structure of keys in nodes of the tree violates the property of min-root heap, since the key of inserted node is less than the root node(which is also its parent). The solution is to apply a **Left rotation** on the root node whose key value is 10, and then the tree will be like:

This is not a perfect example since the height of the tree before rotating has already been optimum, but in general, the rotation operation can prevent forming a linear link during inserting new nodes. A more practical example will be given later in the article.

```
public Node rotate(Node root, int d) {
Node tmp = root.ch[d];
root.ch[d] = tmp.ch[d ^ 1];
tmp.ch[d ^ 1] = root;
maintain(root);
tmp = maintain(tmp);
return tmp;
}
```

Note that even though we have two operations, we can implement it by writting only one method, since they are each other’s inverse operation. Then we can use $d$ to identify lift left or right children up to the root’s position. Setting $d=0$, it will do Left rotation, and $d=1$ for right rotation.

(*To fully understand this part, you need to first get familiar with how BST works when inserting a new node into it*)

Let’s first list out what may / will be changed if we insert a new node:

- Number of nodes along the path while inserting the node (tree size)
- Number of same value counter in a node (inserting an existing value)
- Structure of a subtree (due to rotation)

Notice that each time we insert a new node, the value is either in the tree or a new value(does not in the tree).

- For inserting a value that has already been inserted in the tree, we can simple let the same value counter of the node contains the value plus 1.
- For inserting a new value that does not in the tree, due to the functionality of BST, we will reach a empty node(null). We can simply construct a new one and set it to be a child of the node in one level upper.

In the second case, we may need to adjust the structure of the tree since the randomly generated keys(weight) of new nodes are unknown, so we need to perform node rotation on its parent node if the key of new node is less than the key of its parent. As mentioned in previous section, the structure before and after rotation does not violates the property of BST, so we can possibly perform multiple rotations even though we only inserted one new node.

**Notice** that insering is a recursive process, and all nodes along the path of finding the right place should be updated(subtree size). *This is intuitive*.

Given a tree shown in the graph below, and we are to insert value 8.

First we do basic inserting operation of a BST, which is to compare the value to insert($v_i$) to the value of current node($v_c$).

- If $v_i < v_c$, go to the root of left subtree
- otherwise, go to the root of right subtree.

Until:

- Current node is empty, which means $v_i$ is not in the tree.
- $v_c=v_i$.

These are two conditions mentioned previously, and this is the base cases of inserting. For the first condition, it’s relatively easy to handle: just add 1 to the same value counter to the node(`node.cnt`

).

If the value is totaly new to the tree, we need to construct a new node and generate a key for it in order to making the tree balanced.

Finally we got the right position for the new node, which is the right child of Node(4, 58). Then returning to its parent, we found that the key(58) is greater than that of the new node, so we do right rotation.

Continuing returning to the upper level, we, again, found the key of Node(3, 21), which is 21, is still greater than that of the new node, so we do right rotation again:

Back to the root node, the key of the root is also greater than that of the new node, and currently, the new node is the left child of the root, so we do left rotation to lift it up to the root:

Finally, we finished the operation and the tree remains balanced:

```
private Node _insert(Node root, int v) {
if (root == null) {
// Construct a new node and generate a random key
return new Node(v, new Random().nextInt(1 << 30) % 6662333);
} else {
root.size += 1; // update the size of subtree information along the path
if (root.v == v) {
// condition 1: inserting an exisiting value
root.cnt += 1;
} else {
int d = v > root.v ? 1 : 0; // determine which side to go(basically it is the BST operation)
root.ch[d] = _insert(root.ch[d], v);
if (root.ch[d].rnd < root.rnd) {
// If the key of current node is greater than the inserted one
root = rotate(root, d);
}
}
return root;
}
}
```

The deleting operation is a kind like the reverse process of inserting a new node. While adding a new node, we find its position and add it in, and then we adjust it to the right position so that the property of Treap is maintained. If we do this operation in a reversed direction by rotating a node to the position where it becomes a leaf node, we can remove it anyway without bothering any other nodes in the tree.

It is true that this approach works fine, but it is a part of a bigger picture. Now let’s consider under what situation can we remove a node from the tree, and we can see that being a leaf is one condition, the other condition also fits our need is that a node is in a link node, which means it only has one child and the other is empty node(termination node).

Then we get the algorithm of deleting a node: rotate it down until it becomes a leaf or it becomes a link node, and then **remove** it. It is obvious that the property of Treap should not be violated after the operation, so let’s see how can we do this my maintaining the property of values and keys in the tree.

(*The detail of remove operation will be introduced later*)

Rotating a node down is equal to rotating a child of the node up, which means it is fine for us to use the operation we defined early(**rotate**), as we proved, the property of BST maintained before and after the operation, so we do not need to worry about mess the values up during rotating downward.

But it is a different story for the keys, ~~since the direction to rotate is decided by us, so it can possibly be incorrect if we do the rotation in a wrong way. ~~; however, it is very obvious that we should choose the child that has the less key to rotate(*It is like maintaining a Heap*).

As discussed above, when the node reaches a proper position where it is a link node or it is a leaf node, we can remove it from the tree. When it is at a link position, we can directly use the child of the node to replace its position, otherwise, we can directly set it to $null$.

Just use the tree we got in the Inserting Nodes part, and we are to delete the node inserted(Node(8, 9))

First we need to locate the node we want to delete. The information of the node to be deleted, often, is the value of the node. By using the property of BST, we can easily find out the position by doing left-right traversal(if current value is greater than the value we are to remove, the go to the left subtree to find it, otherwise if current value is less than the value we are to delete, then go to right subtree.) if current node does not have the value we are to delete.

But in this case, we got it without doing the steps above since the node is the root and we found that the value is exactly what we want to delete. Thus it is not a perfect example, since it does not show how we find the right position of the value, but (I think) it is harder to understand the second part but not the first one.

Then we found that there are two childs in the current node, so we rotate it down to the son having the smaller key in order to maintain the property of heap, and we got Node(10, 10) has a smaller key than Node(3, 21), and we perform a right rotation:

Continuing the process, we found that there is only one child now in the node we are to delete, so we can use its son to subtitute current node:

And the node is deleted.

```
private Node _delete(Node root, int v) {
if (root != null) {
if (root.v == v) {
if (root.cnt > 1) {
root.cnt -= 1;
return root;
}
Node lchild = root.ch[0];
Node rchild = root.ch[1];
if (lchild == null || rchild == null) {
root = lchild == null ? rchild : lchild;
return root;
}
int d = lchild.rnd < rchild.rnd ? 0 : 1;
root = rotate(root, d);
root = _delete(root, v);
return root;
} else {
root.size -= 1;
int d = v > root.v ? 1 : 0;
root.ch[d] = _delete(root.ch[d], v);
return root;
}
} else {
return null;
}
}
```

So far, we have covered algorithms for inserting a node into treap and deleting one from the tree, and all these operations does not affect the properties of the data structure. This is emphasized several times since keeping the property allows us to do lots of things.

First, according to the property of BST, we can maintain a non-decreasing sequence by using the structure. As mentioned in the rotaion section, the order sequence of the tree before and after rounds of rotations does not change. And, because of the property of BST, the order traversal sequence will always be non-decreasing, so if we can maintain it each time we insert a new value, the sequence will also be non-decreasing after inserting, and the complexity is $O(\log n)$, which is the same as finding a value using binary search.

That is a little bit *unimportant* since we can use a real Heap to maintain an ordered sequence rather than writing a G.D.F rotating nodes in a tree to do it. But there are things can be done much more efficiently by using Treap rather than using a Heap. The following sections will introduce several useful utilization of Treap structure.

Consider a problem:

Given a list of students’ midterm score, the teacher isn’t interested in the highest and the lowest score, but she wants to know by getting how much score can a student place on the $k^{th}$ lowest score. The teacher is impatient, so she maybe ask you about the answer anytime before or after getting a new score from the grader. Implement a data structure to solve the problem.

To be simple, the problem ask us to use a data structure to find the $k^{th}$ smallest number in a sequence, and there might be new values added into the sequence between two queries.

Instead of go to the optimal answer directly, let’s see how we can solve the problem.

The first intuition is to sort the list and get the value at the $k^{th}$ position. Well, it works fine, and the complexity is $O(n\log n)$, the query complexiy is $O(1)$. But if a query is after round(s) of inserting new scores, the query complexity will be $O(n\log n)$. This is not quite acceptable, since if there are multiple insertions and $Q$ queries, the complexity will be $O(Q\cdot n\log n)$, even though the inserting operation is $O(1)$.

Alright, we see that the time consumption can be really large for the naive approach. So now we have Treap, and we can solve the problem very fast!

Recall that we record the size of the subtree of each node, and we store the number of same values in each node. Let $L_i$ and $R_i$ denotes the left and right child of node $i$, and $c_i$ denotes the number of same values in node $i$, and $size_i$ denotes the size of the subtree of node $i$. Then we can get:

$$ size_i = c_i + size_{L_i} + size_{R_i} $$

(**Intuitive right? :D**)

With this information and the property of BST, we can know that: for a node $i$ in a left subtree, the rank of the value in the node equals $size_{L_i} + c_i + 1$ and for a node $i$ in a right subtree, the rank of the value in the node is $size_{fa_i} - size_{i} + 1$

And now think about querying the $k^{th}$ value, we can compare it with the size of current tree, which results in three conditions:

- $k \leq size_{L_i}$
- $size_{L_i} \lt k \leq size_{i} - size_{L_i}$ (or $size_{L_i} \lt k \leq c_i + size_{L_i}$)
- Otherwise

**Condition 1**

$k$ is less or equal to the size of left subtree, then we need to find the answer in the left subtree of current node because the value of current node is out of the bound of rank $k$.

**Condition 2**

If $k \in [ size_{L_i} + 1, size_{L_i} + c_i ]$, we can confirm that the $k^{th}$ value is the value of node $i$. (Note that there can be multiple same values in the sequence).

**Finally**

The lasting condition is $k > size_{L_i} + c_i$, so the $k^{th}$ value falls somewhere in the right subtree of current node. Since it is impossible to calculate number of nodes above a certain node, we can convert the problem to *calculating how many nodes we still need to pass by to get the answer*. That is, instead of finding the $k^{th}$ value, we find a value such that there are $N-k$ values lies behind it. This calculation is equal to finding the value ranked at the $k^{th}$ position.

```
private int _kth(Node root, int k) {
if (root == null) {
return -(1 << 31);
} else {
if(k == 0) return root.v;
if (k <= getSize(root.ch[0])) return _kth(root.ch[0], k);
else if (k <= getSize(root.ch[0]) + root.cnt) return root.v;
else {
return _kth(root.ch[1], k - getSize(root.ch[0]) - root.cnt);
}
}
}
```

**This is the inverse operation of that we discussed in the section above**

Using the infromation we maintain, it is possible to figure out the number that is ranked the $k^{th}$. Since the operation is very similar to the $k^{th}$ number querying, I’ll not cover the implementation in detail and left most of approaches to you XD.

The only thing I want to point out is that *think about how to figure out what rank a number has if it is in a node that is in the right subtree*.

To check whether your function implementation is correct, you can try $kth(F(x))$. The value should exactly be $x$, the number you entered into your function.

- Answer for the question: checkout
`rank`

method at the end of the article.

You can also implement a method like those of functions in C++ Standard Template Library. *lower_bound* takes a parameter $x$ and find out where is the smallest number that is greater than $x$, and *upper_bound* takes a parameter $x$ and find out where is the largest number that is smaller than $x$. Apparently you can do this using binary search, but the sequence should be sorted. On Treap, the data is somehow sorted so you can also find out **what** the number is by calling *lower_bound* and *upper_bound* on Treap.

- Answer for the question: checkout
`pre`

for upper_bound and`post`

for lower_bound at the end of the article.

```
import java.util.*;
public class Treap {
public class Node {
public int v, rnd, size, cnt, node_cnt;
public Node[] ch;
public Node(int data, int w) {
v = data;
rnd = w;
size = 1;
cnt = 1;
ch = new Node[2];
node_cnt = 1;
}
public String toString() {
String result = "v: " + this.v;
if (this.ch[0] != null) {
result += " " + this.ch[0].v;
}
if (this.ch[1] != null) {
result += " " + this.ch[1].v;
}
return result;
}
}
public Node rt;
public Node maintain(Node root) {
root.size = root.cnt;
root.node_cnt = 1;
if (root.ch[0] != null) {
root.node_cnt += root.ch[0].node_cnt;
root.size += root.ch[0].size;
}
if (root.ch[1] != null) {
root.node_cnt += root.ch[1].node_cnt;
root.size += root.ch[1].size;
}
return root;
}
public Node rotate(Node root, int d) {
Node tmp = root.ch[d];
root.ch[d] = tmp.ch[d ^ 1];
tmp.ch[d ^ 1] = root;
maintain(root);
tmp = maintain(tmp);
return tmp;
}
public void insert(int data) {
rt = _insert(rt, data);
}
public void printTree() {
_printTree(this.rt);
}
public void delete(int v) {
this.rt = _delete(this.rt, v);
}
public int kth(int k) {
return _kth(this.rt, k);
}
public int rank(int v) {
return _rank(this.rt, v);
}
public boolean contains(int v) {
return _contains(this.rt, v);
}
public int getCount(int v) {
return _getCount(this.rt, v);
}
public int pre(int v) {
return _pre(this.rt, v);
}
public int post(int v) {
return _post(this.rt, v);
}
public int size() {
return getSize(this.rt);
}
public int count() {
if (rt == null) {
return 0;
} else {
return rt.node_cnt;
}
}
private int _getCount(Node root, int v) {
if (root == null) {
return 0;
} else {
if (root.v == v) {
return root.cnt;
} else {
int d = v > root.v ? 1 : 0;
return _getCount(root.ch[d], v);
}
}
}
private int _post(Node root, int v) {
if (root == null) {
return (1 << 30);
} else {
if (v >= root.v) {
return _post(root.ch[1], v);
} else {
return Integer.min(root.v, _post(root.ch[0], v));
}
}
}
private int _pre(Node root, int v) {
if (root == null) {
return -(1 << 30);
} else {
if (v <= root.v) {
return _pre(root.ch[0], v);
} else {
return Integer.max(root.v, _pre(root.ch[1], v));
}
}
}
private boolean _contains(Node root, int v) {
if (root == null) {
return false;
} else {
if (root.v == v) {
return true;
}
int d = v > root.v ? 1 : 0;
return _contains(root.ch[d], v);
}
}
private int _rank(Node root, int v) {
if (root == null) {
return -(1 << 31);
} else {
if (v < root.v) {
return _rank(root.ch[0], v);
} else if (v == root.v) {
return getSize(root.ch[0]) + 1;
} else {
return getSize(root.ch[0]) + root.cnt + _rank(root.ch[1], v);
}
}
}
private int getSize(Node x) {
if (x == null) {
return 0;
}
return x.size;
}
private int _kth(Node root, int k) {
if (root == null) {
return -(1 << 31);
} else {
if(k == 0) return root.v;
if (k <= getSize(root.ch[0])) return _kth(root.ch[0], k);
else if (k <= getSize(root.ch[0]) + root.cnt) return root.v;
else {
return _kth(root.ch[1], k - getSize(root.ch[0]) - root.cnt);
}
}
}
private void _printTree(Node root) {
if (root != null) {
_printTree(root.ch[0]);
System.out.print(root.v + " ");
_printTree(root.ch[1]);
}
}
private Node _insert(Node root, int v) {
if (root == null) {
return new Node(v, new Random().nextInt(1 << 30) % 6662333);
} else {
root.size += 1;
if (root.v == v) {
root.cnt += 1;
} else {
int d = v > root.v ? 1 : 0;
root.ch[d] = _insert(root.ch[d], v);
if (root.ch[d].rnd < root.rnd) {
root = rotate(root, d);
}
}
return root;
}
}
private Node _delete(Node root, int v) {
if (root != null) {
if (root.v == v) {
if (root.cnt > 1) {
root.cnt -= 1;
return root;
}
Node lchild = root.ch[0];
Node rchild = root.ch[1];
if (lchild == null || rchild == null) {
root = lchild == null ? rchild : lchild;
return root;
}
int d = lchild.rnd < rchild.rnd ? 0 : 1;
root = rotate(root, d);
root = _delete(root, v);
return root;
} else {
root.size -= 1;
int d = v > root.v ? 1 : 0;
root.ch[d] = _delete(root.ch[d], v);
return root;
}
} else {
return null;
}
}
}
```

In this article, we discussed the idea of balancing a binary search tree and methods we can use to continuously maintain the Treap structure. Also, we covered several operations we can perform on Treap: Insert, Delete and Query the Kth number. It is a useful $\log N$ data structure.

There is also a kind of implementation of Treap called FHQ Treap, which is the same data structure without rotate operation. It is more intuitive under a functional context, you can check it out at HERE.

[[1]][Parallel and Sequential Data Structures and Algorithms, 15-210 (Spring 2012)] [Carnegie Mellon University]: http://www.cs.cmu.edu/afs/cs/academic/class/15210-s12/www/lectures/lecture16.pdf “Lecture 16 — Treaps; Augmented BSTs”