Trees

Trees are an important concept underlying a variety of data structures. We've mentioned them briefly before with linked lists. A tree is just what it sounds like—a tree. More formally, a tree is a hierarchical data structure that is either empty, or contains a 'root' node whose children are also trees. In plain English, a tree consists of nodes organized into a branching structure where each node can have one parent and an arbitrary number of children. (See diagram)

So, why are trees useful? Aside from modeling branching structures (say, a menu system), trees can be useful—even critical—in implementing data structures. For example, an ordered dictionary is often implemented as a self-balancing tree (which we will learn about later on). We'll go over how a tree structure allows us to implement a heap today.

Terminology

As just mentioned, the 'first' node in a tree is called the root. The root element gives access to all the others, and is hence usually what a data structure tracks.

Subsequent nodes are simply known as nodes, but the final layer (nodes without children) are known as leaves.

A tree is considered a 'binary' tree if all nodes have at most two children. A 'ternary' if every node has at most three children, etc.

A binary tree is considered 'complete' if all leaves are filled in from the left side. For example, the example would not be a complete tree because (ignoring the 3rd child of the root) second to leftmost leaf is not a child of the leftmost node in the above level.

Tree Operations

There are a few operations common throughout all types of trees. Adding and removing elements is actually not one of them, as the convention for doing so varies based on what kind of tree you're dealing with. For example, an arbitrary tree would support inserting a node anywhere, whereas a heap or a binary search tree have much more restrictive conversions.

What is common to all trees is traversal. A traversal algorithm is simply a procedure that visits each node in a specified manner. This is typically implemented recursively, but does not have to be. A traversal procedure will visit a node and call itself for each child node.

Preorder traversal first visits the current node, then its children in order. Postorder does the opposite; visiting the children then the current node. Inorder traversal calls itself for the left child, visits the current node, then visits the right child. Clearly, inorder only applies to binary trees.

Heaps

A heap is a particular type of binary tree. The basic property that defines a heap is that a parent element has a greater value than both its children. This is called a max-heap. A min-heap, on the other hand, assures every parent has value less than its children.

Heaps support two operations: inserting a new element and removing the root. Hence, you can only remove the largest (or smallest) element from a heap at any time. This makes heaps very useful for implementing priority queues, as elements with the highest "priority" will move to the top of the queue and hence the front of the queue.

Finally, a heap is a complete binary tree—adding and removing elements will always effect the rightmost leaf of the tree. This means that a heap can easily be representing as a contiguous array (see diagram). Heap operations apply to both linked-tree based heaps and array-based heaps.

Because of the direct array representation, the index of an element's parent and children can be trivially calculated:

int element_index;

// ...

int parent_index = (element_index - 1) / 2;
int child_index_1 = element_index * 2 + 1;
int child_index_2 = element_index * 2 + 2;

Heap Operations

Insertion

Inserting an element into a heap is performed as follows: the new element is added as a new rightmost leaf. The value is then bubbled up through the heap by comparing it to its parents. If the value is greater than that of its parent, swap the two. If you swapped, compare with the next parent, etc. This is called "reheap up."

Removal

Removing the root element of a heap follows a very similar process: begin by swapping the root with the rightmost leaf, then delete it. The heap should still be a complete tree, but now the element in the root is incorrect. So, bubble the element downward: if it is less than one of its children, swap it (if both, swap with the larger child). If you swapped, compare with the next set of children, etc. This is called "reheap down."