Binary Search Tree

April 12, 2016

A binary search tree, BST, is a fundamental dynamic data structure used almost everywhere in science computer. Due to the efficiency of the retrieving and storing element, it's espacially used in storing systems.

A tree is composed of nodes, edges and leaves. Each node is composed of a key value and possibly some associated data and is attached to two others nodes. A leaf can be seen as an empty node. The graphical representation is an upside down tree where the beginning is the topmost node, called the root.

Some definitions

Root - The top node in a tree.

Child - A node directly connected to another node when moving away from the Root.

Parent - The converse notion of a child.

Siblings - A group of nodes with the same parent.

Descendant - A node reachable by repeated proceeding from parent to child.

Ancestor - A node reachable by repeated proceeding from child to parent.

Leaf - A node with no children.

Internal node - A node with at least one child.

Path - A path is a sequence of nodes and edges connecting a node to a descendant.

Height - The height of a node is the number of edges on the longest downward between that node and a leaf.

Depth - The depth of a node is the number of edges from the node to the tree's root node.

Level - The level of a node is defined by 1 + the number of connections between the node and the root.

Degenerate tree - A tree where for each parent node, there is only one associated child node.

Balanced tree - A tree where leaves' heights differ by at most one.


The BST is a tree ruled by some characteristics: it keeps its keys in a certain order. Here is the specific rules:

For every node N in the tree

  • all keys in N's left subtree are less than the key in N
  • all keys in N's right subtree are greater than the key in N

figure 0
Figure 1 - A correct Binary Search Tree

figure 1
Figure 2 - Not a BST because 5 in the right subtree of 6 is lower.

As you can see, this is a recursive definition. Hence, if we take a node, for instance 3 and all its descendants, it's a new smaller BST which fulfills the same rules. Due to this fact, functions can be applied recursively.

As all keys are sorted, when looking for a key in the tree, only one branch has to be explored. At every node, a comparison will be performed to know if the key is in the right or left subtree. On average, for every step, we skip half of the tree. Lookup, insertion, deletion have a complexity proportional to the logarithm of the number of items in the tree. The tree is not rearranged when a key is inserted or deleted, and that's why it may end up unbalanced.

figure 2
Figure 3 - Worst case: a degenerate tree

Average Worst case
Space O(n) O(n)
Search O(log n) O(n)
Insert O(log n) O(n)
Delete O(log n) O(n)

Advantages

The reason why binary-search trees are used is that the following operations can be implemented efficiently using a BST when speaking of a sorted data structure.

  • insert a key value
  • determine whether a key value is in the tree
  • remove a key value from the tree
  • inorder traversial, for example: print all of the key values in sorted order

Disadvantages

It's unlikely that the tree will be properly balanced as it depends on the order of insertions and deletions.

  • the shape of the tree depends on insertion and deletion order
  • the insertions or deletions need to compare each key of visited node with the key of the element to be inserted/deleted

Implementation

Structure

The heart of the BSTs is in the node's structure. For the simplicity, we didn't associate data in the node but you could add anything.

struct node {
  int key;
  node *left;  // begin of the left subtree
  node *right; // begin of the right subtree
  node(int k): key(k) { // constructor
    left = NULL;
    right = NULL;
  }
};

Note that if left and right pointers are not set to NULL, the condition left == NULL will not be triggered. That's why a constructor was added to force these assignments.

Class

class binaryTree {
public:
  binaryTree();
  ~binaryTree();

  void insert(int key);
  bool search(int key);
  void destroyTree();
  int countNode();
  void print();

private:
  void destroyTree(node *n);
  void insert(int key, node *n);
  bool search(int key, node *n);
  int countNodes(node *n);
  void print(node *n);

  node *root;
};

Destroy tree

void binaryTree::destroyTree(node *n) {
  if (n != NULL) {
    destroyTree(n->right);
    destroyTree(n->left);
    delete n;
  }
}

void binaryTree::destroyTree() {
  destroyTree(root);
}

binaryTree::~binaryTree() {
  destroyTree();
}

destroyTree begins from the bottom of the tree and goes up. It frees all the tree's used memory.

Insertion

void binaryTree::insert(int key, node *&n) {
  if (n == NULL) {
    n = new node(key);
    return;
  }
  if (n->key < key) {
    insert(key, n->left);
  } else {
    insert(key, n->right);
  }
}

void binaryTree::insert(int key) {
  insert(key, root);
}

Search

bool binaryTree::search(int key, node *n) {
  if (n == NULL) {
    return false;
  }

  if (n->key == key) {
    return true;
  }

  return search(key, n->left) || search(key, n->right);
}

bool binaryTree::search(int key) {
  return search(key, root);
}

Count nodes

int binaryTree::countNodes(node *n) {
  if (n == NULL) {
    return 0;
  }
  return 1 + countNodes(n->left) + countNodes(n->right);
}

int binaryTree::countNodes() {
  return countNodes(root);
}

Conclusion

How could we improve efficienty of BST?

To improve efficienty of BST, we have to keep the tree balanced. For every insertion and deletion operations, we could rotate the nodes in order to keep the same height on the left and right subtree. Moreover, to reduce time comparison of each visited node, we could use radix tree.

Next post: self-balancing binary search tree.


References:

  1. Binary search tree - wikipedia
  2. Binary Search Trees - algs4.cs.princeton.edu
  3. Binary Trees in C++: Part 1 - cprogramming.com
  4. Binary Trees - cslibrary.stanford.edu
  5. Algorithms and Data Structures - algolist.net