



An overview of binary search trees (BSTs), focusing on AVL trees, a self-adjusting data structure that keeps the tree dense and so gives faster access. It covers the properties of AVL trees, their comparison to plain binary search trees, and their performance. It also mentions other self-adjusting trees and sources on them.




Timothy Rolfe
Computer Science Department
Eastern Washington University
Cheney, WA 99004-
[Dr. Dobb’s Journal, Vol. 25, No. 12 (Dec. 2000), pp. 149-52, © 2000 by CMP Media Inc.]

The binary search tree (BST) is a data structure for holding information that allows rapid access according to some ordering key. As the name indicates, it is a tree structure in which each node containing information can be connected to two subtrees (hence the “binary”). The search condition is met by requiring that all information to the left of a node comes before the node itself, and that all information to the right of the node comes after the node itself. If you want to allow equal comparisons, you choose which side to put them on.

Searching for an item within such a data structure is simply a matter of examining one node at a time, starting from the top node, called the “root” of the tree. If the desired value is found within that node, you’re finished. If not, you can see how your desired value compares with the node value: If the desired value is less than the node value, you can ignore the entire right subtree and just move down the left subtree, repeating the same operation. If the desired value is greater than the node value, then ignore the left subtree and move down the right.

Insertion into a BST involves searching for the value to be inserted, then putting it in exactly at the spot where it wasn’t found. Deletion is a bit more of a problem. The more elegant solution is to find a replacement node from farther down in the tree that can replace the deleted node and still retain the BST condition. In other words, replace the node with a node in the tree that, in the key order generating the tree, comes immediately before or after that node.

If the BST is most densely organized, you can, within a path of length $k$, access one piece of information among $2^{k+1}$ pieces of information. The extremely careful would want that written as $2^{k+1} - 1$.
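As a minimal sketch of the search procedure described above, the following C++ function walks down from the root and discards one subtree at each step. The Node struct and the name Find are illustrative assumptions rather than part of the article's listings, and integer keys stand in for a general key type.

```cpp
// Hypothetical minimal node type, for illustration only; the article's
// own BSTnode class is not shown in full, so these names are assumed.
struct Node {
    int   value;
    Node* left  = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

// Iterative BST search: at each node either stop (found) or discard one
// whole subtree and continue down the other.
// Returns the matching node, or nullptr if the value is absent.
const Node* Find(const Node* root, int wanted) {
    const Node* cur = root;
    while (cur != nullptr) {
        if (wanted < cur->value)
            cur = cur->left;      // ignore the entire right subtree
        else if (cur->value < wanted)
            cur = cur->right;     // ignore the entire left subtree
        else
            return cur;           // found
    }
    return nullptr;               // ran off the tree: not present
}
```

A caller would compare the result against nullptr to tell a hit from a miss. The loop visits at most one node per level, so the number of nodes examined never exceeds the tree height plus one.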
People who want faster access have devised schemes that guarantee that the BST is denser than that (which would certainly avoid the degenerate tree case), regardless of the order of operations on the tree. The first one was proposed by Adel’son-Vel’skii and Landis, and is called the “AVL tree” in their honor (see G. M. Adel’son-Vel’skii and E. M. Landis, “An Algorithm for the Organization of Information,” Soviet Math. Dokl., 1962).

Every node in a tree has what is called its “height”: the number of links in the longest path from that node down to what is called a “leaf node” (one with no children) at the bottom of the tree. (A memory aid for this is to remember that “height” tells you how far up you are.) The AVL condition for a BST is that, for every node, the heights of its two child subtrees differ by at most one. (You can see that you have to allow a difference of one just by looking at a two-node tree.)

The operations in a BST that affect the height are insertion and deletion. I will treat deletion as effectively removing the node that supplies the replacement for the deleted node. On an insertion, there are obviously no nodes below the point of insertion since there’s nothing there. With deletion viewed this way, there are also no changes in height for nodes in the subtree below the node removed. Consequently, the adjustments to enforce the AVL condition must be performed in the nodes from the point of the operation back up to the root of the tree.

This might appear to be a problem, but it really isn’t. To see why, we will have to get into explicit programming, and we will use the C++ language as its basis (C++ because I depend on having a pointer as a reference parameter). I will explicitly discuss insertion, but also include the deletion function in the accompanying code.
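The height definition and the AVL condition can be checked directly with a short recursive sketch. The names Node, Height, and IsAVL are hypothetical, and this version recomputes heights on every call purely for clarity rather than efficiency. An empty subtree is given height -1 so that a leaf has height 0, the same convention Listing Three uses for NULL children.

```cpp
#include <algorithm>   // std::max
#include <cstdlib>     // std::abs

// Hypothetical minimal node type, for illustration only.
struct Node {
    int   value;
    Node* left  = nullptr;
    Node* right = nullptr;
    explicit Node(int v) : value(v) {}
};

// Height = number of links on the longest downward path to a leaf.
// An empty subtree counts as -1, so a leaf has height 0.
int Height(const Node* n) {
    if (n == nullptr) return -1;
    return 1 + std::max(Height(n->left), Height(n->right));
}

// True when, at every node, the two subtree heights differ by at most one.
bool IsAVL(const Node* n) {
    if (n == nullptr) return true;
    int diff = Height(n->left) - Height(n->right);
    return std::abs(diff) <= 1 && IsAVL(n->left) && IsAVL(n->right);
}
```

A two-node tree passes the check with a difference of exactly one, while a three-node chain (each node having only a left child) fails at its top node, where the difference is two.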
Insertion is most easily handled as a recursive function: recurse to the point of failure, and then change the pointer (and, because of the reference parameter, that pointer in the calling routine) to reference the newly created node. Written for a BST without regard to the AVL condition, this function (see Listing One) is particularly simple. Assume a class BST for the tree itself, a class or struct BSTnode for the nodes of the tree, and a class, struct, or type Data for the data portion of the BSTnode (for which the relational operator “<” is defined). To keep the logic simple, I’ll omit validation of the pointer returned by the new operator.

Listing One

   void BST::Insert( BSTnode* &Node, const Data &Value )
   {
      if ( Node == NULL )              // failure point --- recursion base case
         Node = new BSTnode(Value);    // change Node (ref. parameter)
      else if ( Value < Node->Value )
         Insert(Node->Left, Value);    // recurse left
      else                             // equal keys go right
         Insert(Node->Right, Value);   // recurse right
   }

Because of the recursion, the entire path back up to the root of the tree is available as the function goes through its returns. Thus, you can easily do the correction for the AVL condition simply by putting the appropriate code after the if/else-if/else structure in Listing One.

Obviously, you need to get at the height of each tree node easily for an AVL tree. This means that you will have a small expense in space within each AVL node to retain that information.
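One way to keep the height at hand is to store it in the node and refresh it from the children whenever a subtree changes. The sketch below is an assumption about what such a node and a helper like HtAdjust (called in Listings Two and Three) might look like. The member names and the HeightOf helper are hypothetical, and the article's actual definitions may differ.

```cpp
#include <algorithm>   // std::max
#include <cassert>

// Hypothetical node layout: one extra int per node holds the height.
struct AVLnode {
    int      Value;
    int      Height = 0;          // a freshly created leaf has height 0
    AVLnode* Left   = nullptr;
    AVLnode* Right  = nullptr;
    explicit AVLnode(int v) : Value(v) {}
};

// Stored height of a possibly empty subtree; empty counts as -1.
inline int HeightOf(const AVLnode* n) {
    return n == nullptr ? -1 : n->Height;
}

// Recompute one node's stored height from its children's stored heights.
// Assumed behavior for a helper like HtAdjust; constant time per call,
// and correct only if both children's stored heights are already current.
void HtAdjust(AVLnode* n) {
    assert(n != nullptr);
    n->Height = 1 + std::max(HeightOf(n->Left), HeightOf(n->Right));
}
```

Because each call reads only the children's stored heights, heights have to be refreshed bottom-up, which is exactly the order in which the recursive insertion returns.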
Listing Two

   void AVL::RotateRight(BasePtr &Rt)
   // Rotate rightward --- right node moves down, left node moves up.
   {
      BasePtr Lt = Rt->Left,
              Q  = Lt->Right;

      Lt->Right = Rt;
      Rt->Left  = Q;
      // Shifting Q may well have changed Lt and Rt heights.
      HtAdjust (Rt);   // Rt is now the child, so fix its height first
      HtAdjust (Lt);
      Rt = Lt;         // Change the link itself that was passed by reference
   }

There is a problem, though, with the subtree Q. As you can see in Figure 1, it remains at the same level (even though it switches sides). If the height of the leftward subtree is based on Q (that is, Q’s height is one more than R’s), you end up not correcting the AVL condition. You can, however, do preprocessing to ensure that the left side of Lt is the longer one by doing a leftward rotation at Lt before doing the rightward rotation at Rt.

You have enough information now to build the function AVLadjust (see Listing Three), and then insert a call to it at the end of the Insert function; see Listing One. Note that the Rotate functions adjust heights after the rotation has been accomplished; AVLadjust also updates the node's height when no rotation is needed.

Listing Three

   void AVL::AVLadjust ( AVLnode* &Node )
   {  // We presume the AVL class has access to AVLnode's Height data member
      int Lht  = Node->Left  == NULL ? -1 : Node->Left->Height,
          Rht  = Node->Right == NULL ? -1 : Node->Right->Height,
          Diff = Lht - Rht;

      if ( abs(Diff) < 2 )      // AVL condition is met
         HtAdjust(Node);        // May need to adjust the height anyway
      else if ( Lht > Rht )     // Left side two longer than right
      {
         int Lck = Node->Left->Left  == NULL ? -1 : Node->Left->Left->Height,
             Rck = Node->Left->Right == NULL ? -1 : Node->Left->Right->Height;

         if ( Lck < Rck )
            RotateLeft (Node->Left);   // Make left the longer
         RotateRight (Node);           // Adjust Node itself
         HtAdjust(Node);               // Update current node's height
      }
      else   // Mirror image logic to that above: Right is two longer than left
      {
         int Lck = Node->Right->Left  == NULL ? -1 : Node->Right->Left->Height,
             Rck = Node->Right->Right == NULL ? -1 : Node->Right->Right->Height;

         if ( Lck > Rck )
            RotateRight (Node->Right);
         RotateLeft (Node);
         HtAdjust(Node);
      }
   }

The AVL tree has an interesting property. In its worst case (that is, the AVL tree in which every node with children has subtrees that differ in height by one), the number of nodes in the entire tree, and within
each subtree that makes up the entire tree, is close to a Fibonacci number. (The recurrence is actually $H_{k+1} = H_k + H_{k-1} + 1$.) On this basis, there is a proof that the worst-case height (the largest number of links that must be traversed in a search) is less than $1.44 \lg(N+2) - 1.328$. This is a marked improvement over the average case for a BST, noted above as nearly $3 \lg(N)$.

The average height for AVL trees, unfortunately, has not been mathematically determined. Figure 2 combines two sources of experimental data. For AVL trees of up to 12 nodes, you can explicitly generate all possible trees (up to 12! permutations) to get the average tree height (worst-case search path length) and the average node depth (the average of the search path lengths to find a value). For trees with more than 12 nodes, the results are based on generating, for each tree size, 5000 AVL trees from random data, and then generating their averages.

Besides the average height (top graph) and the average depth (bottom graph), you also see as the step function between them the height of a complete binary tree, the best-case tree in which every tree level, except possibly the bottom, is completely filled. The average tree height shows some interesting undulations for each power of two in the size of the tree. As the tree size grows, however, the ratio of the average tree height to the average node depth settles down to approximately 1.3 to 1. There is no plot of the average node depth in the complete binary tree because it is close to the average node depth of the AVL tree, lying at most about 0.2 units below the AVL tree average node depth. To the scale of the graph, the two can’t be distinguished.
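To see how quickly the worst-case node counts grow, the recurrence quoted above can be tabulated directly. The sketch below assumes base cases H(0) = 1 and H(1) = 2 (heights counted in links, so a single node has height 0). With those, each count is exactly one less than a Fibonacci number: H(k) = F(k+3) - 1, where F(1) = F(2) = 1. The function names are illustrative.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Fewest nodes an AVL tree of each height 0..maxHeight can contain,
// from H(k+1) = H(k) + H(k-1) + 1 with assumed bases H(0)=1, H(1)=2.
std::vector<std::uint64_t> MinNodesUpTo(std::size_t maxHeight) {
    std::vector<std::uint64_t> h(maxHeight + 1);
    h[0] = 1;                         // a single node
    if (maxHeight >= 1) h[1] = 2;     // a root with one child
    for (std::size_t k = 1; k < maxHeight; ++k)
        h[k + 1] = h[k] + h[k - 1] + 1;   // root + sparsest subtrees
    return h;
}

// Fibonacci numbers with F(0) = 0, F(1) = F(2) = 1.
std::uint64_t Fib(int n) {
    assert(n >= 0);
    std::uint64_t a = 0, b = 1;       // F(0), F(1)
    for (int i = 0; i < n; ++i) {
        std::uint64_t next = a + b;
        a = b;
        b = next;
    }
    return a;                         // F(n)
}
```

For example, MinNodesUpTo(6) yields 1, 2, 4, 7, 12, 20, 33, so an AVL tree of height 6 must hold at least 33 nodes.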
[Figure 2: AVL Tree: Average Height and Average Depth. AVL statistics to 640 nodes. Axes: Number of Nodes against Height or Depth. Curves: Average Height, Complete Tree Height, Average Depth.]

Figure 3 shows the results from taking the AVL tree experimentation out to trees with 5000 nodes, again running 5000 random trees of each size. You see that tree height approaches 10
texts. The information on random binary search trees is from Robert Sedgewick and Philippe Flajolet’s An Introduction to the Analysis of Algorithms, published by Addison-Wesley in 1996. The information on the worst-case AVL tree height is from Mark Allen Weiss’s Algorithms, Data Structures, and Problem Solving with C++, published by Addison-Wesley in 1996. This is the text for the data structures course that I am teaching at Eastern Washington University (an upper-division course). In addition to the fundamental binary search tree and the AVL tree, Weiss also discusses several of the more recent self-adjusting trees (2-3 trees, red-black trees, and splay trees). Dr. Dobb’s Journal in January 1994 published an article by Bruce Schneier with the title “Skip Lists” about an alternative structure for look-ups.

On-Line Resources: The file aa0012.zip (http://www.ddj.com/ftp/2000/2000_12/aa0012.zip) is available electronically (see “Resource Center,” page 5). When expanded, it generates the source code used to obtain the statistics for the average depths of AVL trees: appropriate class files and two main program files (one to generate all permutations up to a maximum tree size, the other to generate random permutations). The mains are even written (under Linux on a dual-processor machine) to generate each of these statistics in two parallel computations. In addition, there is an RTF file giving a single sheet with exact figures and references for quantities whose approximate values are quoted above.