


These notes cover the efficient implementation of disjoint-set union-find, a data structure used in Kruskal's algorithm for finding a minimum spanning tree. They describe three operations, MAKESET, UNION, and FIND, and their costs, and propose two heuristics, union by rank and path compression, to improve performance. The analysis shows that the total running time of a sequence of operations is O((m + n) log* n), where log* n is the number of times the log function must be applied to n before the result is less than or equal to 1.



UC Berkeley, CS 170: Efficient Algorithms and Intractable Problems
Handout 12
Lecturer: David Wagner
March 11, 2003
Kruskal’s algorithm for finding a minimum spanning tree used a structure for maintaining a collection of disjoint sets. Here, we examine efficient implementations of this structure. It supports three operations: MAKESET(x), UNION(x,y), and FIND(x).
We will consider how to implement this efficiently, where we measure the cost of doing an arbitrary sequence of m UNION and FIND operations on n initial sets created by MAKESET. The minimum possible cost would be O(m + n), i.e., cost O(1) for each call to MAKESET, UNION, or FIND. Our ultimate implementation will be nearly this cheap, and indeed will be this cheap for all practical values of m and n.

The simplest implementation one could imagine is to represent each set as a linked list, where we keep track of both the head and the tail. The canonical element is the tail of the list (the final element reached by following the pointers in the other list elements), and UNION simply concatenates lists. In this case FIND has maximal cost proportional to the length of the list, since following each pointer costs O(1), and UNION has cost O(1), to point the tail of one set to the head of the other. The worst-case cost is attained by doing n − 1 UNIONs, to get a single set, and then m FINDs on the head of the list, for a total cost of O(mn), much larger than our target O(m + n).

To do a better job, we need a more clever data structure. Let us think about how to improve the simple one above. First, instead of taking the union by concatenating lists, we simply make the tail of one list point to the tail of the other, as illustrated below. That way the maximum cost of FIND on any element of the union will be proportional to the maximum of the two list lengths (plus one, if both have the same length), rather than the sum.
[Figure: UNION makes the tail of one list point to the tail of the other.]
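The linked-list representation described earlier, and its O(mn) worst case, can be sketched in Python as follows. This is only an illustration under stated assumptions: the names `Node`, `find`, `union`, and `worst_case_steps` are hypothetical and not from the handout, and the head pointer is assumed to be stored on the tail so that UNION runs in O(1).

```python
class Node:
    """One list element. The `head` field is only kept up to date on the tail."""

    def __init__(self, label):
        self.label = label
        self.next = None   # next element toward the tail (None at the tail)
        self.head = self   # head of the list, stored on the tail (assumption)


def find(node):
    """Walk to the tail (the canonical element); also return the pointer steps taken."""
    steps = 0
    while node.next is not None:
        node = node.next
        steps += 1
    return node, steps


def union(tail_a, tail_b):
    """Concatenate the list ending at tail_a in front of tail_b's list, in O(1)."""
    tail_a.next = tail_b.head    # tail of one set points to the head of the other
    tail_b.head = tail_a.head    # the combined list starts at a's head
    return tail_b                # tail_b is the canonical element of the union


def worst_case_steps(n, m):
    """Merge n singletons into one list, then do m FINDs on the head."""
    nodes = [Node(i) for i in range(n)]
    tail = nodes[0]
    for node in nodes[1:]:
        tail = union(tail, node)
    # Each FIND from the head follows n - 1 pointers.
    return sum(find(nodes[0])[1] for _ in range(m))
```

Under these assumptions, `worst_case_steps(n, m)` counts m(n − 1) pointer steps, which matches the O(mn) bound in the text.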
More generally, we see that a sequence of UNIONs will result in a tree representing each set, with the root of the tree as the canonical element. To simplify coding, we will mark the root by setting the pointer in the root to point to itself. This leads to the following initial implementations of MAKESET and FIND:
procedure MAKESET(x)    ... initial implementation
    p(x) := x
function FIND(x)    ... initial implementation
    if x ≠ p(x) then return FIND(p(x))
    else return x

It is convenient to add a fourth operation LINK(x,y), where x and y are required to be two roots. LINK changes the parent pointer of one of the roots, say x, to point to y, and returns y, the root of the composite tree. Then UNION(x,y) = LINK(FIND(x), FIND(y)).

But this by itself is not enough to reduce the cost: if we are so unlucky as to make the root of the bigger tree point to the root of the smaller tree, n UNION operations can still lead to a single chain of length n, and the same cost as above.

This motivates the first of our two heuristics: UNION BY RANK. This simply means that we keep track of the depth (or RANK) of each tree, and make the root of the shorter tree point to the root of the taller tree; code is shown below. Note that if we take the UNION of two trees of the same RANK, the RANK of the UNION is one larger than the common RANK, and otherwise it is equal to the max of the two RANKs. This keeps the RANK of a tree of n nodes from growing past O(log n), but m UNIONs and FINDs can then still cost O(m log n).
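To see the effect of the rank rule, the following Python sketch compares the unlucky linking order described above with union by rank, using the initial FIND without path compression. The array-based representation and the names `link_naive`, `link_by_rank`, and `max_depth_after_unions` are assumptions made for illustration; they are not part of the handout.

```python
def find(parent, x):
    """Initial FIND: follow parent pointers up to the root."""
    while parent[x] != x:
        x = parent[x]
    return x


def depth(parent, x):
    """Number of pointers followed from x to its root."""
    d = 0
    while parent[x] != x:
        x = parent[x]
        d += 1
    return d


def link_naive(parent, x, y):
    """Always make root x point to root y, regardless of tree height."""
    parent[x] = y
    return y


def link_by_rank(parent, rank, x, y):
    """Make the root of lower rank point to the root of higher rank."""
    if rank[x] > rank[y]:
        x, y = y, x
    if rank[x] == rank[y]:
        rank[y] += 1
    parent[x] = y
    return y


def max_depth_after_unions(n, use_rank):
    """Merge n singletons, each time offering the big tree's root as x."""
    parent = list(range(n))   # MAKESET for every element
    rank = [0] * n
    for i in range(1, n):
        x = find(parent, i - 1)   # root of the tree built so far
        y = find(parent, i)       # a fresh singleton
        if use_rank:
            link_by_rank(parent, rank, x, y)
        else:
            link_naive(parent, x, y)
    return max(depth(parent, v) for v in range(n))
```

With n = 16, the naive rule builds a single chain of depth 15, while the rank rule keeps the depth at 1 for this particular sequence. In general, union by rank only guarantees a depth of at most log2 n.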
procedure MAKESET(x)    ... final implementation
    p(x) := x
    RANK(x) := 0
function LINK(x,y)
    if RANK(x) > RANK(y) then swap x and y
    if RANK(x) = RANK(y) then RANK(y) := RANK(y) + 1
    p(x) := y
    return(y)

The second heuristic, PATH COMPRESSION, is motivated by observing that since each FIND operation traverses a linked list of vertices on the way to the root, one could make later FIND operations cheaper by making each of these vertices point directly to the root:
function FIND(x)    ... final implementation
    if x ≠ p(x) then
        p(x) := FIND(p(x))
        return(p(x))
    else return(x)
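The final MAKESET, LINK, and FIND procedures translate almost line for line into Python. The sketch below is one possible rendering, not the handout's own code: the class name `DisjointSets`, the dictionary-based storage, and the guard for linking a root to itself are assumptions added for illustration.

```python
class DisjointSets:
    """Union-find with union by rank and path compression."""

    def __init__(self):
        self.parent = {}   # p(x)
        self.rank = {}     # RANK(x)

    def makeset(self, x):
        self.parent[x] = x
        self.rank[x] = 0

    def find(self, x):
        # Path compression: every vertex on the path ends up pointing at the root.
        if self.parent[x] != x:
            self.parent[x] = self.find(self.parent[x])
        return self.parent[x]

    def link(self, x, y):
        # x and y must be roots.
        if x == y:
            return x   # assumption: already one set, so leave the ranks unchanged
        if self.rank[x] > self.rank[y]:
            x, y = y, x
        if self.rank[x] == self.rank[y]:
            self.rank[y] += 1
        self.parent[x] = y
        return y

    def union(self, x, y):
        return self.link(self.find(x), self.find(y))


# Usage: build the chain a -> b -> d, then let FIND compress it.
ds = DisjointSets()
for v in "abcdef":
    ds.makeset(v)
ds.union("a", "b")
ds.union("c", "d")
ds.union("a", "c")
root = ds.find("a")   # afterwards, a points directly at the root
```

After the three unions, a reaches the root through b. The call to `find("a")` rewrites a's parent pointer so that the next FIND on a follows a single pointer.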