Efficient Implementation of Disjoint Set Union-Find: Kruskal's Algorithm and Cost Analysis | Study notes Data Structures and Algorithms

UC Berkeley—CS 170: Efficient Algorithms and Intractable Problems Handout 12

Lecturer: David Wagner March 11, 2003

Notes 12 for CS 170

1 Disjoint Set Union-Find

Kruskal’s algorithm for finding a minimum spanning tree used a structure for maintaining

a collection of disjoint sets. Here, we examine efficient implementations of this structure.

It supports the following three operations:

• MAKESET(x) - create a new set containing the single element x.

• UNION(x,y) - replace the two sets containing x and y by their union.

• FIND(x) - return the name of the set containing the element x. For our purposes this

will be a canonical element in the set containing x.

We will consider how to implement this efficiently, where we measure the cost of doing

an arbitrary sequence of m UNION and FIND operations on n initial sets created by

MAKESET. The minimum possible cost would be O(m+n), i.e., cost O(1) for each call

to MAKESET, UNION, or FIND. Our ultimate implementation will be nearly this cheap,

and indeed be this cheap for all practical values of m and n.

The simplest implementation one could imagine is to represent each set as a linked list,

where we keep track of both the head and the tail. The canonical element is the tail of

the list (the final element reached by following the pointers in the other list elements), and

UNION simply concatenates lists. In this case FIND has maximal cost proportional to the

length of the list, since following each pointer costs O(1), and UNION has cost O(1), to

point the tail of one set to the head of the other. The worst-case cost is attained by doing

n - 1 UNIONs, to get a single set, and then m FINDs on the head of the list, for a total cost

of O(mn), much larger than our target O(m+n).
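To make the O(mn) bound concrete, here is a minimal Python sketch of the linked-list scheme described above. The names Node, ListSet, union, find and worst_case_steps are illustrative assumptions, not taken from the notes, and union is given the two set objects directly so that it can run in O(1).

```python
class Node:
    """One list element; `next` is None only at the tail (the canonical element)."""
    def __init__(self, value):
        self.value = value
        self.next = None


class ListSet:
    """A set stored as a linked list, tracking both its head and its tail."""
    def __init__(self, x):
        node = Node(x)
        self.head = node
        self.tail = node


def union(a, b):
    """Concatenate b onto a in O(1): the tail of a points to the head of b."""
    a.tail.next = b.head
    a.tail = b.tail
    return a


def find(node):
    """Follow pointers to the tail; return (tail, number of pointers followed)."""
    steps = 0
    while node.next is not None:
        node = node.next
        steps += 1
    return node, steps


def worst_case_steps(n, m):
    """Merge n singletons into one list, then do m FINDs on its head."""
    sets = [ListSet(i) for i in range(n)]
    merged = sets[0]
    for s in sets[1:]:
        merged = union(merged, s)
    total = 0
    for _ in range(m):
        _, steps = find(merged.head)
        total += steps
    return total
```

With n = 5 and m = 3, each FIND from the head follows 4 pointers, so worst_case_steps(5, 3) returns 12; in general the total is m(n - 1), which grows as O(mn).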

To do a better job, we need a more clever data structure. Let us think about how to

improve the above simple one. First, instead of taking the union by concatenating lists, we

simply make the tail of one list point to the tail of the other, as illustrated below. That

way, the maximum cost of FIND on any element of the union will be proportional to

the maximum of the two list lengths (plus one, if both have the same length), rather than

the sum.

[Figure: UNION makes the tail of one list point to the tail of the other.]
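The effect of pointing tail to tail can be checked with a small Python sketch. The helper names make_chain, steps_to_tail and union_tails are hypothetical, and the example assumes the shorter list is the one attached to the longer; attaching the longer list to the shorter would add one step to the longer list's elements.

```python
class Node:
    """One list element; `next` is None only at the canonical element."""
    def __init__(self, value):
        self.value = value
        self.next = None


def make_chain(values):
    """Build a linked list; returns its nodes, head first and tail last."""
    nodes = [Node(v) for v in values]
    for a, b in zip(nodes, nodes[1:]):
        a.next = b
    return nodes


def steps_to_tail(node):
    """Number of pointers FIND would follow starting from `node`."""
    steps = 0
    while node.next is not None:
        node = node.next
        steps += 1
    return steps


def union_tails(tail_a, tail_b):
    """Point the tail of one list at the tail of the other; tail_b stays canonical."""
    tail_a.next = tail_b
    return tail_b


short = make_chain(["a", "b", "c"])      # length 3
long_ = make_chain([1, 2, 3, 4, 5])      # length 5
union_tails(short[-1], long_[-1])

deepest = max(steps_to_tail(short[0]), steps_to_tail(long_[0]))
```

Here deepest is 4, the same as for the length-5 list on its own, whereas concatenating the two lists would have left a FIND from the head following 7 pointers.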

More generally, we see that a sequence of UNIONs will result in a tree representing each

set, with the root of the tree as the canonical element. To simplify coding, we will mark the

root by setting the pointer in the root to point to itself. This leads to the following initial

implementations of MAKESET and FIND:
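As a hedged illustration of the self-pointing root convention, the following Python sketch stores the pointers in a dictionary. The dictionary named parent and the lowercase function names are assumptions made for this sketch, not the notes' own listing.

```python
# parent[x] is the element that x points to; a root points to itself.
parent = {}


def makeset(x):
    """Create a new one-element set: x is its own root."""
    parent[x] = x


def find(x):
    """Follow parent pointers until reaching the self-pointing root."""
    while parent[x] != x:
        x = parent[x]
    return x
```

For example, after makeset(1), makeset(2) and makeset(3), setting parent[1] = 2 and parent[2] = 3 by hand makes find(1) return 3, the root of the resulting tree.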
