Disjoint Sets: Union/Find Data Structures | Study notes Data Structures and Algorithms

CS61B: Lecture 33

Friday, November 12, 2010

Today’s reading: Goodrich & Tamassia, Section 11.4.

DISJOINT SETS
=============

A _disjoint_sets_ data structure represents a collection of sets that are

_disjoint_: that is, no item is found in more than one set. The collection of

disjoint sets is called a _partition_, because the items are partitioned among

the sets.

Moreover, we work with a _universe_ of items. The universe is made up of all

of the items that can be a member of a set. Every item is a member of exactly

one set.

For example, suppose the items in our universe are corporations that still

exist today or were acquired by other corporations. Our sets are corporations

that still exist under their own name. For instance, "Microsoft,"

"Forethought," and "Web TV" are all members of the "Microsoft" set.

We will limit ourselves to two operations. The first is called a _union_

operation, in which we merge two sets into one. The second is called a _find_

query, in which we ask a question like, "What corporation does Web TV belong to

today?" More generally, a "find" query takes an item and tells us which set it

is in. We will not support operations that break a set up into two or more

sets (not quickly, anyway). Data structures designed to support these

operations are called _partition_ or _union/find_ data structures.
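
The two operations can be written down as a small Java interface. This is only a sketch: the name DisjointSets and the choice to identify both items and sets by int values are assumptions made for illustration, not something the lecture prescribes.

```java
/** Hypothetical interface for a union/find structure over items 0..n-1. */
interface DisjointSets {
    /** Merges the set containing item a with the set containing item b. */
    void union(int a, int b);

    /** Returns an identifier for the set that currently contains item. */
    int find(int item);
}
```

Note that the interface has no operation for splitting a set, which matches the restriction described above.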

Applications of union/find data structures include maze generation (which

you’ll do in Homework 9) and Kruskal’s algorithm for computing the minimum

spanning tree of a graph (which you’ll implement in Project 3).

Union/find data structures begin with every item in a separate set.

-------------- ------------ -------- ------------------- -------- -----------
|Piedmont Air| |Empire Air| |US Air| |Pacific Southwest| |Web TV| |Microsoft|
-------------- ------------ -------- ------------------- -------- -----------

The query "find(Empire Air)" returns "Empire Air". Suppose we take the union

of Piedmont Air and Empire Air and call the resulting corporation Piedmont

Air. Similarly, we unite Microsoft with Web TV and US Air with Pacific SW.

-------------- ------------------- -----------
|Piedmont Air| |      US Air     | |Microsoft|
| Empire Air | |Pacific Southwest| | Web TV  |
-------------- ------------------- -----------

The query "find(Empire Air)" now returns "Piedmont Air". Suppose we further

unite US Air with Piedmont Air.

-------------------------------- -----------
|      US Air      Piedmont Air| |Microsoft|
|Pacific Southwest  Empire Air | | Web TV  |
-------------------------------- -----------

The query "find(Empire Air)" now returns "US Air". When Microsoft takes over

US Air, everything will be in one set and no further mergers will be possible.

List-Based Disjoint Sets and the Quick-Find Algorithm
-----------------------------------------------------

The obvious data structure for disjoint sets looks like this.

- Each set references a list of the items in that set.

- Each item references the set that contains it.

With this data structure, find operations take O(1) time; hence, we say that

list-based disjoint sets use the _quick-find_ algorithm. However, union

operations are slow, because when two sets are united, we must walk through one

set and relabel all the items so that they reference the other set.
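
The two kinds of references described above can be sketched in Java as follows. The class name QuickFindSets, the use of int items numbered 0..n-1, and the convention that the merged set keeps the first argument's identifier are all assumptions for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Sketch of list-based disjoint sets (quick-find) over items 0..n-1. */
class QuickFindSets {
    private final int[] setOf;                  // setOf[i] = id of the set containing item i
    private final List<List<Integer>> members;  // members.get(s) = items currently in set s

    QuickFindSets(int n) {
        setOf = new int[n];
        members = new ArrayList<>();
        for (int i = 0; i < n; i++) {
            setOf[i] = i;                       // every item starts in its own set
            List<Integer> list = new ArrayList<>();
            list.add(i);
            members.add(list);
        }
    }

    /** O(1): read the item's set reference. */
    int find(int item) {
        return setOf[item];
    }

    /** Merges b's set into a's set by relabeling every item in b's set. */
    void union(int a, int b) {
        int keep = setOf[a];
        int gone = setOf[b];
        if (keep == gone) {
            return;                             // already in the same set
        }
        for (int item : members.get(gone)) {
            setOf[item] = keep;                 // the slow part: one write per item
        }
        members.get(keep).addAll(members.get(gone));
        members.get(gone).clear();
    }
}
```

The loop in union touches every item of one set, which is why a single union can take time proportional to the size of that set.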

Time prevents us from analyzing this algorithm in detail (but see Goodrich and

Tamassia, Section 11.4.3). Instead, let’s move on to the less obvious but

flatly superior _quick-union_ algorithm.

Tree-Based Disjoint Sets and the Quick-Union Algorithm
------------------------------------------------------

In tree-based disjoint sets, union operations take O(1) time, but find

operations are slower. However, over a long sequence of union and find

operations, the quick-union algorithm (with the optimizations introduced

below) is faster overall than the quick-find algorithm.

To support fast unions, each set is stored as a general tree. The quick-union

data structure comprises a _forest_ (a collection of trees), in which each

item is initially the root of its own tree; then trees are merged by union

operations. The quick-union data structure is simpler than the general tree

structures you have studied so far, because there are no child or sibling

references. Every node knows only its parent, and you can only walk up the

tree. The true identity of each set is recorded at its root.

Union is a simple O(1)-time operation: we make the root of one set

a child of the root of the other set. For example, when we form the

union of US Air and Piedmont Air:

                                                           US Air
      Piedmont Air        US Air                         ^        ^
           ^                 ^                           |        |
           |                 |                   Piedmont Air  Pacific Southwest
       Empire Air    Pacific Southwest   ====>        ^
                                                      |
                                                 Empire Air

US Air becomes a set containing four members. However, finding the set to

which a given item belongs is not a constant-time operation.

The find operation is performed by following the chain of parent references

from an item to the root of its tree. For example, find(Empire Air) will

follow the path of references until it reaches US Air. The cost of this

operation is proportional to the item’s depth in the tree.
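
The basic union and find operations just described can be sketched in Java with a single array of parent references. The class name QuickUnionSets, the int item numbering, and the use of -1 to mark a root are assumptions for illustration; this sketch includes neither optimization discussed next.

```java
/** Sketch of tree-based disjoint sets (quick-union), without optimizations. */
class QuickUnionSets {
    private final int[] parent;   // parent[i] = parent of item i, or -1 if i is a root

    QuickUnionSets(int n) {
        parent = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = -1;       // each item starts as the root of its own tree
        }
    }

    /** Follows parent references up to the root; cost is the item's depth. */
    int find(int item) {
        while (parent[item] != -1) {
            item = parent[item];
        }
        return item;
    }

    /**
     * O(1): makes rootB a child of rootA.
     * Assumes rootA and rootB are distinct roots, e.g. results of find().
     */
    void union(int rootA, int rootB) {
        parent[rootB] = rootA;
    }
}
```

If the arguments might not be roots, a caller can write union(find(a), find(b)) after checking that the two results differ, at the cost of the two finds.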

These are the basic union and find algorithms, but we’ll consider two

optimizations that make finds faster. One strategy, called union-by-size,

helps the union operation to build shorter trees. The second strategy, called

path compression, gives the find operation the power to shorten trees.

_Union-by-size_ is a strategy to keep items from getting too deep by uniting

sets intelligently. At each root, we record the size of its tree (i.e. the

number of nodes in the tree). When we unite two trees, we make the smaller one

a subtree of the larger one (breaking ties arbitrarily).
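
Union-by-size can be sketched in Java by keeping a second array of tree sizes. The class name SizedQuickUnion, the separate size array, and the rule that a tie keeps the first argument as the root are assumptions for illustration; other ways of storing the sizes are possible.

```java
/** Sketch of quick-union with union-by-size (no path compression). */
class SizedQuickUnion {
    private final int[] parent;   // parent[i] = parent of item i, or -1 if i is a root
    private final int[] size;     // size[r] = number of nodes in r's tree; valid only for roots

    SizedQuickUnion(int n) {
        parent = new int[n];
        size = new int[n];
        for (int i = 0; i < n; i++) {
            parent[i] = -1;       // each item starts as the root of its own tree
            size[i] = 1;
        }
    }

    /** Follows parent references up to the root. */
    int find(int item) {
        while (parent[item] != -1) {
            item = parent[item];
        }
        return item;
    }

    /**
     * Unites the trees rooted at rootA and rootB and returns the new root.
     * The smaller tree becomes a subtree of the larger; a tie keeps rootA.
     */
    int union(int rootA, int rootB) {
        if (rootA == rootB) {
            return rootA;         // already the same tree
        }
        if (size[rootA] < size[rootB]) {
            int tmp = rootA;      // swap so that rootA is the larger tree
            rootA = rootB;
            rootB = tmp;
        }
        parent[rootB] = rootA;
        size[rootA] += size[rootB];
        return rootA;
    }
}
```

With this rule the caller no longer chooses which root survives, so union returns the new root rather than assuming it is the first argument.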
