Depth First Search and Strong Components
CMU 15-451 Spring 2003 D. Sleator
1. Introduction
Depth first search is a very useful technique for analyzing graphs. For example, it can be used to:
• Determine the connected components of a graph.
• Find cycles in a directed or undirected graph.
• Find the biconnected components of an undirected graph.
• Topologically sort a directed graph.
• Determine if a graph is planar, and find an embedding of it if it is.
• Find the strong components of a directed graph.
If the graph has n vertices and m edges then depth first search can be used to solve all of these
problems in time O(n + m), that is, linear in the size of the graph.
2. Depth First Search in Directed Graphs
We assume that the graph is represented as an adjacency structure, that is, for every vertex v there
is a set adj(v) which is the set of vertices reachable by following one edge out of v. Let V be the
set of vertices in the graph, and let E be the set of edges. To do a depth first search we keep two
pieces of information associated with each vertex v. One is the depth first search numbering,
num(v), and the other is mark(v), which indicates that v is currently on the recursion stack.
Here is the depth first search procedure:
i ← 0
for all v ∈ V do num(v) ← 0
for all v ∈ V do mark(v) ← 0
for all v ∈ V do
    if num(v) = 0 then DFS(v)

DFS(v)
    i ← i + 1
    num(v) ← i
    mark(v) ← 1
    for all w ∈ adj(v) do
        if num(w) = 0 then DFS(w)        [(v, w) is a tree edge]
        else if num(w) > num(v) then     [(v, w) is a forward edge]
        else if mark(w) = 0 then         [(v, w) is a cross edge]
        else                             [(v, w) is a back edge]
    mark(v) ← 0
end DFS
This process examines all edges and vertices. The call DFS(v) is made exactly once for each vertex of the graph. Each edge is placed into exactly one of four classes by the algorithm: tree edges, forward edges, cross edges, and back edges.
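As a minimal sketch, the procedure above can be transcribed into Python. The dictionary-of-lists graph representation, the name classify_edges, and the example graph are assumptions made for illustration; the graph is assumed to have no parallel edges, since edges are recorded by their endpoints.

```python
def classify_edges(adj):
    """Depth first search that labels each edge as tree, forward, cross, or back.

    adj maps every vertex to the list of vertices one edge away from it
    (an assumed representation; each vertex must appear as a key).
    """
    num = {v: 0 for v in adj}        # depth-first number, 0 = not yet searched
    mark = {v: False for v in adj}   # True while v is on the recursion stack
    kinds = {}                       # (v, w) -> class of the edge
    counter = 0

    def dfs(v):
        nonlocal counter
        counter += 1
        num[v] = counter
        mark[v] = True
        for w in adj[v]:
            if num[w] == 0:
                kinds[(v, w)] = "tree"
                dfs(w)
            elif num[w] > num[v]:
                kinds[(v, w)] = "forward"
            elif not mark[w]:
                kinds[(v, w)] = "cross"
            else:
                kinds[(v, w)] = "back"
        mark[v] = False

    for v in adj:
        if num[v] == 0:
            dfs(v)
    return num, kinds


# Example graph (hypothetical) that produces all four edge classes.
graph = {"a": ["b", "c"], "b": ["c", "d"], "c": ["a"], "d": ["c"]}
num, kinds = classify_edges(graph)
for edge, kind in kinds.items():
    print(edge, kind)
```

Because the sketch is recursive, very deep graphs may exceed Python's default recursion limit.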
This classification of the edges is not a property of the graph alone. It also depends on the
ordering of the vertices in adj(v) and on the ordering of the vertices in the loop that calls the DFS
procedure. The num and mark fields are not actually necessary to accomplish a complete search
of the graph. All that is needed to do that is a single bit for each vertex that indicates whether or
not that vertex has already been searched. (This bit is zero for vertex v if and only if num(v) = 0).
The depth first labeling (the num field) has some very useful properties that we shall make use of.
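To illustrate the remark about the single bit, here is a sketch of a complete search that keeps only a set of already-searched vertices. The name search_order and the example graph are assumptions made for illustration.

```python
def search_order(adj):
    """Visit every vertex once, keeping only a 'searched' bit per vertex."""
    searched = set()   # membership plays the role of the single bit
    order = []         # vertices in the order they are first reached

    def search(v):
        searched.add(v)
        order.append(v)
        for w in adj[v]:
            if w not in searched:
                search(w)

    for v in adj:
        if v not in searched:
            search(v)
    return order


# Hypothetical example: "e" is not reachable from "a", so the outer loop starts it.
example = {"a": ["b", "c"], "b": ["d"], "c": [], "d": ["a"], "e": ["c"]}
print(search_order(example))
```

This visits the same vertices in the same order as the full procedure, but it cannot tell forward, cross, and back edges apart.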
The tree edges have the property that either zero or one of them points to a given vertex.
Therefore, they define a collection of trees, called the depth-first spanning forest of the graph. The
root of each tree is the lowest numbered vertex in it (the one that was searched first). These rooted
trees allow us to define the ancestor and descendant relations among vertices. The four types of
edges are related to the spanning forest as follows:
• The forward edges are edges from a vertex to a descendant of it that are not tree edges. This
is because the test num(w) > num(v) indicates that w was explored after the call to DFS(v). Since the call to DFS(v) is not yet complete, w must be a descendant of v.
• The cross edges are edges from a vertex v to a vertex w such that the subtrees rooted at v
and w are disjoint. This follows because mark(w) = 0, so the exploration of w is complete,
and was complete before the call to DFS(v). Therefore v is not in the subtree rooted at w.
Vertex w is not in the subtree rooted at v because num(w) < num(v).
• The back edges are edges from a vertex to an ancestor of it. The fact that mark(w) = 1
indicates that w is on the recursion stack and is thus an ancestor of v.
Below is an example of a graph and a corresponding depth first spanning forest.
We shall repeatedly make use of one very general property of the depth-first numbering of the graph. This is embodied in the following lemma.
Lemma 1 Let T_v be the subtree of the spanning forest rooted at v, and define T_w similarly. Suppose that T_v and T_w are disjoint. Then all the depth-first numbers in T_v are greater than all of those in T_w, or all the depth-first numbers in T_v are less than all of those in T_w. Furthermore, if there is an edge from a vertex in T_v to a vertex in T_w then all of the depth-first numbers in T_w are less
than all of them in T_v.
Definition: A vertex is called a base of a strong component if it has the lowest depth-first search
number (num field) of any vertex in the strong component.
Lemma 2 Let b be the base of a strong component X. Then for all v ∈ X, v is a descendant of b,
and all vertices on the path from b to v are in X.
Proof. First we prove the first part. Let v be any vertex besides b in X. We know that either (1) v
descends from b, or (2) b descends from v, or (3) neither of the above. (2) is impossible, because if
b descended from v, then its depth-first number would be greater than v's, which contradicts the
assumption that b is a base.
Suppose (3). There must be a path from b to v, because they are in the same strong component.
Consider one such path, p, and let r be the least common ancestor of all the vertices on the path.
(In other words, r is the vertex such that all the vertices on the path descend from it, but this
property does not hold for any of its descendants. Since there is only one tree in the spanning forest there must be such a vertex.)
We claim that r must be on the path p. To prove this seems to require two cases.
Case 1: b and v descend from different children of r.
Let T_b be the subtree containing b and let T_v be the subtree containing v. Since num(b) < num(v) and T_b and T_v are disjoint, there cannot be any edge from a vertex in T_b to one in T_v. This follows from Lemma 1. Therefore the only way path p can get from b to v is by going through r.
Case 2: b and v descend from the same child of r.
Suppose that the path p does not contain r. Then the path must touch the subtrees of at least two distinct children of r. (If not, then a child of r would be a lower common ancestor than r of the path p.) Call the subtrees rooted at the children of r T_1, T_2, ..., T_k. Assume without loss of generality that T_1 contains b and v. Path p starts in T_1, goes through a sequence of subtrees of r, then returns to T_1. Each time the path changes from one subtree to another, all of the numbers
in the new subtree must be less than all those of the old subtree, by Lemma 1. So it is impossible
for such a path to return to T_1. The conclusion is that the path must go through r.
Ok, so the path goes through r, so what? Well, since r is an ancestor of b, its depth-first search
number is less than that of b. It is also in the same strong component as b because there is a
path from b to r and a path (along tree edges) from r to b. But b was supposed to be the lowest
numbered vertex in the strong component. This shows that (3) is impossible.
Of the three original alternatives, only (1) is left. Therefore v is a descendant of b.
The second part of the lemma is now trivial. Let x be a vertex on the tree path from b to v. There
is a path from b to x (via tree edges). There is also a path from x to b by first going from x to v
(via tree edges) and then from v to b. Therefore x is in the same strong component as b. □
Lemma 3 Let b be a base vertex. Let b_1, b_2, ..., b_k be all of the base vertices that descend from b.
Then b’s strong component is the set of vertices descending from b but not descending from any of
b_1, b_2, ..., b_k.
Proof. Assume the contrary, i.e. that there is a vertex v in the same strong component as b
which descends from both b and b_i. There must be a path from v to b. There is also a path from b to b_i to
v (following tree edges). Therefore b and b_i are in the same strong component, which contradicts
the assumption that b and b_i are bases of distinct strong components. □
Definition: Let lowlink(v) be the depth-first number of the minimum numbered vertex in the same strong component as
v that can be reached from v by following zero or more tree edges followed by at most one back or
cross edge.
Lemma 4 A vertex v is a base if and only if num(v) = lowlink(v).
Proof. We first assume that num(v) > lowlink(v) and prove that v is not a base. By definition of
lowlink, there is a vertex w in the same strong component as v such that lowlink(v) = num(w).
Therefore num(v) > num(w), and v cannot be a base.
To prove the converse, assume that v is not a base. Let b be the base of the strong component
containing v. Then there must be a path p from v to b. By Lemma 2, b must be an ancestor of v.
Let y be the first vertex along the path p that is not in the subtree rooted at v, and let x be the
vertex before y on the path. If y is an ancestor of v, then num(y) < num(v) immediately. Otherwise the subtree rooted at y is disjoint from that rooted at v, and since
there is an edge from a descendant of v (namely x) to y, we can apply Lemma 1 and conclude that
num(y) < num(v). This shows that lowlink(v) ≤ num(y), since y is in the same strong component
as v and is reachable by following tree edges, then one back or cross edge. Combining this with
the fact that num(y) < num(v) gives lowlink(v) < num(v). □
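Lemma 4 can be checked by brute force on a small graph. The sketch below computes lowlink directly from its definition, using mutual reachability to decide which vertices share a strong component. It is meant only as a check, not as an efficient algorithm; all function names and the example graph are assumptions made for illustration.

```python
def dfs_numbering(adj):
    """Return depth-first numbers and the tree children of each vertex."""
    num, children = {}, {v: [] for v in adj}

    def dfs(v):
        num[v] = len(num) + 1
        for w in adj[v]:
            if w not in num:
                children[v].append(w)   # (v, w) is a tree edge
                dfs(w)

    for v in adj:
        if v not in num:
            dfs(v)
    return num, children


def reach(adj, s):
    """Set of vertices reachable from s (including s itself)."""
    seen, todo = {s}, [s]
    while todo:
        x = todo.pop()
        for w in adj[x]:
            if w not in seen:
                seen.add(w)
                todo.append(w)
    return seen


def lowlink_by_definition(adj):
    num, children = dfs_numbering(adj)
    reachable = {v: reach(adj, v) for v in adj}

    def same_component(u, v):
        return v in reachable[u] and u in reachable[v]

    low = {}
    for v in adj:
        best = num[v]                   # zero tree edges, no extra edge
        todo = [v]
        while todo:                     # walk the subtree rooted at v
            x = todo.pop()
            todo.extend(children[x])
            for w in adj[x]:
                # num[w] < num[x] holds exactly for back and cross edges
                if num[w] < num[x] and same_component(v, w):
                    best = min(best, num[w])
        low[v] = best
    bases = {v for v in adj
             if all(num[v] <= num[u] for u in adj if same_component(u, v))}
    return num, low, bases


# Hypothetical example with strong components {a, b, c}, {d, e}, and {f}.
graph = {"a": ["b"], "b": ["c", "d"], "c": ["a"],
         "d": ["e"], "e": ["d"], "f": ["e", "a"]}
num, low, bases = lowlink_by_definition(graph)
print({v for v in graph if num[v] == low[v]} == bases)
```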
4. The Strong Components Algorithm
We can now present the algorithm for computing strong components. A stack S containing a
particular set of vertices is maintained by the algorithm.
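A minimal Python sketch of a strong components procedure consistent with this description and with the proof that follows: a stack S, the test w ∈ S, and an output loop that pops a component when lowlink(v) = num(v). It assumes the standard formulation of the algorithm; the dictionary-of-lists graph, the on_stack set used to answer w ∈ S in constant time, and the example graph are assumptions made for illustration.

```python
def strong_components(adj):
    """Return the strong components of a directed graph as lists of vertices."""
    num = {v: 0 for v in adj}   # depth-first number, 0 = not yet searched
    lowlink = {}
    stack = []                  # the stack S
    on_stack = set()            # same contents as S, for a fast "w in S" test
    components = []
    counter = 0

    def strong(v):
        nonlocal counter
        counter += 1
        num[v] = lowlink[v] = counter
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if num[w] == 0:
                # Tree edge: search w, then inherit what it could reach.
                strong(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif num[w] < num[v] and w in on_stack:
                # Back or cross edge into the current strong component.
                lowlink[v] = min(lowlink[v], num[w])
        if lowlink[v] == num[v]:
            # v is a base: pop its strong component off the stack.
            component = []
            while True:
                x = stack.pop()
                on_stack.discard(x)
                component.append(x)
                if x == v:
                    break
            components.append(component)

    for v in adj:
        if num[v] == 0:
            strong(v)
    return components


# Hypothetical example with strong components {a, b, c}, {d, e}, and {f}.
graph = {"a": ["b"], "b": ["c", "d"], "c": ["a"],
         "d": ["e"], "e": ["d"], "f": ["e", "a"]}
components = strong_components(graph)
print(components)
```

Each component is output when the call on its base finishes, so a component appears only after every component reachable from it.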
Clearly lowlink(v) is computed correctly assuming one thing: the test w ∈ S is satisfied exactly
when w is in the same strong component as v. If w ∈ S then by Lemma 5 there exists a path from
w to v. This, combined with edge (v, w), ensures that v and w are in the same strong component. If w ∉ S then, since num(w) < num(v), the recursive call to w must have been completed and the strong component containing w has been output. This strong component has been correctly
computed by induction hypothesis (2).
We now show (2). By the proof just given, when the output phase is reached, lowlink(v) has
been correctly computed. By Lemma 4, v is a base vertex.
The ensuing loop pops everything from the end of the stack (including v) into a strong
component. These vertices are exactly those that are descendants of v but not in another strong
component (here we’re using induction hypothesis (2)). By Lemma 3, this is the strong component
with v as a base. □
We would like to withdraw our assumptions that the graph has a vertex R connected to all
other vertices, and that R is searched first by the algorithm. Compare the following two alternatives:
(1) Run the above algorithm on a graph G, or (2) add a new vertex R to G with an edge to all other
vertices, and run the search above by calling STRONG(R). By syntactic analysis of the program above, it is easy to see that these two processes will differ only in that (2) will output one extra
strong component at the end (the one consisting of R). This is because while running (2), inside
the call to STRONG(R), the loop through the vertices adjacent to R will behave exactly
like the loop on the outside running in (1).
It is tempting to consider what happens to the algorithm if the line:
“lowlink(v) ← min(lowlink(v), num(w))”
is replaced by:
“lowlink(v) ← min(lowlink(v), lowlink(w))”
This new lowlink function is not the same as the old one, yet the algorithm will still work.
Lemma 6 The modified strong components algorithm works.
Proof. Let ll(v) be the new lowlink function. (That is, change the line indicated above, then
replace lowlink by ll everywhere in the program.) Since lowlink(x) ≤ num(x), this change can only decrease the function, that is, ll(x) ≤ lowlink(x). Our goal is to show that the test lowlink(v) =
num(v) is true in the running of the original algorithm exactly when ll(v) = num(v) in the running
of the new algorithm.
One way is easy. If the test ll(v) = num(v) is satisfied, then it must be the case that in the
running of the original algorithm the test lowlink(v) = num(v) is also satisfied, because lowlink(v)
is trapped between ll(v) and num(v).
Conversely, suppose that lowlink(v) = num(v), that is, that v is a base vertex. Our goal
is to show that ll(v) = lowlink(v). The reason is that even though ll(x) can be smaller than
lowlink(x), its value is still the depth first number (num field) of another vertex in the same
strong component as x. If v is a base vertex there is no other vertex in the same strong component
as v with a smaller num field. Therefore in this case ll(v) = lowlink(v), which completes the proof.
□
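The effect of the modification can be observed on a small example. The sketch below assumes the standard formulation of the strong components procedure and adds a hypothetical flag, use_lowlink_of_w, that switches between the two update lines. The graph representation and the example graph are also assumptions made for illustration.

```python
def strong_components(adj, use_lowlink_of_w=False):
    """Return (components, lowlink) using either update rule for w in S."""
    num = {v: 0 for v in adj}
    lowlink = {}
    stack, on_stack, components = [], set(), []
    counter = 0

    def strong(v):
        nonlocal counter
        counter += 1
        num[v] = lowlink[v] = counter
        stack.append(v)
        on_stack.add(v)
        for w in adj[v]:
            if num[w] == 0:
                strong(w)
                lowlink[v] = min(lowlink[v], lowlink[w])
            elif num[w] < num[v] and w in on_stack:
                # Original rule uses num(w); modified rule uses lowlink(w).
                other = lowlink[w] if use_lowlink_of_w else num[w]
                lowlink[v] = min(lowlink[v], other)
        if lowlink[v] == num[v]:
            component = []
            while True:
                x = stack.pop()
                on_stack.discard(x)
                component.append(x)
                if x == v:
                    break
            components.append(component)

    for v in adj:
        if num[v] == 0:
            strong(v)
    return components, lowlink


# Hypothetical example: the edge (d, c) reaches c while lowlink(c) < num(c).
graph = {"a": ["b"], "b": ["c", "d"], "c": ["a"],
         "d": ["c", "e"], "e": ["f"], "f": ["e"]}
comps_old, low_old = strong_components(graph)
comps_new, low_new = strong_components(graph, use_lowlink_of_w=True)
print(comps_old == comps_new, low_old["d"], low_new["d"])
```

On this graph the two functions differ at d, yet both agree on which vertices pass the base test and so output the same components.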