







This paper addresses the constrained multiple sequence alignment (CMSA) problem, which involves aligning given sequences under the guidance of a given pattern (constraint). It summarizes PCMSA, a parallel CMSA algorithm presented by He and Arslan. The algorithm is based on finding a shortest path in the dynamic programming matrix, where each layer is an n-dimensional matrix. PCMSA uses a bidirectional-method-based A* algorithm to compute all-pairs shortest paths among the entry vertices and exit vertices on each layer in parallel. The paper also discusses a multithreaded parallel approach for the dynamic programming algorithm on the pairwise sequence alignment problem, which can be used to parallelize the bidirectional-method-based A* search.








Dan He and Abdullah N. Arslan
Department of Computer Science, University of Vermont, Burlington, VT 05405, USA
{dhe, aarslan}@cs.uvm.edu

Abstract. The constrained multiple sequence alignment (CMSA) problem is to align given sequences $S_1, S_2, \ldots, S_n$ such that similar subsequences are aligned in the same region under the guidance of a given pattern (constraint) $P$. The CMSA problem can be considered as a constrained path search problem in the dynamic programming matrix. The problem has a dynamic programming solution that requires $O(2^n |S_1||S_2| \cdots |S_n||P|)$ time, where $|X|$ denotes the length of string $X$. There is a parallel algorithm that uses $|P| + 1$ processors. The experimental evidence suggests that this algorithm takes $O(2^n |S_1||S_2| \cdots |S_n|)$ time. In this paper we propose a more general parallel algorithm which further improves the time requirement of the problem in practical applications.
Keywords: constrained sequence alignment, multiple alignment, dynamic programming, parallel algorithm.
The constrained multiple sequence alignment (CMSA) problem was introduced by Tang et al. [12]. The problem aims to incorporate biologically meaningful prior knowledge of the structure or pattern of the input sequences into the alignment process. The problem is to find an optimal multiple alignment of given $n$ strings $S_1, S_2, \ldots, S_n$ such that the alignment contains a given pattern string $P$, i.e. in the alignment matrix there exists a sequence $c$ of columns, each entirely composed of symbol $P[k]$ for every $k$, where $P[k]$ is the $k$th symbol in $P$, $1 \le k \le |P|$, and in the sequence $c$ the column containing $P[i]$ appears before the column containing $P[j]$ for all $i, j$ with $i < j$. An application of the problem is the alignment of RNase sequences. Such sequences are all known to contain three active residues His(H), Lys(K), His(H) that are essential for RNA degradation. Therefore, it is natural to expect that in an alignment of RNase sequences, each of these residues should be aligned in the same column. The CMSA problem when $n = 2$ is called the constrained pairwise sequence alignment (CPSA) problem. There are many dynamic programming algorithms for the CMSA and CPSA problems, and their variations [12, 3, 13, 14, 1, 4, 7, 8].

In this paper we propose a more general parallel algorithm that further parallelizes the parallel CMSA algorithm PCMSA of He and Arslan [8]. Experimental evidence shows that our algorithm improves the results obtained by the PCMSA algorithm.
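To make the constraint concrete, the following is a minimal sketch of a checker that decides whether a finished alignment contains the pattern in the sense defined above. The function name `satisfies_constraint`, the row-of-strings representation, and the use of `-` for gaps are assumptions made for illustration; they do not come from the paper.

```python
def satisfies_constraint(alignment, pattern):
    """Return True if some columns of the alignment spell `pattern` in order.

    `alignment` is a list of equal-length strings (one row per sequence),
    with '-' standing for a gap (an assumed convention).
    """
    if not alignment:
        return pattern == ""
    k = 0  # index of the next pattern symbol that still needs a column
    for col in range(len(alignment[0])):
        if k == len(pattern):
            break
        # A column counts only if every row carries the symbol P[k].
        if all(row[col] == pattern[k] for row in alignment):
            k += 1
    return k == len(pattern)


rows = ["A-HKCH", "AGHK-H", "-THKAH"]
print(satisfies_constraint(rows, "HKH"))
print(satisfies_constraint(rows, "HHK"))
```

Scanning columns left to right and always taking the earliest matching column is sufficient here, because the definition only asks that the chosen columns appear in pattern order.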
The outline of this paper is as follows: In Section 2 we summarize the parallel CMSA algorithm PCMSA of He and Arslan [8]. We present our parallel algorithm FastPCMSA for the problem in Section 2.1 and show the results of our experiments in Section 3. We include our final remarks in Section 4.
For the CMSA problem, Chin et al. [3] present a dynamic programming formulation (see Appendix 5.1). The CMSA problem can also be considered as a problem of finding a shortest path in the dynamic programming matrix, which we can visualize in layers indexed by its last dimension (positions in the pattern string) (see Figure 6 in Appendix 5.1), where each layer is an $n$-dimensional matrix. He and Arslan [8] present the parallel CMSA algorithm PCMSA, whose three steps are described below.
In Step 1 of Algorithm PCMSA we use part of the CMSA algorithm of He and Arslan [7] to find the rectangular boundary that must be considered for each layer $k$ in the dynamic programming matrix (see Figure 7 in Appendix 5.2). In Step 2 of PCMSA, for each layer except the first and the last, we use the bidirectional-method-based A* algorithm (see Appendix 5.3) to compute all-pairs shortest paths among the entry vertices and exit vertices, on each layer in parallel. In Step 3 of PCMSA, after we compute all possible shortest paths on each layer, we find the global shortest path between vertices $(0, 0, \ldots, 0, 0)$ and $(s_1, s_2, \ldots, s_n, r)$, where $s_i = |S_i|$ and $r = |P|$, by selecting one shortest path connecting an entry and an exit vertex on each layer such that the sum of the lengths of the selected paths over all layers is minimized. The global shortest path is the combination of these shortest paths on each layer. In this final step of the algorithm we can use a single-source shortest paths algorithm. He and Arslan [8] present experimental evidence suggesting that PCMSA takes time $O(2^n s_1 s_2 \cdots s_n)$ in practice, indicating a factor of $O(r)$ improvement over a naive sequential CMSA algorithm implementing the dynamic programming solution of Chin et al. [3].
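As an illustration of Step 3, the sketch below combines per-layer shortest path lengths into a global shortest path by sweeping over the layers in order. It is a simplified model, not the authors' implementation: it assumes the per-layer results are already available as dictionaries mapping an (entry, exit) pair to a path length, and that an exit vertex of one layer is used directly as the entry vertex of the next. The function name `global_shortest_path` and the toy vertex labels are hypothetical.

```python
import math


def global_shortest_path(layer_costs, source, target):
    """Pick one (entry, exit) sub-path per layer so the total length is minimal.

    layer_costs[k] maps (entry, exit) -> shortest path length inside layer k.
    Returns (total length, list of layer-boundary vertices visited).
    """
    best = {source: (0.0, [source])}  # vertex -> (cost so far, boundary vertices)
    for costs in layer_costs:
        nxt = {}
        for (entry, exit_), length in costs.items():
            if entry not in best:
                continue  # this entry vertex is not reachable from the source
            cand = best[entry][0] + length
            if exit_ not in nxt or cand < nxt[exit_][0]:
                nxt[exit_] = (cand, best[entry][1] + [exit_])
        best = nxt
    return best.get(target, (math.inf, []))


layers = [
    {("s", "a"): 3, ("s", "b"): 1},
    {("a", "c"): 1, ("b", "c"): 5, ("b", "d"): 2},
    {("c", "t"): 1, ("d", "t"): 4},
]
print(global_shortest_path(layers, "s", "t"))
```

Because the layers are visited in a fixed order, this is a single-source shortest path computation on a layered acyclic graph, so one pass over the per-layer tables suffices.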
2.1 FastPCMSA Algorithm
In the PCMSA algorithm of He and Arslan [8], although we can do the computations on different layers independently and in parallel, the computation time on each layer is still high ($O(2^n s_1 s_2 \cdots s_n)$). In the PCMSA algorithm the number of processors we can use depends on the number of layers, namely the length of the pattern string, and this can only eliminate the factor $r$ in the total time complexity $O(2^n s_1 s_2 \cdots s_n r)$. If we can parallelize the computation at each layer, the speed-up will not be limited by the length of the pattern, $r$.
programming matrix. Then we can parallelize the bidirectional-method-based A* search by assigning these entry vertices to different processors and letting each processor compute the shortest paths from each entry vertex assigned to it to all the exit vertices in the corresponding exit vertex set. In order to balance the computation time across processors, we try to assign entry vertices with large exit vertex sets, for which computing the shortest paths is often time consuming, to different processors. The computation time for the bidirectional-method-based A* search mainly depends on the entry vertices with large exit vertex sets, and if there are many such entry vertices the search can be quite time consuming. By parallelizing the computations for these vertices we can achieve a significant speed-up. As we show in the algorithm in Figure 3, we sort the exit vertex sets in descending order of size (we precompute these sets) and do the processor assignments in a round-robin manner. This is how we solve the load-balancing problem here, but we note that this is an interesting optimization problem on its own. If we use a shared-memory parallel machine, the computations on each processor are similar to the computations on a sequential processor. The details of this parallelization are shown in Figure 3.
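The load-balancing idea in the paragraph above can be sketched as follows: sort the entry vertices by the size of their exit vertex sets, largest first, and deal them out to the processors in turn. This is only an illustration under the assumption that the set sizes are already known; the function name `assign_round_robin` and the example labels are hypothetical.

```python
def assign_round_robin(exit_set_sizes, num_processors):
    """Distribute entry vertices over processors, largest exit sets first.

    exit_set_sizes maps an entry vertex to the size of its exit vertex set.
    Returns one list of entry vertices per processor.
    """
    # Descending order of exit-set size; ties keep their original order.
    order = sorted(exit_set_sizes, key=exit_set_sizes.get, reverse=True)
    buckets = [[] for _ in range(num_processors)]
    for rank, entry in enumerate(order):
        buckets[rank % num_processors].append(entry)
    return buckets


sizes = {"e1": 54, "e2": 3, "e3": 54, "e4": 10, "e5": 1}
print(assign_round_robin(sizes, 2))
```

Dealing the sorted list out in turn keeps the most expensive entry vertices on different processors, which is the stated goal, although it does not guarantee an optimal balance.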
Parallelization of the backward dynamic programming algorithm

Let P_1 < P_2 < ... < P_N be the sequence of N processors available for computations, ordered by their processor numbers.
Let each processor have a cache of size CACHE.
If CACHE > size(dynamic programming matrix) {
    Assign the whole dynamic programming matrix to processor P_1 only;
} else {
    Divide the dynamic programming matrix into blocks such that size(block) = CACHE;
    Assign the start block (i.e. the block with the lowest indices in all dimensions) to P_1;
    Initialize the finished block set F = NULL;
    After P_1 finishes its computation, send the start block to main memory and add the start block to F;
    While not all blocks are finished {
        Find the adjacent block set A such that each block in A is adjacent to at least one block in F;
        For each block_i ∈ A {
            If all neighbor blocks of block_i are finished {
                Assign block_i to P_j, where P_j is the smallest numbered free processor;
                After P_j finishes its computation, remove block_i from A, send block_i to main memory, and add block_i to F;
            }
        }
    }
}
Fig. 2. Parallelization of the backward dynamic programming algorithm
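The block scheduling in Figure 2 can be illustrated with a small simulation. The sketch below makes several simplifying assumptions that are not in the figure: the matrix is two-dimensional, "neighbor blocks" is read as the blocks a block depends on (those with a lower index in either dimension), and processors work in synchronous rounds rather than being reassigned as soon as they become free. The name `wavefront_schedule` is hypothetical.

```python
from itertools import product


def wavefront_schedule(rows, cols, num_processors):
    """Return the list of rounds; each round is the list of blocks run in parallel."""
    finished = set()
    pending = set(product(range(rows), range(cols)))
    rounds = []

    def ready(block):
        i, j = block
        preds = [(i - 1, j), (i, j - 1), (i - 1, j - 1)]
        # Predecessors outside the grid do not exist and are ignored.
        return all(p in finished for p in preds if p[0] >= 0 and p[1] >= 0)

    while pending:
        # Take the ready blocks in index order, at most one per processor.
        batch = sorted(b for b in pending if ready(b))[:num_processors]
        rounds.append(batch)
        finished.update(batch)
        pending.difference_update(batch)
    return rounds


for step, blocks in enumerate(wavefront_schedule(2, 2, 2), start=1):
    print(step, blocks)
```

In this model the start block runs alone, after which the blocks along each anti-diagonal become ready together, which is where the parallelism within a layer comes from.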
We have implemented and run our FastPCMSA algorithm on a sequential Intel Xeon 2.4GHz machine with 2GB memory. By running our algorithm on this sequential machine we aim to collect experimental evidence to help us estimate the performance of our algorithm FastPCMSA on an SGI Origin 2400 parallel computer, which uses a shared-memory architecture.
Parallelization for the bidirectional-method-based A* algorithm

Let P_1 < P_2 < ... < P_N be the sequence of N processors available for computations, ordered by their processor numbers.
For all layers k, 1 ≤ k ≤ r, find all candidate entry vertices en_i = (t_1, t_2, ..., t_n, k) such that S_1[t_1] = S_2[t_2] = ... = S_n[t_n] = P[k] in the overlapping region of layers k − 1 and k.
For all layers k, 0 ≤ k ≤ r − 1, find all candidate exit vertices ex_i = (m_1, m_2, ..., m_n, k) such that S_1[m_1] = S_2[m_2] = ... = S_n[m_n] = P[k + 1] in the overlapping region of layers k and k + 1.
For each en_i {
    For each ex_j {
        If (test(en_i, ex_j) == true) {
            Add ex_j to E_i, which is the exit vertex set for en_i;
        }
    }
}
Sort the sets E_i in descending order of their sizes; let the resulting ordering be E'_1 ≥ E'_2 ≥ ... ≥ E'_h.
For i = 1 to h do {
    Assign E'_i and its entry vertex to P_j, where P_j is the smallest numbered free processor; wait if there is no free processor;
}

boolean function test((x_1, x_2, ..., x_n), (y_1, y_2, ..., y_n)) {
    if y_i ≥ x_i for all i then return TRUE else return FALSE
}
Fig. 3. Parallelization of the bidirectional-method-based A* algorithm
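The first half of Figure 3 (finding candidate vertices and building the exit vertex sets with `test`) might look like the sketch below. It deliberately ignores the restriction to the overlapping region of adjacent layers and drops the layer coordinate, so it over-generates candidates compared with the figure; the function names `dominates`, `matching_positions` and `exit_sets` are hypothetical.

```python
from itertools import product


def dominates(x, y):
    """The figure's test(x, y): True if y_i >= x_i for every coordinate i."""
    return all(yi >= xi for xi, yi in zip(x, y))


def matching_positions(sequences, symbol):
    """All 1-based tuples (t_1, ..., t_n) with S_i[t_i] == symbol for every i."""
    per_seq = [
        [t for t, c in enumerate(s, start=1) if c == symbol] for s in sequences
    ]
    return list(product(*per_seq))


def exit_sets(entries, exits):
    """Map each entry vertex to the exit vertices that pass the test."""
    return {en: [ex for ex in exits if dominates(en, ex)] for en in entries}


seqs = ["HAKH", "HKKH"]
entries = matching_positions(seqs, "H")  # candidate entry vertices for symbol H
exits = matching_positions(seqs, "K")    # candidate exit vertices for symbol K
for en, reachable in exit_sets(entries, exits).items():
    print(en, reachable)
```

The sizes of the resulting lists are what the sort in Figure 3 orders by before entry vertices are handed to processors.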
In our estimates we ignore the communication cost, which is insignificant when we run our algorithm on a shared-memory architecture. Whenever there is parallelism we take the longest of the steps executed in parallel, and we obtain a total execution time for our algorithm by adding the execution times of these steps, which is pessimistic. We strongly believe that if our algorithm is run on a shared-memory parallel architecture the inter-processor communication costs will be insignificant, because each processor will access a separate block in memory and the processors do not communicate with each other directly. Each processor should be informed about the termination of neighboring processors, and if all
Fig. 5. The execution times of algorithms FCMSA, PCMSA and FastPCMSA for Data set 1 in Figure 7 and Data set 2 in Figure 8 of [3]. We considered the first 4 sequences and took the pattern string as HKSTH.
for the shortest paths among those 9 entry vertices and their corresponding exit vertex sets with size 54 in the PCMSA algorithm takes nearly 90% of the total computation time. If we use 9 processors to parallelize the bidirectional-method-based A* search on layer 4, the total execution time for this layer is only about 20.885 seconds, which is about one third of the execution time of PCMSA on the same layer. Since the computation time for layer 4 is the longest among all 6 layers, the complete execution time for Step 2 is 20.885 seconds. The total execution time is the sum of the execution times of all three steps. The comparison of the time requirements of the algorithms FCMSA, PCMSA, and FastPCMSA is shown in Figure 5. Although we did not consider the communication cost among processors, we expect this cost to be insignificant on a shared-memory parallel machine compared with the computation cost on the same machine.
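The way the parallel running time is estimated from sequential measurements can be written as a one-line rule: within each step take the slowest of the tasks that would run in parallel, then add the steps up. The sketch below encodes that rule; the function name `estimated_parallel_time` is hypothetical, and the numbers in the usage example are made-up placeholders, not measurements from the paper.

```python
def estimated_parallel_time(steps):
    """Estimate total time as the sum, over steps, of the slowest parallel task.

    steps is a list of lists; each inner list holds the measured sequential
    times of the tasks that would run in parallel during that step.
    Communication cost is ignored, as in the text.
    """
    return sum(max(tasks) for tasks in steps if tasks)


# Placeholder timings in seconds, for illustration only.
step1 = [2.0]             # boundary computation
step2 = [5.0, 8.0, 3.0]   # one entry per layer, run in parallel
step3 = [1.0]             # combining the per-layer results
print(estimated_parallel_time([step1, step2, step3]))
```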
We propose a new parallel algorithm for the constrained multiple sequence alignment problem. Our algorithm further parallelizes the main step of the parallel algorithm proposed by He and Arslan [8]. We present experimental evidence on real data suggesting that our algorithm improves the time requirement of solving the constrained multiple sequence alignment problem in practice.
Fig. 6. For the CPSA problem (CMSA with n = 2) with a pattern string of length 3, a global shortest path passing through entry and exit vertices, and connecting sub-paths on each layer.
layer 0 and ending at layer $r$, where $r$ is the length of the pattern string. We call an optimal solution of the CMSA problem a global shortest path. A global shortest path enters each layer at a vertex (we call it an entry vertex) and, after traversing a number of vertices in the layer, exits the layer at a vertex (we call it an exit vertex), never to come back to this layer again. An exit vertex of layer $k$ is also the entry vertex of layer $k + 1$ for $0 \le k < r$. The length of the global shortest path is the sum of the lengths of the sub-paths on each layer, and each sub-path on layer $k$ in the global shortest path is the shortest path between the entry and exit vertices on layer $k$.
5.2 Algorithm FastCMSA
Part of Algorithm FastCMSA of He and Arslan [7] computes, at each layer, a boundary through which a global optimal path for the CMSA problem passes, as shown in Figure 7.
5.3 Bidirectional-Method-Based A* Algorithm
The A* algorithm [6] is a very popular heuristic search algorithm which is an extension of Dijkstra's single-source shortest path algorithm [5]. It uses a heuristic estimator for the distance from each vertex in the graph to the destination. The score for each vertex is the sum of the heuristic value and the actual distance from the source to the vertex. The algorithm always expands the vertex with the minimum score. In most practical cases, the A* algorithm is very efficient. The bidirectional algorithm [2] applies the Dijkstra algorithm simultaneously from both the source $s$ and the destination $e$. The search of the Dijkstra algorithm terminates if the forward and backward explorations meet. Then the shortest distance is obtained by picking a point $f$ in the forward exploration
Steps from Algorithm FastCMSA [7]

for t = 1 to n do
    for k = 0 to r − 1 do
        set Sfirst[t][k] = the first position f in S_t such that P[1..(k + 1)] is a subsequence of S_t[1..f]
        set Slast[t][k] = the last position l in S_t such that P[(k + 1)..r] is a subsequence of S_t[l..s_t]

for k = 0 to r do
    if (k == 0) {
        S1begin[0] = 0; S2begin[0] = 0; ...; Snbegin[0] = 0;
    } else {
        S1begin[k] = Sfirst[1][k − 1] + 1; S2begin[k] = Sfirst[2][k − 1] + 1; ...; Snbegin[k] = Sfirst[n][k − 1] + 1;
    }
    if (k == r) {
        S1last[k] = r; S2last[k] = r; ...; Snlast[k] = r;
    } else {
        S1last[k] = Slast[1][k] + 1; S2last[k] = Slast[2][k] + 1; ...; Snlast[k] = Slast[n][k] + 1;
    }
Fig. 7. Steps of Algorithm FastCMSA of He and Arslan [7] that compute the boundary at each layer.
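The two tables Sfirst and Slast in Figure 7 can each be filled with a single greedy scan of a sequence. The sketch below shows one way to do this for a single sequence; it uses 1-based positions as in the figure, returns `None` when the pattern is not a subsequence at all, and the function names `first_positions` and `last_positions` are hypothetical.

```python
def first_positions(s, p):
    """first[k] = smallest 1-based f such that p[0..k] is a subsequence of s[1..f]."""
    first, k = [], 0
    for f, c in enumerate(s, start=1):
        if k < len(p) and c == p[k]:
            first.append(f)  # earliest place where the prefix p[0..k] is complete
            k += 1
    return first if k == len(p) else None


def last_positions(s, p):
    """last[k] = largest 1-based l such that p[k..] is a subsequence of s[l..]."""
    last, k = [None] * len(p), len(p) - 1
    for l in range(len(s), 0, -1):
        if k >= 0 and s[l - 1] == p[k]:
            last[k] = l  # latest place where the suffix p[k..] can start
            k -= 1
    return last if k < 0 else None


print(first_positions("AHKHKH", "HKH"))
print(last_positions("AHKHKH", "HKH"))
```

Matching the pattern as early as possible from the left gives the first positions, and matching it as late as possible from the right gives the last positions, so each table costs one pass over the sequence.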
algorithm for each entry-exit vertex pair that uses a search strategy similar to that of the algorithm by Ikeda and Imai [9]. However, they apply a different heuristic function, which is used in the bidirectional-method-based A* algorithm of Shibuya [11].
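For reference, the plain (unidirectional) A* search described at the start of this section can be sketched as below. This is a generic textbook version, not the bidirectional-method-based variant used in the paper, and it assumes a consistent heuristic so that a vertex never needs to be expanded twice. The function name `a_star`, the callback interface, and the toy grid in the usage example are all assumptions for illustration.

```python
import heapq


def a_star(neighbors, heuristic, source, target):
    """Return the shortest distance from source to target, or None if unreachable.

    neighbors(v) yields (w, cost) pairs; heuristic(v) estimates the remaining
    distance from v to target and is assumed to be consistent.
    """
    dist = {source: 0}
    heap = [(heuristic(source), source)]  # entries are (score, vertex)
    closed = set()
    while heap:
        _, v = heapq.heappop(heap)  # expand the vertex with the minimum score
        if v == target:
            return dist[v]
        if v in closed:
            continue
        closed.add(v)
        for w, cost in neighbors(v):
            nd = dist[v] + cost
            if nd < dist.get(w, float("inf")):
                dist[w] = nd
                heapq.heappush(heap, (nd + heuristic(w), w))
    return None


# Toy 3x3 grid with unit-cost moves right, down, and diagonally.
def grid_neighbors(v):
    i, j = v
    for di, dj in ((1, 0), (0, 1), (1, 1)):
        if i + di <= 2 and j + dj <= 2:
            yield (i + di, j + dj), 1


print(a_star(grid_neighbors, lambda v: max(2 - v[0], 2 - v[1]), (0, 0), (2, 2)))
```

With a heuristic that always returns 0 the same function behaves like Dijkstra's algorithm, which is the sense in which A* extends it.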