Parallel Computation of Constrained Multiple Sequence Alignment Problem | Papers Computer Science

FastPCMSA: An Improved Parallel Algorithm for the Constrained Multiple Sequence Alignment Problem

Dan He and Abdullah N. Arslan

Department of Computer Science

University of Vermont

Burlington, VT 05405, USA

{dhe, aarslan}@cs.uvm.edu

Abstract. The constrained multiple sequence alignment (CMSA) problem is to align given sequences S_1, S_2, ..., S_n such that similar subsequences are aligned in the same region under the guidance of a given pattern (constraint) P. The CMSA problem can be viewed as a constrained path search problem in the dynamic programming matrix. The problem has a dynamic programming solution that requires O(2^n |S_1||S_2|...|S_n||P|) time, where |X| denotes the length of a string X. There is a parallel algorithm that uses |P| + 1 processors. Experimental evidence suggests that this algorithm takes O(2^n |S_1||S_2|...|S_n|) time. In this paper we propose a more general parallel algorithm that further reduces the time requirement of the problem in practical applications.

Keywords: constrained sequence alignment, multiple alignment, dynamic programming, parallel algorithm.

1 Introduction

The constrained multiple sequence alignment (CMSA) problem was introduced by Tang et al. [12]. It aims to incorporate biologically meaningful prior knowledge of the structure or pattern of the input sequences into the alignment process. The problem is to find an optimal multiple alignment of n given strings S_1, S_2, ..., S_n such that the alignment contains a given pattern string P; that is, the alignment matrix contains a sequence c of columns, one for each k with 1 ≤ k ≤ |P|, where the kth column of c is composed entirely of the symbol P[k], the kth symbol of P, and the column containing P[i] appears before the column containing P[j] for all i < j. An application of the problem is the alignment of RNase sequences. Such sequences are all known to contain three active residues, His (H), Lys (K), His (H), that are essential for RNA degradation. Therefore, it is natural to expect that in an alignment of RNase sequences, each of these residues should be aligned in the same column. The CMSA problem with n = 2 is called the constrained pairwise sequence alignment (CPSA) problem. There are many dynamic programming algorithms for the CMSA and CPSA problems and their variations [12, 3, 13, 14, 1, 4, 7, 8].
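To make the problem definition concrete, the following is a minimal sketch of a dynamic program for the pairwise case (two sequences), written in Python. It is an illustration only and is not taken from any of the cited algorithms: the function name cpsa_score, the linear gap penalty, and the scoring values (match +1, mismatch -1, gap -1) are all assumptions chosen for brevity. The table has |P| + 1 layers, where layer k holds the best score of an alignment of two prefixes that already contains the first k pattern symbols as constrained columns.

```python
NEG_INF = float("-inf")


def cpsa_score(a, b, pattern, match=1, mismatch=-1, gap=-1):
    """Best global alignment score of a and b under a pattern constraint.

    The alignment must contain, in order, one column (x, x) with
    x == pattern[k] for every k. Returns -inf if no such alignment exists.
    Scoring values are illustrative assumptions, not from the paper.
    """
    n, m, p = len(a), len(b), len(pattern)
    # d[k][i][j]: best score for a[:i] vs b[:j] with pattern[:k] already placed.
    d = [[[NEG_INF] * (m + 1) for _ in range(n + 1)] for _ in range(p + 1)]
    d[0][0][0] = 0
    for k in range(p + 1):
        for i in range(n + 1):
            for j in range(m + 1):
                best = d[k][i][j]
                if i > 0:  # a[i-1] aligned with a gap
                    best = max(best, d[k][i - 1][j] + gap)
                if j > 0:  # b[j-1] aligned with a gap
                    best = max(best, d[k][i][j - 1] + gap)
                if i > 0 and j > 0:
                    s = match if a[i - 1] == b[j - 1] else mismatch
                    # ordinary column, stays in the same layer
                    best = max(best, d[k][i - 1][j - 1] + s)
                    # constrained column: consumes pattern[k-1]
                    if k > 0 and a[i - 1] == b[j - 1] == pattern[k - 1]:
                        best = max(best, d[k - 1][i - 1][j - 1] + s)
                d[k][i][j] = best
    return d[p][n][m]


if __name__ == "__main__":
    print(cpsa_score("HKH", "HKH", "HKH"))
    print(cpsa_score("ab", "ba", "a"))
```

The sketch fills (|P| + 1)(|S_1| + 1)(|S_2| + 1) cells with constant work per cell. Each cell in layer k reads only from layer k and layer k - 1, so the dependencies between layers run in one direction.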

In this paper we propose a more general parallel algorithm that further parallelizes the parallel CMSA algorithm PCMSA of He and Arslan [8]. Experimental evidence shows that our algorithm improves on the results obtained by the PCMSA algorithm.

