Computational Biology and Chemistry 28 (2004) 141–148

An adaptive and iterative algorithm for refining multiple sequence alignment

Yi Wang, Kuo-Bin Li∗

Bioinformatics Institute, 30 Biopolis Street, Singapore 138671, Singapore

Received 2 December 2003; received in revised form 10 February 2004; accepted 10 February 2004

Abstract

Multiple sequence alignment is a basic tool in computational genomics. The art of multiple sequence alignment is about placing gaps. This paper presents a heuristic algorithm that iteratively improves multiple protein sequence alignments. A consistency-based objective function is used to evaluate candidate moves. During the iterative optimization, well-aligned regions can be detected and kept intact, and columns of gaps are inserted to help the algorithm escape from locally optimal alignments. The algorithm has been evaluated using the BAliBASE benchmark alignment database. Results show that its performance depends little on the initial (seed) alignment. Given a perfect consistency library, the algorithm is able to produce alignments that are close to the global optimum. We demonstrate that the algorithm is able to refine alignments produced by other software, including ClustalW, SAGA and T-COFFEE. The program is available upon request.

Keywords: Iterative algorithm; Multiple sequence alignment; Alignment improver

1. Introduction

Multiple sequence alignment has for decades been an essential tool for analyzing sequences of proteins and nucleic acids. It is used in the detection of characteristic motifs and conserved regions, in the determination of phylogenetic trees, and in the prediction of secondary and tertiary structure. To accommodate rapidly growing demands for automated alignment, various algorithms have been developed to obtain sound alignments in terms of both quality and speed (Notredame, 2002; Thompson et al., 1999b). This evolution started with the successful optimization of pairwise alignment using dynamic programming (Needleman and Wunsch, 1970; Smith and Waterman, 1981). When considering the alignment of more than two sequences, however, researchers have come to acknowledge that available computing resources are more often than not dwarfed by the complexity of the problem. Although dynamic programming extends naturally, in theory, to the multiple alignment of N sequences (Carrillo and Lipman, 1988), it requires prohibitive memory space for an N-dimensional array as well as computational resources on the order of the Nth power of the sequence length.

∗Corresponding author. Tel.: +65-6478-8265; fax: +65-6478-9047. E-mail address: [email protected] (K.-B. Li).
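As a rough illustration of the pairwise dynamic programming referred to above, the following minimal sketch computes a global alignment score in the style of Needleman and Wunsch. The scoring values (`match`, `mismatch`, `gap`) and the helper `dp_cells` are assumptions chosen for illustration, not parameters taken from the paper; a linear gap penalty is used for simplicity.

```python
from math import prod


def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment score of strings a and b (linear gap penalty).

    Only two rows of the DP table are kept, so memory is O(len(b)).
    """
    prev = [j * gap for j in range(len(b) + 1)]  # row for the empty prefix of a
    for i in range(1, len(a) + 1):
        curr = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            # best of: align a[i-1] with b[j-1], gap in b, gap in a
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr
    return prev[-1]


def dp_cells(lengths):
    """Number of cells in a full N-dimensional DP array for the given lengths."""
    return prod(n + 1 for n in lengths)


print(needleman_wunsch_score("ACGT", "AGT"))
print(dp_cells([200, 200]))   # two sequences: a modest table
print(dp_cells([200] * 10))   # ten sequences: on the order of 10**23 cells
```

The second helper makes the scaling argument concrete: the table for two sequences of length 200 has about forty thousand cells, whereas the full array for ten such sequences is far beyond any practical memory.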

In order to achieve approximate alignments within feasible time, two types of heuristics are generally used: progressive and iterative approaches. The progressive approach builds up a multiple sequence alignment gradually by aligning the closest pair of sequences first and successively adding the more distant ones. This family includes MULTALIGN (Barton and Sternberg, 1987), MULTAL (Taylor, 1988) and ClustalW (Thompson et al., 1994), which differ mainly in how they decide the order in which sequences are added. The fundamental flaw of the progressive approach is its inability to adjust the existing alignment in light of newly added sequences. Trivial misalignments made in early stages remain uncorrected and later accumulate into serious ones, preventing newly added sequences from being properly aligned.
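The ordering idea behind the progressive approach can be sketched as follows. This is a deliberately simplified greedy scheme, assuming a precomputed symmetric distance matrix `dist`; the function name `progressive_order` is hypothetical, and the programs cited above use their own ordering rules (typically derived from a guide tree), which this sketch does not reproduce.

```python
def progressive_order(dist):
    """Return the order in which sequences would be added to the alignment.

    dist is a symmetric matrix of pairwise distances (list of lists).
    The closest pair comes first; each later step adds the remaining
    sequence that is closest to any sequence already chosen.
    """
    n = len(dist)
    # Start from the closest pair of sequences.
    i, j = min(
        ((a, b) for a in range(n) for b in range(a + 1, n)),
        key=lambda p: dist[p[0]][p[1]],
    )
    order = [i, j]
    remaining = set(range(n)) - {i, j}
    while remaining:
        # sorted() makes tie-breaking deterministic (lowest index wins).
        nxt = min(sorted(remaining),
                  key=lambda k: min(dist[k][m] for m in order))
        order.append(nxt)
        remaining.remove(nxt)
    return order


dist = [
    [0, 5, 9, 9],
    [5, 0, 4, 6],
    [9, 4, 0, 1],
    [9, 6, 1, 0],
]
print(progressive_order(dist))  # sequences 2 and 3 are aligned first
```

Note that the order is fixed before any alignment is built, which is exactly why an early misalignment cannot be revisited later in a purely progressive scheme.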

On the other hand, iterative approaches (Gotoh, 1996; Heringa, 1999, 2002; Notredame and Higgins, 1996) start with an initial alignment that includes all the sequences and then attempt to improve it at each iteration. An iterative algorithm ends when a specified number of iterations has been performed or when no effective change can be identified any more. An objective function is employed to evaluate align-

doi:10.1016/j.compbiolchem.2004.02.001
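The generic iterative scheme described above (start from a seed alignment, try to improve it, stop after a fixed number of iterations or when no improving change is found) can be sketched as a simple hill climber. Everything below is an illustrative assumption rather than the method of this paper: the objective `sp_score` is a toy sum-of-pairs identity count standing in for a real objective function, and the only move considered is swapping a gap with an adjacent residue.

```python
from itertools import combinations


def sp_score(aln):
    """Toy objective: number of identical residue pairs, summed over columns."""
    score = 0
    for col in zip(*aln):
        for a, b in combinations(col, 2):
            if a == b and a != "-":
                score += 1
    return score


def neighbours(aln):
    """Yield alignments that differ by swapping one gap with an adjacent residue."""
    for r, row in enumerate(aln):
        for i in range(len(row) - 1):
            a, b = row[i], row[i + 1]
            if (a == "-") != (b == "-"):  # exactly one of the two is a gap
                new_row = row[:i] + b + a + row[i + 2:]
                yield aln[:r] + [new_row] + aln[r + 1:]


def refine(aln, max_iters=100):
    """Accept the first improving move per iteration; stop when none exists."""
    best, best_score = list(aln), sp_score(aln)
    for _ in range(max_iters):
        improved = False
        for cand in neighbours(best):
            s = sp_score(cand)
            if s > best_score:
                best, best_score, improved = cand, s, True
                break
        if not improved:  # no effective change could be identified
            break
    return best, best_score


seed = ["AC-GT", "ACG-T"]
print(refine(seed))
```

Because moves only exchange a gap with a neighbouring residue, the residue order of every sequence is preserved; such a purely greedy loop can stall in a local optimum, which is the kind of situation the gap-column insertion mentioned in the abstract is meant to address.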
