
Gibbs Recursive Sampler: finding transcription factor binding sites

William Thompson 1,*, Eric C. Rouchka 2 and Charles E. Lawrence 1,3

1 The Wadsworth Center, New York State Department of Health, Albany, NY 12201-0509, USA, 2 Department of Computer Engineering and Computer Science, University of Louisville, Louisville, KY 40292, USA and 3 Computer Science Department, Rensselaer Polytechnic Institute, Troy, NY 12180, USA

Received February 14, 2003; Revised and Accepted April 9, 2003

ABSTRACT

The Gibbs Motif Sampler is a software package for locating common elements in collections of biopolymer sequences. In this paper we describe a new variation of the Gibbs Motif Sampler, the Gibbs Recursive Sampler, which has been developed specifically for locating multiple transcription factor binding sites for multiple transcription factors simultaneously in unaligned DNA sequences that may be heterogeneous in DNA composition. Here we describe the basic operation of the web-based version of this sampler. The sampler may be accessed at http://bayesweb.wadsworth.org/gibbs/gibbs.html and at http://www.bioinfo.rpi.edu/applications/bayesian/gibbs/gibbs.html. An online user guide is available at http://bayesweb.wadsworth.org/gibbs/bernoulli.html and at http://www.bioinfo.rpi.edu/applications/bayesian/gibbs/manual/bernoulli.html. Solaris, Solaris.x86 and Linux versions of the sampler are available as stand-alone programs for academic and not-for-profit users. Commercial licenses are also available. The Gibbs Recursive Sampler is distributed in accordance with the ISCB level 0 guidelines and a requirement for citation of use in scientific publications.

INTRODUCTION

Transcription regulation is arguably the most important foundation of cellular function, since it exerts the most fundamental control over the abundance of virtually all of a cell's functional macromolecules. A predominant feature of transcription regulation is the binding of regulatory proteins, transcription factors (TFs), to cognate DNA binding sites known as transcription factor binding sites (TFBS) in the genome. The computational identification of TFBS through the analysis of DNA sequence data has emerged in the last decade as a major new technology for the elucidation of transcription regulatory networks. The Gibbs Motif Sampler is a software package used to locate common elements in collections of biopolymer sequences. It has been applied to the analysis of protein sequences (1,2). Gibbs sampling has also been used extensively in the identification of TFBS (3,4) and an earlier version of this software has been available at this web location for some time. In this paper we describe a new variation, the Gibbs Recursive Sampler, designed to search for multiple TFBS simultaneously. It includes several features that are designed specifically for locating TFBS in unaligned DNA sequences. These features are based on characteristics of TF/DNA complexes or their components.

THE GIBBS RECURSIVE SAMPLER

Gibbs sampling is a Markov Chain Monte Carlo procedure that has seen wide application in the statistical community. It was first applied in bioinformatics as a tool for multiple sequence alignment in 1993 (1). Gibbs sampling techniques have subsequently seen numerous enhancements and applications (2,5). A key feature of sequence-based Gibbs sampling algorithms and related expectation maximization algorithms (6,7) is the use of motif models in the form of product multinomial models to capture sequence patterns common to the binding sites of each TF.
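As a rough illustration of how a product multinomial motif model drives a Gibbs sampler, the sketch below implements the simplest variant: one site per sequence, one motif, and a single homogeneous background. It is not the Gibbs Recursive Sampler itself, which infers site counts and uses a heterogeneous background. The function names, the pseudocount of 1.0 and the fixed iteration count are assumptions made for this example only.

```python
import random

ALPHABET = "ACGT"

def motif_model(sites, pseudocount=1.0):
    """Product multinomial model: an independent base distribution per column.

    The pseudocount is an illustrative assumption, not a value from the paper.
    """
    width = len(sites[0])
    model = []
    for j in range(width):
        counts = {b: pseudocount for b in ALPHABET}
        for site in sites:
            counts[site[j]] += 1
        total = sum(counts.values())
        model.append({b: counts[b] / total for b in ALPHABET})
    return model

def site_weight(segment, model, background):
    """Likelihood ratio of a segment under the motif model versus background."""
    weight = 1.0
    for j, base in enumerate(segment):
        weight *= model[j][base] / background[base]
    return weight

def gibbs_site_sampler(seqs, width, iterations=200, seed=0):
    """Simplified site sampler: exactly one site per sequence (an assumption)."""
    rng = random.Random(seed)
    # Homogeneous background estimated from overall base composition.
    counts = {b: 1.0 for b in ALPHABET}
    for seq in seqs:
        for base in seq:
            counts[base] += 1
    total = sum(counts.values())
    background = {b: c / total for b, c in counts.items()}
    # Start from random site positions.
    positions = [rng.randrange(len(s) - width + 1) for s in seqs]
    for _ in range(iterations):
        for i, seq in enumerate(seqs):
            # Build the motif model from the sites in all other sequences.
            others = [seqs[k][positions[k]:positions[k] + width]
                      for k in range(len(seqs)) if k != i]
            model = motif_model(others)
            # Sample a new position in proportion to its likelihood ratio.
            weights = [site_weight(seq[p:p + width], model, background)
                       for p in range(len(seq) - width + 1)]
            positions[i] = rng.choices(range(len(weights)), weights=weights)[0]
    return positions
```

Calling `gibbs_site_sampler(seqs, width)` on a list of at least two DNA strings over `ACGT` returns one sampled start position per sequence. Because the procedure is stochastic, different seeds can give different alignments.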

The recursive sampler, described here, was specifically developed for the identification of TFBS in unaligned DNA sequences. It includes several features that are unique to this software: a rigorous Bayesian method for inferring the number and the locations of the TFBS for multiple TF motifs simultaneously; a background model of the heterogeneity in the composition of non-coding nucleotide sequence; and the ability to use prior information on binding motifs. In addition, it includes features to allow the use of palindromic, direct repeat and concentrated alphabet models, preferred binding site locations, and a rigorous test of the statistical significance of the results, the Wilcoxon signed-rank test.
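To make the idea of a palindromic motif model concrete, the sketch below shows one common way to impose it: column j of the count matrix is tied to the reverse-complement column, so that base b at position j and its complement at the mirrored position share the same count. This illustrates the general constraint and is an assumption about form, not a description of how the sampler implements it. The helper names are hypothetical.

```python
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def count_matrix(sites):
    """Per-column base counts for a set of aligned sites of equal width."""
    width = len(sites[0])
    counts = [{b: 0.0 for b in "ACGT"} for _ in range(width)]
    for site in sites:
        for j, base in enumerate(site):
            counts[j][base] += 1
    return counts

def symmetrize_palindromic(counts):
    """Average each column with its reverse-complement partner.

    After this step, sym[j][b] == sym[width - 1 - j][COMPLEMENT[b]] for all j, b.
    """
    width = len(counts)
    sym = [{b: 0.0 for b in "ACGT"} for _ in range(width)]
    for j in range(width):
        mirror = width - 1 - j
        for b in "ACGT":
            sym[j][b] = (counts[j][b] + counts[mirror][COMPLEMENT[b]]) / 2
    return sym
```

A site that is already its own reverse complement, such as `GAATTC`, is left unchanged by the symmetrization, while a non-palindromic site has its counts shared between the two strands.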

In the following, we briefly describe how the algorithm incorporates these features and we provide instructions on the use of the algorithm and on the interpretation of its results.

*To whom correspondence should be addressed. Tel: +1 518 486 7882; Fax: +1 518 473 2900; Email: [email protected]

3580–3585 Nucleic Acids Research, 2003, Vol. 31, No. 13

DOI: 10.1093/nar/gkg608



RECURSIVE DISCOVERY OF SITES AND SITE COUNTS

HETEROGENEOUS BACKGROUND

PALINDROMES AND DIRECT REPEATS

A FREQUENCY-BASED SOLUTION

PROGRAM OUTPUT

INTERPRETATION OF RESULTS