Probabilistic Models in Biology: Understanding Probabilities and Markov Chains

BIOL 495S/ CS 490B/ MATH 490B/ STAT 490B

Spring 2002, Jan. 14 and 16 lectures

Probabilities and probabilistic models

Reading: S. M. Ross, “Introduction to probability models”, 7th ed. Chapter 1.

Reference: R. Durbin, S. Eddy, A. Krogh, and G. Mitchison, 1998, “Biological sequence analysis: Probabilistic models of proteins and nucleic acids”, Sections 1.3 and 3.1.

Probabilities

Let us consider a very simple example. A familiar probabilistic system with a set of discrete outcomes is the roll of a six-sided die. To define probabilities, we must first define the space of possible events. In this example, there are six events for a roll of a die, faces 1 to 6. Following the notation of the reading (Ross), the sample space is $S = \{1, 2, 3, 4, 5, 6\}$. A model of a roll of a die (possibly loaded) would have six parameters $p_1, \ldots, p_6$; the probability of rolling $i$ is $p_i$. To be probabilities, the parameters must satisfy the conditions that $p_i \geq 0$ and $\sum_{i=1}^{6} p_i = 1$. For example, suppose that all six numbers are equally likely to appear (a fair die); then we will have

$$p_1 = p_2 = p_3 = p_4 = p_5 = p_6 = \frac{1}{6}.$$
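The two conditions on the die parameters (every probability non-negative, all six summing to 1) can be checked mechanically. The following is a minimal sketch, not part of the original notes; the function name `is_valid_distribution` and the loaded-die numbers are illustrative assumptions.

```python
from fractions import Fraction

def is_valid_distribution(p):
    """Return True if every p_i >= 0 and the p_i sum to exactly 1."""
    return all(x >= 0 for x in p) and sum(p) == 1

# Fair die: p_1 = ... = p_6 = 1/6 (exact fractions avoid rounding issues).
fair_die = [Fraction(1, 6)] * 6

# Hypothetical loaded die that favours face 6 (illustrative numbers only).
loaded_die = [Fraction(1, 10)] * 5 + [Fraction(1, 2)]

print(is_valid_distribution(fair_die))
print(is_valid_distribution(loaded_die))
```

Exact fractions are used here so that the sum-to-one test is an equality rather than a floating-point tolerance check.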

Another example closer to our biological subject matter is to define probabilities of amino acids or nucleotides. For instance, at a given position of a protein sequence, we assume that amino acid $a$ occurs at random with probability $q_a$. In other words, the amino acid at this position can be any one of the twenty types, and each type $a$ occurs with probability $q_a$. Each $q_a$ is a number between 0 and 1, and the twenty probabilities sum to 1.
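As a rough illustration of an amino acid "occurring at random" with a given probability, the sketch below draws residues from a distribution over the twenty standard one-letter codes. The uniform values (each probability equal to 1/20) are a placeholder assumption, not estimates from real proteins, and `sample_residue` is a hypothetical helper name.

```python
import random

# The twenty standard amino acids, in one-letter code.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"

# Placeholder model (assumption): every amino acid equally likely, 1/20 each.
q = {a: 1 / len(AMINO_ACIDS) for a in AMINO_ACIDS}

def sample_residue(q, rng=random):
    """Draw one amino acid a with probability q[a]."""
    letters = list(q)
    weights = [q[a] for a in letters]
    return rng.choices(letters, weights=weights, k=1)[0]

# Seeded generator so repeated runs draw the same residues.
rng = random.Random(0)
print("".join(sample_residue(q, rng) for _ in range(10)))
```

Replacing the uniform values with observed amino acid frequencies would give a more realistic model, provided they still sum to 1.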

Conditional probabilities and independence

Suppose that we toss two dice. There are 36 possible outcomes in total, combining the possible numbers of the first and second toss. Suppose that each of the 36 possible outcomes is equally likely to occur and hence has probability 1/36. If we observe that the first die is a four, then given this information, what is the probability that the sum of the two dice equals six? Given that the first die is a four, there can be at most six possible outcomes of our experiment, namely (4,1), (4,2), (4,3), (4,4), (4,5), and (4,6). Since each of these outcomes originally had the same probability of occurring, they should still have equal probabilities. That is, given that the first die is a four, the (conditional) probability of each of the outcomes (4,1), (4,2), (4,3), (4,4), (4,5), and (4,6) is 1/6, while the (conditional) probability of the other 30 points is 0. Exactly one of the six remaining outcomes, (4,2), gives a sum of six. Hence, the desired probability is 1/6.
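The counting argument for the two dice can be reproduced by brute-force enumeration. This is a small sketch under the same assumption as the text (all 36 outcomes equally likely); the helper names are illustrative.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under the uniform model."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def first_is_four(o):
    return o[0] == 4

def sum_is_six(o):
    return o[0] + o[1] == 6

# Keep only the outcomes where the first die is four,
# then count how many of those have a sum of six.
given = [o for o in outcomes if first_is_four(o)]
conditional = Fraction(sum(1 for o in given if sum_is_six(o)), len(given))

print(conditional)  # 1/6
```

Without the information about the first die, `prob(sum_is_six)` gives 5/36, so observing the four changes the probability.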

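A conditional probability is the probability that one event will occur given that we already know that some other event has occurred. If we let E and F denote, respectively, the event that the sum of the dice is six and the event that the first die is a four, then the probability just obtained is called the conditional probability that E occurs given that F has occurred, and is denoted by $P(E \mid F)$.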


$$P(E \mid F) = \frac{P(EF)}{P(F)} \qquad (1)$$

+      A      C      G      T
A    0.180  0.274  0.426  0.120
C    0.171  0.368  0.274  0.188
G    0.161  0.339  0.375  0.125
T    0.079  0.355  0.384  0.182

−      A      C      G      T
A    0.300  0.205  0.285  0.
C    0.322  0.298  0.078  0.
G    0.248  0.246  0.298  0.
T    0.177  0.239  0.292  0.
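One way to read the "+" table is as the transition probabilities of a first-order Markov chain over nucleotides. The sketch below assumes that reading: rows are taken as the current nucleotide and columns as the next one, which the notes do not state here but which fits each row summing to approximately 1. The function names and the example sequence are illustrative only.

```python
NUCLEOTIDES = "ACGT"

# "+" table from the notes; assumed layout: row = current base, column = next base.
PLUS = {
    "A": [0.180, 0.274, 0.426, 0.120],
    "C": [0.171, 0.368, 0.274, 0.188],
    "G": [0.161, 0.339, 0.375, 0.125],
    "T": [0.079, 0.355, 0.384, 0.182],
}

def transition(model, a, b):
    """Probability of moving from base a to base b."""
    return model[a][NUCLEOTIDES.index(b)]

def chain_probability(model, seq):
    """Probability of seq[1:] given seq[0]: the product of successive transitions."""
    p = 1.0
    for a, b in zip(seq, seq[1:]):
        p *= transition(model, a, b)
    return p

# Each row should sum to roughly 1 (up to rounding in the published values).
for base, row in PLUS.items():
    print(base, round(sum(row), 3))

# Hypothetical example sequence: P(G|C) * P(C|G) * P(G|C).
print(chain_probability(PLUS, "CGCG"))
```

The probability is conditional on the first base because the table gives no distribution for the starting nucleotide.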

$$P(EF) = P(F \mid E)\,P(E)$$

2 × 104

6 × 103

× 103 8 ×

$$P(E \mid F) = \frac{P(F \mid E)\,P(E)}{P(F)}$$
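The relation between the two conditional probabilities can be checked numerically on the two-dice example. In this sketch, E is assumed to be the event "the sum is six" and F the event "the first die is a four"; these labels and the helper names are assumptions made for illustration.

```python
from fractions import Fraction
from itertools import product

# All 36 equally likely outcomes (first die, second die).
outcomes = list(product(range(1, 7), repeat=2))

def prob(event):
    """Probability of an event (a predicate on outcomes) under the uniform model."""
    return Fraction(sum(1 for o in outcomes if event(o)), len(outcomes))

def E(o):
    """Assumed event E: the sum of the two dice is six."""
    return o[0] + o[1] == 6

def F(o):
    """Assumed event F: the first die is a four."""
    return o[0] == 4

def EF(o):
    """Joint event: both E and F occur."""
    return E(o) and F(o)

# Direct definition: P(E|F) = P(EF) / P(F).
p_E_given_F = prob(EF) / prob(F)

# Reversed conditioning: P(F|E) = P(EF) / P(E).
p_F_given_E = prob(EF) / prob(E)

# Recover P(E|F) from P(F|E), P(E) and P(F).
bayes = p_F_given_E * prob(E) / prob(F)

print(p_E_given_F, bayes)
```

Both routes give the same value because P(F|E)P(E) and P(E|F)P(F) are two ways of writing the joint probability P(EF).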