Mods Statistics, Lecture Notes - Mathematics - Prof Peter Donnelly 2, Study notes of Mathematical Methods for Numerical Analysis and Optimization

Integrating Factor, Notations, Random samples, Normal Distribution, Maximum likelihood estimation, Parameter estimation, accuracy of the estimate, polynomial regression

Typology: Study notes

2010/2011

Uploaded on 09/09/2011

MODS STATISTICS
HT 2011
Peter Donnelly,
Department of Statistics and The Wellcome Trust Centre for Human Genetics

Lecture notes and problem sheets will be available from the Mathematical Institute’s website.

Introduction.

We will be concerned with the mathematical framework for making inferences from data. The tools of probability provide the backdrop, allowing us to quantify the uncertainties involved.

Examples

  1. Question: How tall is the average five year old girl?

Data: $x_1, x_2, \ldots, x_n$, the heights of $n$ randomly chosen girls.

An obvious estimate is

$$\bar{x} = \frac{1}{n} \sum_{i=1}^{n} x_i.$$

How precise is our estimate?

Notation

We usually denote observations by lower case letters: $x_1, x_2, \ldots, x_n$.

Regard these as observed values of random variables (rv’s) (for which we usually use upper case) $X_1, X_2, \ldots, X_n$.

We often write $x$ (respectively $X$) for the collection $x_1, x_2, \ldots, x_n$ (respectively $X_1, X_2, \ldots, X_n$).

In different settings, it is convenient to think of $x_i$ as the observed value of $X_i$, or as a possible value that $X_i$ can take.

For example, if $X_i$ is a Poisson random variable with mean $\lambda$,

$$P(X_i = x_i) = \frac{e^{-\lambda} \lambda^{x_i}}{x_i!} \quad \text{for } x_i = 0, 1, 2, \ldots$$

1. Random Samples.

Definition 1 A random sample of size $n$ is a set of random variables $X_1, X_2, \ldots, X_n$ which are independent and identically distributed (i.i.d.).

Examples

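  1. Let $X_1, X_2, \ldots, X_n$ be a random sample from a Poisson distribution with mean $\lambda$ (e.g. $X_i$ = number of accidents on Parks Road in year $i$). Then,

$$
\begin{aligned}
f(x) &= P(X_1 = x_1, X_2 = x_2, \ldots, X_n = x_n) \\
&= P(X_1 = x_1) \, P(X_2 = x_2) \cdots P(X_n = x_n) \\
&= \frac{e^{-\lambda} \lambda^{x_1}}{x_1!} \cdot \frac{e^{-\lambda} \lambda^{x_2}}{x_2!} \cdots \frac{e^{-\lambda} \lambda^{x_n}}{x_n!} \\
&= \frac{e^{-n\lambda} \lambda^{\sum_{i=1}^{n} x_i}}{\prod_{i=1}^{n} x_i!},
\end{aligned}
$$

where the second equality follows from the independence of the $X_i$.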
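The factorisation above can be checked numerically. The following is a minimal sketch, assuming made-up accident counts and an assumed value $\lambda = 2$ (neither comes from the notes); the function names are illustrative only. It evaluates the joint p.m.f. once as a product of Poisson marginals and once through the closed form.

```python
import math

def poisson_pmf(x, lam):
    """P(X = x) for a Poisson random variable with mean lam."""
    return math.exp(-lam) * lam**x / math.factorial(x)

def joint_pmf_product(xs, lam):
    """Joint p.m.f. as a product of marginals (this step uses independence)."""
    return math.prod(poisson_pmf(x, lam) for x in xs)

def joint_pmf_closed_form(xs, lam):
    """Joint p.m.f. via exp(-n*lam) * lam**sum(x_i) / prod(x_i!)."""
    n = len(xs)
    denominator = math.prod(math.factorial(x) for x in xs)
    return math.exp(-n * lam) * lam ** sum(xs) / denominator

# Hypothetical data: accident counts in five successive years.
counts = [1, 3, 0, 2, 4]
lam = 2.0  # assumed value of the parameter, for illustration only

print(joint_pmf_product(counts, lam))
print(joint_pmf_closed_form(counts, lam))
```

The two printed values should agree up to floating-point rounding.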

In probability questions we would usually assume that the parameters $\lambda$ and $\mu$ from our previous examples are known.

In many settings they will not be known, and we wish to estimate them from data. Two key questions of interest are:

  1. What is the best way to estimate them? (And what does “best” mean here?)
  2. For a given method of estimation, how precise is a particular estimator?

2. Summary Statistics.

Definition 2 Let $X_1, X_2, \ldots, X_n$ be a random sample. The sample mean is defined as

$$\bar{X} = \frac{1}{n} \sum_{i=1}^{n} X_i.$$

The sample variance is defined as

$$S^2 = \frac{1}{n-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.$$

The sample standard deviation is $S = \sqrt{S^2}$.

Notes

  1. The denominator in the definition of $S^2$ is $n - 1$, not $n$.
  2. $\bar{X}$ and $S^2$ are random variables, so they have distributions (called the sampling distributions of $\bar{X}$ and $S^2$).
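As a small illustration of Definition 2, the sketch below computes the observed sample mean, sample variance (with the $n - 1$ denominator) and sample standard deviation for a handful of made-up heights. The data and function names are assumptions for illustration. Python's `statistics.variance` also uses the $n - 1$ denominator, so it serves as a cross-check.

```python
import math
import statistics

def sample_mean(xs):
    """Observed value of the sample mean: (1/n) * sum of x_i."""
    return sum(xs) / len(xs)

def sample_variance(xs):
    """Observed value of S^2, using the n - 1 denominator."""
    n = len(xs)
    xbar = sample_mean(xs)
    return sum((x - xbar) ** 2 for x in xs) / (n - 1)

# Hypothetical heights in cm (not data from the notes).
heights = [104.2, 109.5, 107.1, 111.3, 105.8]

xbar = sample_mean(heights)
s2 = sample_variance(heights)
s = math.sqrt(s2)

print(xbar, s2, s)
print(statistics.variance(heights))  # library value, also with n - 1
```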

The Normal Distribution.

Definition 3 Recall that $X$ has a normal distribution with mean $\mu$ and variance $\sigma^2$, written $X \sim N(\mu, \sigma^2)$, if the p.d.f. of $X$ is

$$f(x) = \frac{1}{\sqrt{2\pi\sigma^2}} \exp\left( -\frac{1}{2} \left( \frac{x - \mu}{\sigma} \right)^2 \right) \quad \text{for } -\infty < x < \infty.$$

Recall also that $E(X) = \mu$ and $\mathrm{var}(X) = \sigma^2$.

If $\mu = 0$ and $\sigma = 1$, then $X$ is said to have a standard normal distribution, and we write $X \sim N(0, 1)$.

Important Result

If $X \sim N(\mu, \sigma^2)$ and $Z = (X - \mu)/\sigma$, then $Z \sim N(0, 1)$.

The cumulative distribution function (c.d.f.) of a standard normal random variable is:

$$\Phi(x) = \int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}} e^{-u^2/2} \, du.$$
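The integral defining $\Phi$ has no elementary closed form, but it can be written in terms of the error function as $\Phi(x) = \tfrac{1}{2}\left(1 + \operatorname{erf}(x/\sqrt{2})\right)$. The sketch below uses that identity, checks it against a crude midpoint-rule approximation of the integral, and applies the standardisation result $Z = (X - \mu)/\sigma$ to a general normal variable. The function names and the example values $\mu = 110$, $\sigma = 5$ are assumptions for illustration.

```python
import math

def std_normal_pdf(u):
    """Density of N(0, 1) at u."""
    return math.exp(-u * u / 2.0) / math.sqrt(2.0 * math.pi)

def std_normal_cdf(x):
    """Phi(x), computed through the error function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

def std_normal_cdf_numeric(x, lower=-10.0, steps=20000):
    """Midpoint-rule approximation to the integral defining Phi(x).

    The lower limit -infinity is truncated at `lower`; the mass below
    -10 is negligible at this precision.
    """
    h = (x - lower) / steps
    return h * sum(std_normal_pdf(lower + (k + 0.5) * h) for k in range(steps))

def normal_cdf(x, mu, sigma):
    """P(X <= x) for X ~ N(mu, sigma^2), via Z = (X - mu) / sigma."""
    return std_normal_cdf((x - mu) / sigma)

print(std_normal_cdf(1.0))            # about 0.8413
print(std_normal_cdf_numeric(1.0))    # should agree closely
print(normal_cdf(115.0, 110.0, 5.0))  # same as Phi(1), by standardisation
```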

Example 1. continued

Suppose $n = 62$ and $x_1, x_2, \ldots, x_n$ are 62 time intervals between major earthquakes. Assume $X_1, X_2, \ldots, X_n$ are exponential random variables with mean $\mu$.

How does one estimate the unknown $\mu$? Intuition suggests using $\bar{x}$ as the estimate of $\mu$. But is this a good idea? Are there general principles we can use to choose estimators?

In general, suppose $X_1, X_2, \ldots, X_n$ is a random sample from a distribution with p.d.f. (or p.m.f.) $f(x; \theta)$. If we regard the parameter $\theta$ as unknown, we need to estimate it using $x_1, x_2, \ldots, x_n$.

Definition 4 Given observations $x_1, x_2, \ldots, x_n$ and unknown parameter $\theta$, the likelihood of $\theta$ is the function

$$L(\theta) = f(x; \theta) = \prod_{i=1}^{n} f(x_i; \theta). \tag{1}$$

That is, $L$ is the joint density (or mass) function, but regarded as a function of $\theta$, for fixed $x_1, x_2, \ldots, x_n$. The likelihood $L(\theta)$ is the probability (or probability density) of observing $x = (x_1, x_2, \ldots, x_n)$ if the unknown parameter is $\theta$.

The log-likelihood is $l(\theta) = \log L(\theta)$ (the logarithm is to the base $e$).

The maximum likelihood estimate $\hat{\theta}(x)$ is the value of $\theta$ that maximizes $L(\theta)$.

$\hat{\theta}(X)$ is the maximum likelihood estimator (m.l.e.).

Example 1 again

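In this case the parameter of interest is $\mu$.

$$L(\mu) = \prod_{i=1}^{n} \frac{1}{\mu} e^{-x_i/\mu} = \frac{1}{\mu^n} \exp\left( -\frac{1}{\mu} \sum_{i=1}^{n} x_i \right),$$

and so

$$l(\mu) = -n \log \mu - \frac{\sum_{i=1}^{n} x_i}{\mu}.$$

Then

$$\frac{dl}{d\mu} = -\frac{n}{\mu} + \frac{\sum_{i=1}^{n} x_i}{\mu^2}$$

and

$$\frac{dl}{d\mu} = 0 \;\Rightarrow\; \mu = \bar{x}$$

(which is a maximum).

Therefore, the maximum likelihood estimate of $\mu$ is $\bar{x}$.

The maximum likelihood estimator is $\bar{X}$.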
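The calculus above can be sanity-checked numerically. The sketch below is an illustration only: the earthquake data are not reproduced in these notes, so it simulates 62 exponential intervals with an assumed true mean of 400. It then evaluates the log-likelihood $l(\mu)$ on a grid around $\bar{x}$ and confirms that the derivative vanishes at $\bar{x}$. All names and numbers in the code are assumptions.

```python
import math
import random

def log_likelihood(mu, xs):
    """l(mu) = -n log(mu) - sum(x_i) / mu for an exponential sample."""
    return -len(xs) * math.log(mu) - sum(xs) / mu

def score(mu, xs):
    """dl/dmu = -n / mu + sum(x_i) / mu**2."""
    return -len(xs) / mu + sum(xs) / mu**2

random.seed(0)
true_mu = 400.0  # assumed mean, used only to simulate data
# random.expovariate takes the rate, i.e. 1 / mean.
xs = [random.expovariate(1.0 / true_mu) for _ in range(62)]

xbar = sum(xs) / len(xs)

# Grid of candidate values from 0.5 * xbar to 1.5 * xbar in steps of 1%.
grid = [xbar * (0.5 + 0.01 * k) for k in range(101)]
best = max(grid, key=lambda mu: log_likelihood(mu, xs))

print(xbar, best)       # the grid maximiser sits at (or next to) xbar
print(score(xbar, xs))  # essentially zero
```

A grid search is used here purely because it is transparent; it is not how the estimate would be computed in practice, since the closed form $\hat{\mu} = \bar{x}$ is available.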

Example

Consider a random variable $X$ with a Bernoulli distribution with parameter $p$ (this is the same as a Binomial$(1, p)$).

$$P(X = 1) = p, \qquad P(X = 0) = 1 - p.$$

The probability mass function of $X$ is

$$f(x; p) = P(X = x) = \begin{cases} p^{x} (1 - p)^{1 - x} & x = 0, 1, \\ 0 & \text{otherwise.} \end{cases}$$

Suppose $X_1, X_2, \ldots, X_n$ is a random sample. Then, the likelihood is

$$L(p) = \prod_{i=1}^{n} p^{x_i} (1 - p)^{1 - x_i} = p^{r} (1 - p)^{n - r},$$

where $r = \sum_{i=1}^{n} x_i$.
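The simplification of the Bernoulli likelihood to $p^r (1-p)^{n-r}$ can be checked directly. The sketch below uses a made-up sequence of 0/1 outcomes and illustrative function names, and evaluates both forms at a few values of $p$.

```python
import math

def likelihood_product(p, xs):
    """L(p) as the product of p**x_i * (1 - p)**(1 - x_i)."""
    return math.prod(p**x * (1 - p) ** (1 - x) for x in xs)

def likelihood_closed_form(p, xs):
    """L(p) = p**r * (1 - p)**(n - r), where r is the number of ones."""
    n = len(xs)
    r = sum(xs)
    return p**r * (1 - p) ** (n - r)

# Hypothetical Bernoulli outcomes (not data from the notes).
outcomes = [1, 0, 1, 1, 0, 1, 0, 1]

for p in (0.2, 0.5, 0.8):
    print(p, likelihood_product(p, outcomes), likelihood_closed_form(p, outcomes))
```

Only $r$ and $n$ enter the closed form, so two samples with the same number of ones give the same likelihood function.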

Example

Suppose we take a random sample of individuals from a population, and test their genetic type at a particular chromosomal location (called a “locus” in genetics). At this particular position, each chromosome in the population will have one of two possible variants, which we denote by A and a. Since each individual has two chromosomes (we receive one from each of our parents), the type of a particular individual could be one of three so-called genotypes, AA, Aa, or aa, depending on whether they have 2, 1, or 0 copies of the A variant. (Note that order is not relevant, so there is no distinction between Aa and aA.)

There is a simple result, called the Hardy-Weinberg law, which states that under plausible assumptions, the genotypes AA, Aa and aa will occur with probabilities $p_1 = \theta^2$, $p_2 = 2\theta(1 - \theta)$ and $p_3 = (1 - \theta)^2$ respectively, for some $0 \le \theta \le 1$.

Now suppose the random sample of $n$ individuals contains:

$x_1$ of type AA; $x_2$ of type Aa; $x_3$ of type aa;

where $\sum_{i=1}^{3} x_i = n$.

Then the likelihood $L(\theta)$ is the probability that we observe $(x_1, x_2, x_3)$ if we assign individuals to genotypes with probabilities $(p_1, p_2, p_3)$. That is,

$$L(\theta) = \frac{n!}{x_1! \, x_2! \, x_3!} \, p_1^{x_1} p_2^{x_2} p_3^{x_3}.$$

This is a multinomial distribution (the generalization of the binomial distribution in the setting when there are more than two possible outcomes).
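To make the multinomial likelihood concrete, the sketch below evaluates $L(\theta)$ under the Hardy-Weinberg probabilities for made-up genotype counts and locates its maximum on a grid. The counts, function names and grid are assumptions for illustration. This excerpt does not derive the m.l.e. for this model; the standard result is that the maximiser is the sample proportion of A alleles, $(2x_1 + x_2)/(2n)$, which the grid search here should reproduce.

```python
import math

def genotype_probs(theta):
    """Hardy-Weinberg probabilities (p1, p2, p3) for AA, Aa, aa."""
    return theta**2, 2 * theta * (1 - theta), (1 - theta) ** 2

def hw_likelihood(theta, x1, x2, x3):
    """Multinomial likelihood of theta given genotype counts x1, x2, x3."""
    n = x1 + x2 + x3
    # Multinomial coefficient n! / (x1! x2! x3!), computed exactly.
    coef = math.factorial(n) // (
        math.factorial(x1) * math.factorial(x2) * math.factorial(x3)
    )
    p1, p2, p3 = genotype_probs(theta)
    return coef * p1**x1 * p2**x2 * p3**x3

# Hypothetical counts of AA, Aa and aa in a sample of n = 100.
x1, x2, x3 = 30, 50, 20

# Grid search over theta = 0.001, 0.002, ..., 0.999.
grid = [k / 1000 for k in range(1, 1000)]
theta_hat = max(grid, key=lambda t: hw_likelihood(t, x1, x2, x3))

print(theta_hat)
print((2 * x1 + x2) / (2 * (x1 + x2 + x3)))  # proportion of A alleles
```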