Bayesian Statistics: Lecture 1 on Prior, Posterior, and Bayes Formula - Prof. Jun Shao, Study notes of Mathematical Statistics

Lecture notes for Stat 710: Mathematical Statistics at the University of Wisconsin-Madison, covering the Bayesian method, prior and posterior distributions, and the Bayes formula. They explain how to construct the posterior distribution from the joint distribution of $X$ and $\tilde\theta$, and give the Bayes formula for the continuous and discrete cases.


Uploaded on 09/02/2009

Stat 710: Mathematical Statistics
Lecture 1
Jun Shao
Department of Statistics
University of Wisconsin
Madison, WI 53706, USA
Jun Shao (UW-Madison), Stat 710, Lecture 1, Jan 21, 2009

Chapter 4: Estimation in Parametric Models

Lecture 1: Prior, posterior, and Bayes formula

$X$ is from a population in a parametric family $\mathcal{P} = \{P_\theta : \theta \in \Theta\}$, where $\Theta \subset \mathcal{R}^k$ for a fixed integer $k \ge 1$.

Three topics

- Bayesian method
- Minimaxity and admissibility
- Likelihood approach

Bayes rules in §2.3

- Decision rules minimizing the average risk w.r.t. a given probability measure $\Pi$ on $\Theta$
- Optimal rules in the Bayesian approach, which is fundamentally different from the classical frequentist approach that we have been adopting


Bayesian approach

- $\theta$ is viewed as a realization of a random vector $\tilde\theta \in \Theta$ whose prior distribution is $\Pi$.
- Prior distribution: past experience, past data, or a statistician's belief (subjective).
- Sample $X \in \mathcal{X}$: from $P_\theta = P_{x|\theta}$, the conditional distribution of $X$ given $\tilde\theta = \theta$.
- Posterior distribution: the prior distribution updated using the sample $X = x$.

How to construct the posterior?

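By Theorem 1.7, the joint distribution of $X$ and $\tilde\theta$ is a probability measure on $\mathcal{X} \times \Theta$ determined by

$$P(A \times B) = \int_B P_{x|\theta}(A)\, d\Pi(\theta), \qquad A \in \mathcal{B}_{\mathcal{X}},\ B \in \mathcal{B}_\Theta.$$

The posterior distribution is the conditional distribution $P_{\theta|x}$, whose existence is guaranteed by Theorem 1.7 for almost all $x \in \mathcal{X}$.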
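As a rough illustration of the construction above, the sketch below builds the joint measure $P(A \times B)$ on a finite example and recovers the posterior by conditioning. The three-point prior and the Binomial sampling model are assumptions chosen for the example, not part of the lecture; with finitely many $\theta$ values the integral over $B$ reduces to a sum.

```python
from fractions import Fraction
from math import comb

# Hypothetical finite setup: three candidate values of theta with prior Pi,
# and X | theta ~ Binomial(n, theta) on the sample space {0, ..., n}.
n = 3
prior = {Fraction(1, 5): Fraction(1, 4),
         Fraction(1, 2): Fraction(1, 2),
         Fraction(4, 5): Fraction(1, 4)}
sample_space = range(n + 1)

def cond_prob(A, theta):
    """P_{x|theta}(A): probability that X falls in A given theta."""
    return sum(comb(n, x) * theta**x * (1 - theta)**(n - x) for x in A)

def joint(A, B):
    """P(A x B): sum over theta in B of P_{x|theta}(A) * Pi({theta})."""
    return sum(cond_prob(A, theta) * prior[theta] for theta in B)

def posterior(B, x):
    """P(theta in B | X = x), obtained by conditioning the joint measure."""
    return joint([x], B) / joint([x], prior.keys())

total_mass = joint(sample_space, prior.keys())             # mass of the whole space
marginal_of_theta = joint(sample_space, [Fraction(1, 2)])  # recovers the prior mass
post_half_given_3 = posterior([Fraction(1, 2)], 3)
```

Exact rational arithmetic is used so that the total mass and the recovered prior mass can be compared without rounding error.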


When $P_{x|\theta}$ has a p.d.f., Theorem 4.1 provides a formula for the p.d.f. of the posterior distribution.

Theorem 4.1 (Bayes formula)

Assume $\mathcal{P} = \{P_{x|\theta} : \theta \in \Theta\}$ is dominated by a $\sigma$-finite measure $\nu$ and $f_\theta(x) = dP_{x|\theta}/d\nu$ is a Borel function on $(\mathcal{X} \times \Theta, \sigma(\mathcal{B}_{\mathcal{X}} \times \mathcal{B}_\Theta))$. Let $\Pi$ be a prior distribution on $\Theta$. Suppose that $m(x) = \int_\Theta f_\theta(x)\, d\Pi > 0$.

(i) The posterior distribution $P_{\theta|x} \ll \Pi$ and

$$\frac{dP_{\theta|x}}{d\Pi} = \frac{f_\theta(x)}{m(x)}.$$

(ii) If $\Pi \ll \lambda$ and $d\Pi/d\lambda = \pi(\theta)$ for a $\sigma$-finite measure $\lambda$, then

$$\frac{dP_{\theta|x}}{d\lambda} = \frac{f_\theta(x)\,\pi(\theta)}{m(x)}.$$

Proof:

Result (ii) follows from result (i) and Proposition 1.7(iii).
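To make part (ii) concrete, here is a minimal numerical sketch, assuming a Binomial sampling model, a Beta prior, and Lebesgue measure as $\lambda$ (none of which come from the lecture). It approximates $m(x)$ by a midpoint rule, forms $f_\theta(x)\pi(\theta)/m(x)$ on a grid, and compares the result with the conjugate closed form Beta$(a + x,\, b + n - x)$. The helper names `beta_pdf` and `binom_pmf` are hypothetical.

```python
from math import comb, exp, lgamma, log

def beta_pdf(theta, a, b):
    """Density of Beta(a, b) w.r.t. Lebesgue measure on (0, 1)."""
    log_norm = lgamma(a + b) - lgamma(a) - lgamma(b)
    return exp(log_norm + (a - 1) * log(theta) + (b - 1) * log(1 - theta))

def binom_pmf(x, n, theta):
    """f_theta(x): Binomial(n, theta) p.m.f. (density w.r.t. counting measure)."""
    return comb(n, x) * theta**x * (1 - theta)**(n - x)

# Assumed illustration values, not from the lecture.
a, b = 2.0, 3.0      # prior Beta(a, b), so pi(theta) = beta_pdf(theta, a, b)
n, x = 10, 7         # observed x successes in n trials

# Midpoint grid on (0, 1) to approximate m(x) = int f_theta(x) pi(theta) d(theta).
num_cells = 20_000
width = 1.0 / num_cells
grid = [(i + 0.5) * width for i in range(num_cells)]

unnormalized = [binom_pmf(x, n, t) * beta_pdf(t, a, b) for t in grid]
m_x = sum(unnormalized) * width
posterior_density = [u / m_x for u in unnormalized]

# Conjugacy gives the closed form Beta(a + x, b + n - x) for comparison.
max_abs_error = max(abs(p - beta_pdf(t, a + x, b + n - x))
                    for p, t in zip(posterior_density, grid))
posterior_mass = sum(posterior_density) * width
```

The grid approach is only a check on the formula; in a conjugate model like this one the closed form would normally be used directly.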


Proof for (i)

$$\int_{\mathcal{X}} m(x)\, d\nu = \int_{\mathcal{X}} \int_\Theta f_\theta(x)\, d\Pi\, d\nu = \int_\Theta \int_{\mathcal{X}} f_\theta(x)\, d\nu\, d\Pi = 1$$

- The second equality follows from Fubini's theorem.
- Hence $m(x)$ is integrable w.r.t. $\nu$ and $m(x) < \infty$ a.e. $\nu$.
- Because of this, $m(x)$ is called the marginal p.d.f. of $X$ w.r.t. $\nu$.
- Without loss of generality we may assume $m(x) > 0$. If $m(x) = 0$ for an $x \in \mathcal{X}$, then $f_\theta(x) = 0$ a.s. $\Pi$. Either $x$ should be eliminated from $\mathcal{X}$ or the prior $\Pi$ is incorrect and a new prior should be specified.
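The claim that $m$ integrates to 1 can be checked exactly in a small case. The sketch below assumes $\nu$ is counting measure on $\{0, \dots, n\}$, a Binomial$(n, \theta)$ model, and a Beta$(a, b)$ prior with integer parameters, so that $m(x) = \binom{n}{x} B(a + x,\, b + n - x)/B(a, b)$ can be evaluated in rational arithmetic. These modelling choices and the function names are assumptions made for illustration.

```python
from fractions import Fraction
from math import comb, factorial

def beta_fn(p, q):
    """Beta function B(p, q) for positive integers p, q, computed exactly."""
    return Fraction(factorial(p - 1) * factorial(q - 1), factorial(p + q - 1))

def marginal(x, n, a, b):
    """m(x): Binomial(n, theta) p.m.f. integrated against a Beta(a, b) prior."""
    return comb(n, x) * beta_fn(a + x, b + n - x) / beta_fn(a, b)

n, a, b = 10, 2, 3                      # assumed illustration values
m = [marginal(x, n, a, b) for x in range(n + 1)]
total = sum(m)                          # integral of m w.r.t. counting measure
all_positive = all(v > 0 for v in m)    # m(x) > 0 at every x in this model

# With a uniform prior, Beta(1, 1), every outcome is equally likely a priori.
uniform_prior_marginal = [marginal(x, 4, 1, 1) for x in range(5)]
```

Here every $x$ has $m(x) > 0$, so no sample point needs to be eliminated and the prior raises no conflict with the sample space.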

For $x \in \mathcal{X}$ with $m(x) < \infty$, define

$$P(B, x) = \frac{1}{m(x)} \int_B f_\theta(x)\, d\Pi, \qquad B \in \mathcal{B}_\Theta.$$

Then $P(\cdot, x)$ is a probability measure on $\Theta$ a.e. $\nu$.


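By Theorem 1.7, it remains to show that

$$P(B, x) = P(\tilde\theta \in B \mid X = x).$$

By Fubini's theorem, $P(B, \cdot)$ is a measurable function of $x$. Let $P_{x,\theta}$ denote the "joint" distribution of $(X, \tilde\theta)$. For any $A \in \sigma(X)$,

$$
\begin{aligned}
\int_{A \times \Theta} I_B(\theta)\, dP_{x,\theta}
&= \int_{x \in A} \int_{\theta \in B} f_\theta(x)\, d\nu\, d\Pi \\
&= \int_A \left[ \int_B \frac{f_\theta(x)}{m(x)}\, d\Pi \right] \left[ \int_\Theta f_\theta(x)\, d\Pi \right] d\nu \\
&= \int_\Theta \int_A \left[ \int_B \frac{f_\theta(x)}{m(x)}\, d\Pi \right] f_\theta(x)\, d\nu\, d\Pi \\
&= \int_{A \times \Theta} P(B, x)\, dP_{x,\theta},
\end{aligned}
$$

where the third equality follows from Fubini's theorem. This completes the proof.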


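Discrete $X$ and $\tilde\theta$: the Bayes formula in elementary probability

$$P(\tilde\theta = \theta \mid X = x) = \frac{P(X = x \mid \tilde\theta = \theta)\, P(\tilde\theta = \theta)}{\sum_{\theta \in \Theta} P(X = x \mid \tilde\theta = \theta)\, P(\tilde\theta = \theta)}$$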
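The discrete formula translates almost line for line into code. The following is a small sketch with made-up numbers (a two-valued $\tilde\theta$ and a binary $X$); the function name `bayes_posterior` and the dictionary layout are illustrative choices, not anything prescribed by the lecture.

```python
from fractions import Fraction

def bayes_posterior(prior, likelihood, x):
    """Return {theta: P(theta | X = x)} from the elementary Bayes formula.

    prior:      dict theta -> P(theta)
    likelihood: dict theta -> dict x -> P(X = x | theta)
    """
    weights = {t: likelihood[t][x] * p for t, p in prior.items()}
    evidence = sum(weights.values())   # denominator: sum over theta
    if evidence == 0:
        raise ValueError("x has zero marginal probability under this prior")
    return {t: w / evidence for t, w in weights.items()}

# Hypothetical numbers: theta indicates a condition, X a test outcome.
prior = {"present": Fraction(1, 100), "absent": Fraction(99, 100)}
likelihood = {
    "present": {"pos": Fraction(95, 100), "neg": Fraction(5, 100)},
    "absent":  {"pos": Fraction(10, 100), "neg": Fraction(90, 100)},
}
post = bayes_posterior(prior, likelihood, "pos")
```

With these assumed inputs, the posterior probability of "present" after a positive result is 19/217, roughly 0.088, even though the test detects the condition 95% of the time; the small prior mass dominates.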

Remarks on the Bayesian approach

- The posterior $P_{\theta|x}$ contains all the information we have about $\theta$.
- Statistical decisions and inference should be made based on $P_{\theta|x}$, conditional on the observed $X = x$.
- In estimating $\theta$, $P_{\theta|x}$ can be viewed as a randomized decision rule under the approach discussed in §2.
- After $X = x$ is observed, $P_{\theta|x}$ is a randomized rule, which is a probability distribution on the action space $\mathcal{A} = \Theta$.
- The Bayesian method can be applied iteratively.
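One way to read the last remark is that the posterior from one observation can serve as the prior for the next. The sketch below checks this on a toy case, assuming a three-point prior and Bernoulli observations that are conditionally independent given $\theta$ (both are assumptions for illustration). Updating one observation at a time gives the same posterior as a single update with the joint likelihood.

```python
from fractions import Fraction

def update(prior, theta_likelihood):
    """One Bayes step: posterior proportional to likelihood times prior."""
    weights = {t: theta_likelihood(t) * p for t, p in prior.items()}
    total = sum(weights.values())
    return {t: w / total for t, w in weights.items()}

def bernoulli(x):
    """Likelihood of a single 0/1 observation x as a function of theta."""
    return lambda t: t if x == 1 else 1 - t

# Assumed three-point prior on the success probability theta.
prior = {Fraction(1, 4): Fraction(1, 3),
         Fraction(1, 2): Fraction(1, 3),
         Fraction(3, 4): Fraction(1, 3)}
data = [1, 1, 0, 1]

# Sequential: the posterior after each observation becomes the next prior.
sequential = prior
for x in data:
    sequential = update(sequential, bernoulli(x))

# Batch: one update with the joint likelihood of all observations.
def joint_likelihood(t):
    result = Fraction(1)
    for x in data:
        result *= bernoulli(x)(t)
    return result

batch = update(prior, joint_likelihood)
```

The agreement relies on the conditional independence assumption; with dependent observations the per-step likelihood would have to condition on the earlier data.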