Conditional Expectations - Lecture Slides | STAT 709, Study notes of Mathematical Statistics

Material Type: Notes; Professor: Shao; Class: Mathematical Statistics; Subject: STATISTICS; University: University of Wisconsin - Madison; Term: Fall 2009;

Stat 709: Mathematical Statistics
Lecture 8
Jun Shao
Department of Statistics
University of Wisconsin
Madison, WI 53706, USA
Jun Shao (UW-Madison) Stat 709 Lecture 8 September 21, 2009 1 / 11

Lecture 8: Conditional expectation

In elementary probability, the conditional probability $P(B|A)$ is defined as $P(B|A) = P(A \cap B)/P(A)$ for events $A$ and $B$ with $P(A) > 0$. For two random variables $X$ and $Y$, how do we define $P(X \in B|Y = y)$ when $P(Y = y)$ may be 0?

Definition 1.6.

Let $X$ be an integrable random variable on $(\Omega, \mathcal{F}, P)$.

(i) The conditional expectation of $X$ given $\mathcal{A}$ (a sub-σ-field of $\mathcal{F}$), denoted by $E(X|\mathcal{A})$, is the a.s.-unique random variable satisfying the following two conditions:
(a) $E(X|\mathcal{A})$ is measurable from $(\Omega, \mathcal{A})$ to $(\mathbb{R}, \mathcal{B})$;
(b) $\int_A E(X|\mathcal{A})\,dP = \int_A X\,dP$ for any $A \in \mathcal{A}$.

(ii) The conditional probability of $B \in \mathcal{F}$ given $\mathcal{A}$ is defined to be $P(B|\mathcal{A}) = E(I_B|\mathcal{A})$.

(iii) Let $Y$ be measurable from $(\Omega, \mathcal{F}, P)$ to $(\Lambda, \mathcal{G})$. The conditional expectation of $X$ given $Y$ is defined to be $E(X|Y) = E[X|\sigma(Y)]$.
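To make conditions (a) and (b) concrete, here is a minimal Python sketch on a finite sample space. It assumes (beyond the text) that $\mathcal{A}$ is generated by a finite partition of $\Omega$ into blocks of positive probability, so every set in $\mathcal{A}$ is a union of blocks and (b) can be checked by brute force over all unions. The candidate tested is the block average of $X$, the formula derived in Example 1.21; the helper names `cond_exp_partition` and `satisfies_definition` and the die example are hypothetical choices for illustration.

```python
from fractions import Fraction
from itertools import combinations

def cond_exp_partition(p, x, blocks):
    """Block averages of X: a candidate for E(X|A), A generated by `blocks`.

    p: outcome -> probability, x: outcome -> X(outcome),
    blocks: disjoint sets of outcomes covering Omega, each with P > 0.
    """
    z = {}
    for block in blocks:
        mass = sum(p[w] for w in block)
        avg = sum(x[w] * p[w] for w in block) / mass
        for w in block:
            z[w] = avg
    return z

def satisfies_definition(p, x, blocks, z):
    """Check conditions (a) and (b) of Definition 1.6 for a candidate z."""
    # (a) z is A-measurable iff it is constant on every block.
    if any(len({z[w] for w in block}) > 1 for block in blocks):
        return False
    # (b) every A in A is a union of blocks; compare the two integrals on each.
    for r in range(len(blocks) + 1):
        for chosen in combinations(blocks, r):
            a = set().union(*chosen)
            if sum(z[w] * p[w] for w in a) != sum(x[w] * p[w] for w in a):
                return False
    return True

# Hypothetical example: a fair die, X(w) = w, A generated by {odd, even}.
p = {w: Fraction(1, 6) for w in range(1, 7)}
x = {w: Fraction(w) for w in range(1, 7)}
blocks = [{1, 3, 5}, {2, 4, 6}]

z = cond_exp_partition(p, x, blocks)
print(z[1], z[2])                             # 3 4
print(satisfies_definition(p, x, blocks, z))  # True
# The constant EX = 7/2 satisfies (a) but not (b).
print(satisfies_definition(p, x, blocks, {w: Fraction(7, 2) for w in p}))  # False

# Conditional probability P(B|A) = E(I_B|A) with B = {6}.
indicator = {w: Fraction(int(w == 6)) for w in p}
cp = cond_exp_partition(p, indicator, blocks)
print(cp[1], cp[2])                           # 0 1/3
```

The constant $EX$ fails (b) on the block of odd outcomes, which illustrates why $E(X|\mathcal{A})$ has to depend on $\mathcal{A}$.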

logo

Remarks

The existence of $E(X|\mathcal{A})$ follows from Theorem 1.4. $\sigma(Y)$ contains "the information in $Y$", and $E(X|Y)$ is the "expectation" of $X$ given the information in $Y$. For a random vector $X$, $E(X|\mathcal{A})$ is defined as the vector of conditional expectations of the components of $X$.

Lemma 1.2.

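Let $Y$ be measurable from $(\Omega, \mathcal{F})$ to $(\Lambda, \mathcal{G})$ and $Z$ a function from $(\Omega, \mathcal{F})$ to $\mathbb{R}^k$. Then $Z$ is measurable from $(\Omega, \sigma(Y))$ to $(\mathbb{R}^k, \mathcal{B}^k)$ iff there is a measurable function $h$ from $(\Lambda, \mathcal{G})$ to $(\mathbb{R}^k, \mathcal{B}^k)$ such that $Z = h \circ Y$.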

By Lemma 1.2, there is a Borel function $h$ on $(\Lambda, \mathcal{G})$ such that $E(X|Y) = h \circ Y$. For $y \in \Lambda$, we define $E(X|Y = y) = h(y)$ to be the conditional expectation of $X$ given $Y = y$. Note that $h(y)$ is a function on $\Lambda$, whereas $h \circ Y = E(X|Y)$ is a function on $\Omega$.
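On a finite sample space, and assuming every subset of $\Lambda$ belongs to $\mathcal{G}$, Lemma 1.2 reduces to a simple check: $Z$ is $\sigma(Y)$-measurable exactly when it is constant on every level set $\{Y = v\}$, and then $h(v)$ is that constant. The sketch below uses a hypothetical helper `factor_through` and a die example chosen for illustration; it builds $h$ as a table on $\Lambda$ and recovers $Z = h \circ Y$ on $\Omega$, mirroring the distinction between $h(y)$ and $E(X|Y)$ drawn above.

```python
def factor_through(y, z):
    """Return a dict h with z[w] == h[y[w]] for all w, or None if impossible.

    y, z: dicts mapping each outcome w of a finite Omega to Y(w) and Z(w).
    Z factors through Y exactly when Z is constant on each level set {Y = v}.
    """
    h = {}
    for w, v in y.items():
        if v in h and h[v] != z[w]:
            return None  # Z takes two values on {Y = v}
        h[v] = z[w]
    return h

# Hypothetical example: a fair die, Y = parity of the outcome.
omega = range(1, 7)
y = {w: w % 2 for w in omega}               # Y maps Omega into Lambda = {0, 1}
z = {w: 3 if w % 2 else 4 for w in omega}   # E(X|Y) for X(w) = w
h = factor_through(y, z)
print(h)                          # {1: 3, 0: 4}, a function on Lambda
print([h[y[w]] for w in omega])   # [3, 4, 3, 4, 3, 4], h o Y as a function on Omega
print(factor_through(y, {w: w for w in omega}))  # None: X itself is not sigma(Y)-measurable
```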


Example 1.21.

Let $X$ be an integrable random variable on $(\Omega, \mathcal{F}, P)$, let $A_1, A_2, \ldots$ be disjoint events on $(\Omega, \mathcal{F}, P)$ such that $\bigcup_i A_i = \Omega$ and $P(A_i) > 0$ for all $i$, and let $a_1, a_2, \ldots$ be distinct real numbers. Define $Y = a_1 I_{A_1} + a_2 I_{A_2} + \cdots$. We now show that

$$E(X|Y) = \sum_{i=1}^{\infty} \frac{\int_{A_i} X\,dP}{P(A_i)}\, I_{A_i}.$$

We need to verify (a) and (b) in Definition 1.6 with $\mathcal{A} = \sigma(Y)$.

Since $\sigma(Y) = \sigma(\{A_1, A_2, \ldots\})$, it is clear that the function on the right-hand side is measurable on $(\Omega, \sigma(Y))$. This verifies (a).

To verify (b), we need to show that

$$\int_{Y^{-1}(B)} X\,dP = \int_{Y^{-1}(B)} \left[\sum_{i=1}^{\infty} \frac{\int_{A_i} X\,dP}{P(A_i)}\, I_{A_i}\right] dP$$

for any $B \in \mathcal{B}$.


Example 1.21 (continued)

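Using the fact that $Y^{-1}(B) = \bigcup_{i: a_i \in B} A_i$, we obtain

$$\begin{aligned}
\int_{Y^{-1}(B)} X\,dP &= \sum_{i: a_i \in B} \int_{A_i} X\,dP \\
&= \sum_{i=1}^{\infty} \frac{\int_{A_i} X\,dP}{P(A_i)}\, P\big(A_i \cap Y^{-1}(B)\big) \\
&= \int_{Y^{-1}(B)} \left[\sum_{i=1}^{\infty} \frac{\int_{A_i} X\,dP}{P(A_i)}\, I_{A_i}\right] dP,
\end{aligned}$$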

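where the second equality holds because $P(A_i \cap Y^{-1}(B))$ equals $P(A_i)$ if $a_i \in B$ and 0 otherwise, and the last equality follows from Fubini's theorem. This verifies (b) and thus the result.

Let $h$ be a Borel function on $\mathbb{R}$ satisfying

$$h(a_i) = \int_{A_i} X\,dP \Big/ P(A_i), \quad i = 1, 2, \ldots$$

Then $E(X|Y) = h \circ Y$ and $E(X|Y = y) = h(y)$.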
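As a numerical sanity check of Example 1.21, the sketch below assumes a concrete instance that is not in the text: $\Omega = [0, 1)$ with Lebesgue measure as $P$, $X(\omega) = \omega^2$, $A_i = [(i-1)/k, i/k)$ for $i = 1, \ldots, k$, and $a_i = i$. Then $h(a_i) = k \int_{A_i} \omega^2\,d\omega$, computed exactly with fractions and compared with a Monte Carlo average of $X$ over draws falling in each $A_i$. The function names, $k = 4$, and the sample size are hypothetical choices.

```python
import random
from fractions import Fraction

def h_exact(i, k):
    """h(a_i) = (integral of X over A_i) / P(A_i) for X(w) = w**2, A_i = [(i-1)/k, i/k)."""
    lo, hi = Fraction(i - 1, k), Fraction(i, k)
    integral = (hi**3 - lo**3) / 3   # integral of w^2 dw over [lo, hi)
    return integral / (hi - lo)      # P(A_i) = hi - lo under Lebesgue measure

def h_simulated(k, n, seed=0):
    """Monte Carlo estimate of h(a_1), ..., h(a_k): average of X over draws in each A_i."""
    rng = random.Random(seed)
    sums, counts = [0.0] * k, [0] * k
    for _ in range(n):
        w = rng.random()              # w ~ Uniform[0, 1)
        j = min(int(w * k), k - 1)    # 0-based index: w lies in A_{j+1}
        sums[j] += w * w              # X(w) = w^2
        counts[j] += 1
    return [s / c for s, c in zip(sums, counts)]

k = 4
exact = [h_exact(i, k) for i in range(1, k + 1)]
print([str(v) for v in exact])   # ['1/48', '7/48', '19/48', '37/48']
approx = h_simulated(k, 200_000)
print([round(v, 3) for v in approx])

# Condition (b) for B = {a_2, a_4}: the integral of X over A_2 union A_4
# equals the integral of h(Y) over the same set.
lhs = sum((Fraction(i, k)**3 - Fraction(i - 1, k)**3) / 3 for i in (2, 4))
rhs = sum(exact[i - 1] * Fraction(1, k) for i in (2, 4))
print(lhs == rhs, lhs)           # True 11/48
```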


Proposition 1.9.

Let $X$ be a random $n$-vector and $Y$ a random $m$-vector. Suppose that $(X, Y)$ has a joint p.d.f. $f(x, y)$ w.r.t. $\nu \times \lambda$, where $\nu$ and $\lambda$ are σ-finite measures on $(\mathbb{R}^n, \mathcal{B}^n)$ and $(\mathbb{R}^m, \mathcal{B}^m)$, respectively. Let $g(x, y)$ be a Borel function on $\mathbb{R}^{n+m}$ for which $E|g(X, Y)| < \infty$. Then

$$E[g(X, Y)|Y] = \frac{\int g(x, Y) f(x, Y)\,d\nu(x)}{\int f(x, Y)\,d\nu(x)} \quad \text{a.s.}$$

Proof

Denote the right-hand side by $h(Y)$. By Fubini's theorem, $h$ is Borel. Then, by Lemma 1.2, $h(Y)$ is Borel on $(\Omega, \sigma(Y))$. Also, by Fubini's theorem,

$$f_Y(y) = \int f(x, y)\,d\nu(x)$$

is the p.d.f. of $Y$ w.r.t. $\lambda$.


Proof (continued)

For $B \in \mathcal{B}^m$,

$$\begin{aligned}
\int_{Y^{-1}(B)} h(Y)\,dP &= \int_B h(y)\,dP_Y \\
&= \int_B \frac{\int g(x, y) f(x, y)\,d\nu(x)}{\int f(x, y)\,d\nu(x)}\, f_Y(y)\,d\lambda(y) \\
&= \int_{\mathbb{R}^n \times B} g(x, y) f(x, y)\,d\nu \times \lambda \\
&= \int_{\mathbb{R}^n \times B} g(x, y)\,dP_{(X,Y)} \\
&= \int_{Y^{-1}(B)} g(X, Y)\,dP,
\end{aligned}$$

where the first and the last equalities follow from Theorem 1.2, the second and the next-to-last equalities follow from the definition of $h$ and p.d.f.'s, and the third equality follows from Fubini's theorem.
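Proposition 1.9 can be checked numerically in a case with a known answer. The sketch assumes (as an illustration, not from the text) that $(X, Y)$ is standard bivariate normal with correlation $\rho$, that $\nu = \lambda$ is Lebesgue measure on $\mathbb{R}$, and that $g(x, y) = x^2$. Since $X$ given $Y = y$ is $N(\rho y, 1 - \rho^2)$, the ratio should equal $1 - \rho^2 + \rho^2 y^2$. Both integrals are approximated by a midpoint rule truncated to $[-10, 10]$; the function name `prop_1_9_ratio`, the truncation range, and the grid size are arbitrary choices.

```python
import math

def joint_pdf(x, y, rho):
    """Standard bivariate normal density with correlation rho (w.r.t. Lebesgue measure)."""
    q = (x * x - 2 * rho * x * y + y * y) / (1 - rho * rho)
    return math.exp(-q / 2) / (2 * math.pi * math.sqrt(1 - rho * rho))

def prop_1_9_ratio(g, y, rho, lo=-10.0, hi=10.0, m=20_000):
    """Approximate the ratio in Proposition 1.9 at Y = y by the midpoint rule."""
    dx = (hi - lo) / m
    num = den = 0.0
    for j in range(m):
        x = lo + (j + 0.5) * dx
        fx = joint_pdf(x, y, rho)
        num += g(x, y) * fx * dx   # approximates int g(x, y) f(x, y) dnu(x)
        den += fx * dx             # approximates f_Y(y) = int f(x, y) dnu(x)
    return num / den

rho, y = 0.6, 1.5
approx = prop_1_9_ratio(lambda x, y: x * x, y, rho)
exact = 1 - rho**2 + (rho * y) ** 2   # E[X^2 | Y = y] when X | Y = y ~ N(rho*y, 1 - rho^2)
print(approx, exact)
```

The denominator is exactly the marginal density $f_Y(y)$ from the proof, so the same loop also approximates the p.d.f. of $Y$.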


Conditional p.d.f.

Let $(X, Y)$ be a random vector with a joint p.d.f. $f(x, y)$ w.r.t. $\nu \times \lambda$. The conditional p.d.f. of $X$ given $Y = y$ is defined to be

$$f_{X|Y}(x|y) = f(x, y)/f_Y(y),$$

where

$$f_Y(y) = \int f(x, y)\,d\nu(x)$$

is the marginal p.d.f. of $Y$ w.r.t. $\lambda$. For each fixed $y$ with $f_Y(y) > 0$, $f_{X|Y}(x|y)$ is a p.d.f. w.r.t. $\nu$. Then Proposition 1.9 states that

$$E[g(X, Y)|Y] = \int g(x, Y) f_{X|Y}(x|Y)\,d\nu(x) \quad \text{a.s.},$$

i.e., the conditional expectation of $g(X, Y)$ given $Y$ is equal to the expectation of $g(X, Y)$ w.r.t. the conditional p.d.f. of $X$ given $Y$.
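Because $\nu$ only needs to be σ-finite, the same formula covers discrete distributions: with counting measure, integrals become sums and the p.d.f. is a p.m.f. The following sketch uses a hypothetical joint p.m.f. on $\{0, 1, 2\} \times \{0, 1\}$ (all names and numbers are illustrative) to compute $f_Y$, $f_{X|Y}$, and $E[g(X, Y)|Y = y]$ exactly.

```python
from fractions import Fraction

# Hypothetical joint p.m.f. f(x, y) on {0, 1, 2} x {0, 1}: a p.d.f. w.r.t. counting measure.
f = {
    (0, 0): Fraction(1, 8), (1, 0): Fraction(1, 8), (2, 0): Fraction(1, 4),
    (0, 1): Fraction(1, 4), (1, 1): Fraction(1, 8), (2, 1): Fraction(1, 8),
}
xs = sorted({x for x, _ in f})

def f_Y(y):
    """Marginal p.d.f. of Y: the integral of f(x, y) over x w.r.t. counting measure."""
    return sum(f[(x, y)] for x in xs)

def f_X_given_Y(x, y):
    """Conditional p.d.f. f(x, y) / f_Y(y); requires f_Y(y) > 0."""
    return f[(x, y)] / f_Y(y)

def cond_exp(g, y):
    """E[g(X, Y) | Y = y] as the expectation of g(., y) under f_{X|Y}(.|y)."""
    return sum(g(x, y) * f_X_given_Y(x, y) for x in xs)

for y in (0, 1):
    assert sum(f_X_given_Y(x, y) for x in xs) == 1   # a p.d.f. w.r.t. counting measure
print(cond_exp(lambda x, y: x, 0), cond_exp(lambda x, y: x, 1))   # 5/4 3/4
print(cond_exp(lambda x, y: x + y, 1))                            # 7/4
```

The last line shows $g$ depending on $Y$ as well: given $Y = 1$, the $y$-argument is fixed and only $x$ is integrated out.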


Example 1.22.

Let $X$ be a random variable on $(\Omega, \mathcal{F}, P)$ with $EX^2 < \infty$ and let $Y$ be a measurable function from $(\Omega, \mathcal{F}, P)$ to $(\Lambda, \mathcal{G})$. One may wish to predict the value of $X$ based on an observed value of $Y$. Let $g(Y)$ be a predictor, i.e.,

$$g \in \aleph = \{\text{all Borel functions } g \text{ with } E[g(Y)]^2 < \infty\}.$$

Each predictor is assessed by the "mean squared prediction error"

$$E[X - g(Y)]^2.$$

We now show that $E(X|Y)$ is the best predictor of $X$ in the sense that

$$E[X - E(X|Y)]^2 = \min_{g \in \aleph} E[X - g(Y)]^2.$$

First, Proposition 1.10(viii) implies $E(X|Y) \in \aleph$.


Example 1.22 (continued)

Next, for any $g \in \aleph$,

$$\begin{aligned}
E[X - g(Y)]^2 &= E[X - E(X|Y) + E(X|Y) - g(Y)]^2 \\
&= E[X - E(X|Y)]^2 + E[E(X|Y) - g(Y)]^2 + 2E\{[X - E(X|Y)][E(X|Y) - g(Y)]\} \\
&= E[X - E(X|Y)]^2 + E[E(X|Y) - g(Y)]^2 + 2E\big(E\{[X - E(X|Y)][E(X|Y) - g(Y)] \,|\, Y\}\big) \\
&= E[X - E(X|Y)]^2 + E[E(X|Y) - g(Y)]^2 + 2E\{[E(X|Y) - g(Y)]\,E[X - E(X|Y)|Y]\} \\
&= E[X - E(X|Y)]^2 + E[E(X|Y) - g(Y)]^2 \\
&\geq E[X - E(X|Y)]^2,
\end{aligned}$$

where the third equality follows from Proposition 1.10(iv), the fourth equality follows from Proposition 1.10(vi), and the last equality follows from Proposition 1.10(i), (iii), and (vi).
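The decomposition $E[X - g(Y)]^2 = E[X - E(X|Y)]^2 + E[E(X|Y) - g(Y)]^2$ obtained above can be verified exactly for a small discrete example. The joint p.m.f. and the competing predictors below are hypothetical choices for illustration; `cond_mean` plays the role of the function $h$ with $E(X|Y) = h(Y)$.

```python
from fractions import Fraction

# Hypothetical joint p.m.f. of (X, Y); Y takes values 0, 1, 2.
pmf = {
    (0, 0): Fraction(1, 6), (1, 0): Fraction(1, 6),
    (1, 1): Fraction(1, 6), (2, 1): Fraction(1, 6),
    (2, 2): Fraction(1, 3),
}

def cond_mean(y):
    """h(y) = E(X | Y = y), so that E(X|Y) = h(Y)."""
    mass = sum(p for (_, v), p in pmf.items() if v == y)
    return sum(x * p for (x, v), p in pmf.items() if v == y) / mass

def mspe(g):
    """Mean squared prediction error E[X - g(Y)]^2."""
    return sum((x - g(y)) ** 2 * p for (x, y), p in pmf.items())

best = mspe(cond_mean)
print(best)  # 1/6

ex = sum(x * p for (x, _), p in pmf.items())   # EX
candidates = {
    "constant EX": lambda y: ex,
    "g(y) = y": lambda y: Fraction(y),
}
for name, g in candidates.items():
    gap = sum((cond_mean(y) - g(y)) ** 2 * p for (_, y), p in pmf.items())
    # E[X - g(Y)]^2 = E[X - E(X|Y)]^2 + E[E(X|Y) - g(Y)]^2
    assert mspe(g) == best + gap
    print(name, mspe(g))
```

Exact rational arithmetic is used so that the identity can be checked with `==` rather than a floating-point tolerance.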