Hypothesis Testing: Likelihood Ratio Tests & Neyman-Pearson Lemma, Study notes of Data Analysis & Statistical Methods

An overview of hypothesis testing in statistical inference, focusing on likelihood ratio tests (LRTs) and the Neyman-Pearson lemma. Hypothesis testing is presented as a special case of decision theory, where we aim to test between two hypotheses, H0 and H1. The notes define a test, the level of a test, and the power of a test, and then treat simple hypothesis tests. They then introduce LRTs, motivate them from a large-sample perspective, and present the Neyman-Pearson lemma, which shows that the most powerful test at a given level is always a likelihood ratio test.

STAT 210A: Theoretical Statistics Fall 2006
Lecture 20 November 7
Lecturer: Martin Wainwright Scribe: Chuohao Yeo
These scribe notes have only been mildly proofread.


20.1 Hypothesis testing

We can think of hypothesis testing as a special case of decision theory. Here, we are interested in testing:

$$H_0 : \theta \in \Theta_0 \quad \text{vs} \quad H_1 : \theta \in \Theta_1$$

where the parameter space is $\Theta = \Theta_0 \cup \Theta_1$ and $\Theta_0 \cap \Theta_1 = \emptyset$. We define a test as the following map:

$$\delta : \mathcal{X} \to [0, 1]$$

Note that this allows for randomized tests. In other words, given $X = x$, we would declare $H_1$ with probability $\delta(x)$.

The level of a test $\delta$ is the probability of incorrectly declaring $H_1$ when $H_0$ is true (that is, for $\theta_0 \in \Theta_0$):

$$\alpha(\theta_0) = E_{\theta_0}[\delta(X)]$$

The power of a test $\delta$ is the probability of correctly declaring $H_1$ when $H_1$ is true (that is, for $\theta_1 \in \Theta_1$):

$$\beta(\theta_1) = E_{\theta_1}[\delta(X)]$$

Ideally, we would like to have $\alpha(\theta_0) \approx 0$ uniformly over $\Theta_0$, and $\beta(\theta_1) \approx 1$ uniformly over $\Theta_1$.
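As a concrete illustration of these definitions, the following Python sketch computes the level and power of a randomized test on a four-point sample space. The distributions `P0` and `P1`, the test `delta`, and the helper `expected_rejection` are hypothetical choices made for this example (with a single distribution under each hypothesis); they do not come from the lecture.

```python
from fractions import Fraction as F

# Hypothetical example: X takes values 0..3 under two candidate distributions.
P0 = [F(4, 10), F(3, 10), F(2, 10), F(1, 10)]  # distribution of X under H0
P1 = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]  # distribution of X under H1


def expected_rejection(delta, pmf):
    """E[delta(X)] = sum over x of delta(x) * P(X = x)."""
    return sum(d * p for d, p in zip(delta, pmf))


# A randomized test: always declare H1 at x = 3, and with probability 1/2 at x = 2.
delta = [F(0), F(0), F(1, 2), F(1)]

level = expected_rejection(delta, P0)  # alpha = E_0[delta(X)]
power = expected_rejection(delta, P1)  # beta  = E_1[delta(X)]
print(level, power)  # prints "1/5 11/20"
```

Exact rational arithmetic is used here only so that the two expectations can be read off without rounding.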

20.1.1 Simple hypothesis tests

In simple hypothesis tests, we want to decide between the following alternatives:

$$H_0 : \theta = \theta_0 \quad \text{vs} \quad H_1 : \theta = \theta_1$$

In other words, $\Theta_0 = \{\theta_0\}$ and $\Theta_1 = \{\theta_1\}$. Furthermore, we also have the following:

$$\alpha = E_{\theta_0}[\delta(X)] \equiv E_0[\delta(X)]$$

$$\beta = E_{\theta_1}[\delta(X)] \equiv E_1[\delta(X)]$$

In one formulation of this problem, we might want to specify the maximum tolerance of a type I error, $\bar{\alpha}$. Then, we would be interested in the following optimization problem:

$$\max_{\delta} \; \beta(\delta) \quad \text{s.t.} \quad \alpha(\delta) \le \bar{\alpha}$$
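To make the optimization problem tangible, here is a brute-force Python sketch on a small discrete example. It searches a coarse grid of randomized tests and keeps the most powerful one whose level does not exceed $\bar{\alpha}$. The distributions `P0` and `P1`, the budget `alpha_bar`, and the grid are assumptions chosen for illustration, and exhaustive search is only a stand-in for solving the problem analytically.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical distributions on the sample space {0, 1, 2, 3}.
P0 = [F(4, 10), F(3, 10), F(2, 10), F(1, 10)]
P1 = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]
alpha_bar = F(1, 5)  # maximum tolerated type I error (assumed value)

# Candidate values of delta(x): a coarse grid on [0, 1].
grid = [F(k, 4) for k in range(5)]


def expectation(delta, pmf):
    """E[delta(X)] under the given probability mass function."""
    return sum(d * p for d, p in zip(delta, pmf))


best_power, best_delta = F(-1), None
for delta in product(grid, repeat=len(P0)):
    if expectation(delta, P0) <= alpha_bar:      # constraint: level <= alpha_bar
        power = expectation(delta, P1)           # objective: power
        if power > best_power:
            best_power, best_delta = power, delta

print(best_delta, best_power)
```

On this example the search returns a test that rejects where $P_1/P_0$ is largest and randomizes at one point, so that the level budget is used up exactly.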

20.1.2 Geometry of simple hypothesis tests

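Consider the following set:

$$S = \{(\alpha, \beta) \in [0, 1]^2 \mid \alpha = E_0[\delta(X)],\ \beta = E_1[\delta(X)] \text{ for some } \delta\}$$

Since we allow for randomized tests, the set $S$ is always convex: if $\delta_1$ and $\delta_2$ are tests and $\lambda \in [0, 1]$, then $\lambda \delta_1 + (1 - \lambda)\delta_2$ is also a test, and by linearity of expectation its $(\alpha, \beta)$ point is the same convex combination of the points of $\delta_1$ and $\delta_2$.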
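The convexity of $S$ can be checked numerically on a small example. The sketch below mixes two deterministic tests and confirms that the mixture lands on the segment between their $(\alpha, \beta)$ points. The distributions, the two tests, and the weight `lam` are hypothetical values chosen for illustration.

```python
from fractions import Fraction as F

# Hypothetical distributions on the sample space {0, 1, 2, 3}.
P0 = [F(4, 10), F(3, 10), F(2, 10), F(1, 10)]
P1 = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]


def point(delta):
    """Return the (alpha, beta) point of a test delta."""
    alpha = sum(d * p for d, p in zip(delta, P0))
    beta = sum(d * p for d, p in zip(delta, P1))
    return alpha, beta


d1 = [F(0), F(0), F(0), F(1)]  # declare H1 only at x = 3
d2 = [F(0), F(1), F(1), F(1)]  # declare H1 at x = 1, 2, 3
lam = F(1, 3)                  # mixing weight (assumed)

# Randomized test: follow d1 with probability lam, d2 otherwise.
mix = [lam * a + (1 - lam) * b for a, b in zip(d1, d2)]

(a1, b1), (a2, b2) = point(d1), point(d2)
expected = (lam * a1 + (1 - lam) * a2, lam * b1 + (1 - lam) * b2)
print(point(mix), expected)
```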

20.2 Likelihood ratio tests

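A likelihood ratio test (LRT) is specified by a threshold $t \in [0, +\infty]$ and takes the form:

$$\delta_t(X) = \begin{cases} 1, & \text{if } P(X; \theta_1) > t\,P(X; \theta_0) \\ \gamma, & \text{if } P(X; \theta_1) = t\,P(X; \theta_0) \\ 0, & \text{if } P(X; \theta_1) < t\,P(X; \theta_0) \end{cases}$$

Here $\gamma \in [0, 1]$ is the probability of declaring $H_1$ when the two sides are equal.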
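The three-case definition translates directly into code. The following Python sketch builds the test $\delta_t$ on a finite sample space; the function name `lrt`, the example distributions, and the restriction to finite thresholds are assumptions made for this illustration.

```python
from fractions import Fraction as F

# Hypothetical distributions on the sample space {0, 1, 2, 3}.
P0 = [F(4, 10), F(3, 10), F(2, 10), F(1, 10)]  # P(x; theta_0)
P1 = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]  # P(x; theta_1)


def lrt(p0, p1, t, gamma):
    """Return delta_t as a list of rejection probabilities, one per sample point.

    Only finite thresholds t are handled in this sketch.
    """
    delta = []
    for a, b in zip(p0, p1):
        if b > t * a:
            delta.append(F(1))   # likelihood ratio above the threshold
        elif b == t * a:
            delta.append(gamma)  # boundary: randomize with probability gamma
        else:
            delta.append(F(0))   # likelihood ratio below the threshold
    return delta


print(lrt(P0, P1, t=F(3, 2), gamma=F(1, 2)))
```

Comparing $P(x; \theta_1)$ with $t\,P(x; \theta_0)$, rather than dividing, avoids a division by zero when $P(x; \theta_0) = 0$.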

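Here is a motivation for why we might use an LRT from a large-sample perspective. The likelihood ratio is:

$$L(X) = \frac{P(X; \theta_1)}{P(X; \theta_0)}$$

Say $X_1, \dots, X_n$ are drawn i.i.d. from $P(\cdot; \theta_1)$. Consider the following statistic, where $L(X)$ is the likelihood ratio of the full sample $X = (X_1, \dots, X_n)$:

$$Z_n = \frac{1}{n} \log L(X) = \frac{1}{n} \sum_{i=1}^{n} \log \frac{P(X_i; \theta_1)}{P(X_i; \theta_0)}$$

From the WLLN, we have the following result:

$$Z_n \xrightarrow{P} E_{\theta_1}\left[ \log \frac{P(X_i; \theta_1)}{P(X_i; \theta_0)} \right] = D(P(\cdot; \theta_1) \,\|\, P(\cdot; \theta_0)) > 0$$

Here $D$ is the Kullback-Leibler divergence, which is strictly positive whenever the two distributions differ. Hence, when $H_1$ is true, the normalized log-likelihood ratio is positive with high probability for large $n$.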
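The convergence of $Z_n$ can be observed in a short simulation. The Python sketch below draws i.i.d. samples from a hypothetical $P(\cdot; \theta_1)$ on four points and compares $Z_n$ with the Kullback-Leibler divergence computed directly. The distributions, the sample size, and the seed are assumptions chosen for illustration.

```python
import math
import random

# Hypothetical distributions on the sample space {0, 1, 2, 3}.
P0 = [0.4, 0.3, 0.2, 0.1]  # P(x; theta_0)
P1 = [0.1, 0.2, 0.3, 0.4]  # P(x; theta_1)

# KL divergence D(P1 || P0) = sum_x P1(x) * log(P1(x) / P0(x)).
kl = sum(p1 * math.log(p1 / p0) for p0, p1 in zip(P0, P1))

random.seed(0)  # fixed seed so the run is reproducible
n = 20000
samples = random.choices(range(len(P1)), weights=P1, k=n)  # i.i.d. draws from P1

# Z_n = (1/n) * sum_i log(P1(X_i) / P0(X_i)).
z_n = sum(math.log(P1[x] / P0[x]) for x in samples) / n

print(f"Z_n = {z_n:.4f}, D(P1 || P0) = {kl:.4f}")
```

With this sample size the two printed values should agree to roughly two decimal places, although the exact value of `z_n` depends on the random draws.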

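Checking, we see that (writing $P_i(x)$ for $P(x; \theta_i)$),

$$E_0[\delta_{t_0}(X)] = P_0\left[ \frac{P_1(X)}{P_0(X)} > t_0 \right] + \gamma\, P_0\left[ \frac{P_1(X)}{P_0(X)} = t_0 \right] = \alpha(t_0) + \bar{\alpha} - \alpha(t_0) = \bar{\alpha}$$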
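The construction of $t_0$ and $\gamma$ is not shown in this section, so the following Python sketch rests on an assumption: it takes $\alpha(t) = P_0[L(X) > t]$, picks a threshold $t_0$ with $\alpha(t_0) \le \bar{\alpha} \le P_0[L(X) \ge t_0]$, and sets $\gamma$ so that the boundary term supplies the missing $\bar{\alpha} - \alpha(t_0)$. The function name `level_alpha_lrt` and the example distributions are hypothetical.

```python
from fractions import Fraction as F

# Hypothetical distributions on the sample space {0, 1, 2, 3}.
P0 = [F(4, 10), F(3, 10), F(2, 10), F(1, 10)]
P1 = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]


def level_alpha_lrt(p0, p1, alpha_bar):
    """Return (t0, gamma) such that the LRT has level exactly alpha_bar.

    Assumes every p0[x] > 0, so that all likelihood ratios are finite.
    """
    ratios = sorted({b / a for a, b in zip(p0, p1)}, reverse=True)
    for t in ratios:
        above = sum(a for a, b in zip(p0, p1) if b > t * a)   # P0[L > t]
        equal = sum(a for a, b in zip(p0, p1) if b == t * a)  # P0[L = t]
        if above <= alpha_bar <= above + equal:
            return t, (alpha_bar - above) / equal
    raise ValueError("alpha_bar must lie in [0, 1]")


alpha_bar = F(1, 5)
t0, gamma = level_alpha_lrt(P0, P1, alpha_bar)

# Level of the resulting test: P0[L > t0] + gamma * P0[L = t0].
level = (sum(a for a, b in zip(P0, P1) if b > t0 * a)
         + gamma * sum(a for a, b in zip(P0, P1) if b == t0 * a))
print(t0, gamma, level)
```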

(ii) Say $\delta$ is an LRT of level $\bar{\alpha}$ with threshold $t$. Let $\phi$ be any other test of level $\bar{\alpha}$. We need to show that $E_1[\delta(X)] \ge E_1[\phi(X)]$. Define the following sets:

$$S^+ = \{x \mid \delta(x) - \phi(x) > 0\}$$

$$S^- = \{x \mid \delta(x) - \phi(x) < 0\}$$

On $S^+$, we must have $\delta(X) > 0 \Rightarrow \frac{P_1(X)}{P_0(X)} \ge t$. Similarly, on $S^-$, we must have $\delta(X) < 1 \Rightarrow P_1(X) \le t P_0(X)$. Hence,

$$\int_{\mathcal{X}} (\delta(x) - \phi(x))(P_1(x) - tP_0(x))\,dx = \int_{S^+} (\delta(x) - \phi(x))(P_1(x) - tP_0(x))\,dx + \int_{S^-} (\delta(x) - \phi(x))(P_1(x) - tP_0(x))\,dx \ge 0 + 0 = 0$$

However, we note that:

$$\int_{\mathcal{X}} (\delta(x) - \phi(x))(P_1(x) - tP_0(x))\,dx = E_1[\delta(X) - \phi(X)] - tE_0[\delta(X) - \phi(X)] \ge 0$$

$$\Rightarrow E_1[\delta(X) - \phi(X)] \ge t(\bar{\alpha} - E_0[\phi(X)]) \ge 0$$

with the last inequality following since both $\delta$ and $\phi$ are of level $\bar{\alpha}$ (i.e. $E_0[\delta(X)] = \bar{\alpha}$ and $E_0[\phi(X)] \le \bar{\alpha}$) by assumption.
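The two inequalities at the heart of this argument can be verified exhaustively on a small example. The Python sketch below fixes one LRT and checks, for every competing test on a coarse grid, that the integrand $(\delta - \phi)(P_1 - tP_0)$ is pointwise non-negative and that $E_1[\delta - \phi] \ge t\,E_0[\delta - \phi]$. The distributions, the threshold, and the grid are hypothetical choices; a finite check like this illustrates the proof but does not replace it.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical distributions on the sample space {0, 1, 2, 3}.
P0 = [F(4, 10), F(3, 10), F(2, 10), F(1, 10)]
P1 = [F(1, 10), F(2, 10), F(3, 10), F(4, 10)]

t = F(3, 2)
delta = [F(0), F(0), F(1, 2), F(1)]  # LRT with threshold t and gamma = 1/2


def expectation(test, pmf):
    """E[test(X)] under the given probability mass function."""
    return sum(d * p for d, p in zip(test, pmf))


grid = [F(k, 4) for k in range(5)]
pointwise_ok = True
power_ok = True
for phi in product(grid, repeat=len(P0)):
    # Integrand (delta - phi) * (P1 - t * P0), evaluated at each sample point.
    terms = [(d - f) * (b - t * a) for d, f, a, b in zip(delta, phi, P0, P1)]
    pointwise_ok = pointwise_ok and all(term >= 0 for term in terms)

    gap_power = expectation(delta, P1) - expectation(phi, P1)  # E1[delta - phi]
    gap_level = expectation(delta, P0) - expectation(phi, P0)  # E0[delta - phi]
    power_ok = power_ok and gap_power >= t * gap_level

print(pointwise_ok, power_ok)
```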

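(iii) Say $\phi$ is MP at level $\bar{\alpha}$. From (ii), we can find an LRT $\delta_t$ that is also MP at level $\bar{\alpha}$. With $S^+$ and $S^-$ defined as in (ii) for $\delta = \delta_t$, define

$$T = (S^+ \cup S^-) \cap \{x \mid P_1(x) \ne tP_0(x)\}$$

We note that $\phi$ and $\delta_t$ differ on the set $S^+ \cup S^-$, and $P_1(x) - tP_0(x) \ne 0$ on the set $\{x \mid P_1(x) \ne tP_0(x)\}$. Let $f(x) = (\delta_t(x) - \phi(x))(P_1(x) - tP_0(x))$. On the set $T$, $f(x) > 0$ (since $P_1(x) > tP_0(x) \Rightarrow \delta_t(x) = 1 \Rightarrow \delta_t(x) - \phi(x) > 0$ if $\phi$ and $\delta_t$ differ; similarly for the case $P_1(x) < tP_0(x)$). Hence, if the set $T$ does not have measure zero, then $\int_T f(x)\,dx > 0$. Now, we have that

$$\int_T f(x)\,dx = \int_{S^+ \cup S^-} f(x)\,dx$$

since $f(x) = 0$ for $x \in \{x \mid P_1(x) = tP_0(x)\}$. But $\int_{S^+ \cup S^-} f(x)\,dx > 0$ would contradict the assumption that $\phi$ is MP: from the proof of (ii), this integral evaluates to $E_1[\delta_t(X) - \phi(X)] - tE_0[\delta_t(X) - \phi(X)] \le E_1[\delta_t(X) - \phi(X)]$, since $E_0[\delta_t(X)] = \bar{\alpha} \ge E_0[\phi(X)]$, so $\delta_t$ would have strictly larger power than $\phi$. Hence $T$ has measure zero, that is, $\phi$ agrees with $\delta_t$ almost everywhere outside $\{x \mid P_1(x) = tP_0(x)\}$, and the desired result follows.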