


An overview of hypothesis testing in statistical inference, focusing on likelihood ratio tests (LRTs) and the Neyman-Pearson lemma. Hypothesis testing is presented as a special case of decision theory, where we aim to test between two hypotheses, H0 and H1. The notes define a test, the level and power of a test, and simple hypothesis tests. They then introduce LRTs, motivate them from a large-sample perspective, and present the Neyman-Pearson lemma, which shows that the most powerful test at a given level is always a likelihood ratio test.



STAT 210A: Theoretical Statistics Fall 2006
Lecturer: Martin Wainwright Scribe: Chuohao Yeo
These scribe notes have only been mildly proofread.
We can think of hypothesis testing as a special case of decision theory. Here, we are interested in testing
\[ H_0 : \theta \in \Theta_0 \quad \text{vs} \quad H_1 : \theta \in \Theta_1, \]
where the parameter space is $\Theta = \Theta_0 \cup \Theta_1$ and $\Theta_0 \cap \Theta_1 = \emptyset$. We define a test as the following map:
\[ \delta : \mathcal{X} \to [0, 1]. \]
Note that this allows for randomized tests. In other words, given $X = x$, we would declare $H_1$ with probability $\delta(x)$. The level of a test $\delta$ is the probability of incorrectly declaring $H_1$ when $H_0$ is true (i.e. $\theta_0 \in \Theta_0$):
\[ \alpha(\theta_0) = E_{\theta_0}[\delta(X)]. \]
The power of a test $\delta$ is the probability of correctly declaring $H_1$ when $H_1$ is true (i.e. $\theta_1 \in \Theta_1$):
\[ \beta(\theta_1) = E_{\theta_1}[\delta(X)]. \]
Ideally, we would like to have $\alpha(\theta_0) \approx 0$ uniformly over $\Theta_0$, and $\beta(\theta_1) \approx 1$ uniformly over $\Theta_1$.
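As a concrete illustration of these definitions, the following minimal sketch evaluates the level and power of a randomized test on a finite sample space. The binomial model, the parameter values 0.5 and 0.7, and the helper names are assumptions made for this example; they do not come from the notes.

```python
from math import comb

def binom_pmf(n, p):
    """Binomial(n, p) pmf as a list indexed by x = 0..n."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def prob_declare_h1(delta, pmf):
    """E[delta(X)] when X has the given pmf: the probability of declaring H1."""
    return sum(d * q for d, q in zip(delta, pmf))

n = 10
pmf_null = binom_pmf(n, 0.5)  # assumed theta_0 in Theta_0
pmf_alt = binom_pmf(n, 0.7)   # assumed theta_1 in Theta_1

# A randomized test: declare H1 if x >= 9, and with probability 0.5 if x == 8.
delta = [1.0 if x >= 9 else 0.5 if x == 8 else 0.0 for x in range(n + 1)]

alpha = prob_declare_h1(delta, pmf_null)  # level at theta_0
beta = prob_declare_h1(delta, pmf_alt)    # power at theta_1
print(f"level = {alpha:.4f}, power = {beta:.4f}")
```

Both quantities are the same expectation of $\delta(X)$, taken under different parameter values.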
In simple hypothesis tests, we want to decide between the following alternatives:
\[ H_0 : \theta = \theta_0 \quad \text{vs} \quad H_1 : \theta = \theta_1. \]
In other words, $\Theta_0 = \{\theta_0\}$ and $\Theta_1 = \{\theta_1\}$. Furthermore, we also have the following:
\[ \alpha = E_{\theta_0}[\delta(X)] \equiv E_0[\delta(X)], \qquad \beta = E_{\theta_1}[\delta(X)] \equiv E_1[\delta(X)]. \]
In one formulation of this problem, we might want to specify the maximum tolerable probability of a type I error, $\bar\alpha$. Then, we would be interested in the following optimization problem:
\[ \max_{\delta} \; \beta(\delta) \quad \text{s.t.} \quad \alpha(\delta) \le \bar\alpha. \]
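Consider the following set:
\[ S = \{(\alpha, \beta) \in [0, 1]^2 \mid \alpha = E_0[\delta(X)],\ \beta = E_1[\delta(X)] \text{ for some } \delta\}. \]
Since we allow for randomized tests, the set $S$ is always convex: if $\delta_1$ and $\delta_2$ are tests, then so is $\lambda\delta_1 + (1 - \lambda)\delta_2$ for any $\lambda \in [0, 1]$, and by linearity of expectation its $(\alpha, \beta)$ pair is the same convex combination of the two original pairs.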
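The convexity of $S$ can be checked numerically on a small example. The sketch below mixes two deterministic tests and confirms that the mixed test's $(\alpha, \beta)$ pair is the matching convex combination. The binomial model, the two threshold tests, and the mixing weight are assumptions chosen for illustration.

```python
from math import comb

def binom_pmf(n, p):
    """Binomial(n, p) pmf as a list indexed by x = 0..n."""
    return [comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1)]

def operating_point(delta, pmf0, pmf1):
    """Return (alpha, beta) = (E_0[delta(X)], E_1[delta(X)])."""
    alpha = sum(d * q for d, q in zip(delta, pmf0))
    beta = sum(d * q for d, q in zip(delta, pmf1))
    return alpha, beta

n = 5
pmf0 = binom_pmf(n, 0.5)  # assumed null
pmf1 = binom_pmf(n, 0.8)  # assumed alternative

delta_a = [1.0 if x >= 4 else 0.0 for x in range(n + 1)]  # strict test
delta_b = [1.0 if x >= 2 else 0.0 for x in range(n + 1)]  # lenient test

lam = 0.3  # mixing weight
mixed = [lam * a + (1 - lam) * b for a, b in zip(delta_a, delta_b)]

pt_a = operating_point(delta_a, pmf0, pmf1)
pt_b = operating_point(delta_b, pmf0, pmf1)
pt_mixed = operating_point(mixed, pmf0, pmf1)
expected = tuple(lam * u + (1 - lam) * v for u, v in zip(pt_a, pt_b))
print(pt_mixed, expected)
```

The mixed test is itself a valid randomized test because its values stay in $[0, 1]$.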
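A likelihood ratio test (LRT) is specified by a threshold $t \in [0, +\infty]$ and takes the form:
\[ \delta_t(X) = \begin{cases} 1, & \text{if } P(X; \theta_1) > t\,P(X; \theta_0) \\ \gamma, & \text{if } P(X; \theta_1) = t\,P(X; \theta_0) \\ 0, & \text{if } P(X; \theta_1) < t\,P(X; \theta_0) \end{cases} \]
Here $\gamma \in [0, 1]$ is the probability of declaring $H_1$ when the two sides are equal.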
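The three cases of the LRT translate directly into code. The following is a minimal sketch for a finite threshold $t$ on a four-point sample space; the function name `lrt_decision` and the two probability tables are hypothetical, chosen so that every comparison is exact in floating point.

```python
def lrt_decision(p1_x, p0_x, t, gamma):
    """Probability of declaring H1 at an observation x.

    p1_x, p0_x: likelihoods of x under theta_1 and theta_0.
    t: finite threshold (t = +inf is not handled in this sketch).
    gamma: randomization probability used on the boundary.
    """
    if p1_x > t * p0_x:
        return 1.0
    if p1_x == t * p0_x:
        return gamma
    return 0.0

# Hypothetical distributions on the sample space {0, 1, 2, 3}.
p0 = [0.25, 0.25, 0.25, 0.25]
p1 = [0.125, 0.125, 0.25, 0.5]

t, gamma = 1.0, 0.5
delta = [lrt_decision(p1[x], p0[x], t, gamma) for x in range(4)]

level = sum(d * q for d, q in zip(delta, p0))  # E_0[delta_t(X)]
power = sum(d * q for d, q in zip(delta, p1))  # E_1[delta_t(X)]
print(delta, level, power)
```

With continuous data the boundary event usually has probability zero, so $\gamma$ matters mainly for discrete distributions like this one.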
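Here is a motivation for why we might use a LRT from a large-sample perspective. We know that the likelihood ratio is:
\[ L(X) = \frac{P(X; \theta_1)}{P(X; \theta_0)}. \]
Say $X_1, \dots, X_n$ are drawn i.i.d. from $P(\cdot; \theta_1)$, and write $X = (X_1, \dots, X_n)$ for the whole sample. Consider the following statistic:
\[ Z_n = \frac{1}{n} \log L(X) = \frac{1}{n} \sum_{i=1}^{n} \log \frac{P(X_i; \theta_1)}{P(X_i; \theta_0)}. \]
From the WLLN, we have the following result:
\[ Z_n \xrightarrow{P} E_{\theta_1}\left[ \log \frac{P(X_i; \theta_1)}{P(X_i; \theta_0)} \right] = D(P(\cdot; \theta_1) \,\|\, P(\cdot; \theta_0)) > 0, \]
where $D$ is the Kullback-Leibler divergence, which is strictly positive whenever the two distributions differ.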
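This convergence can be observed in a small simulation. The sketch below assumes a Bernoulli model with $\theta_0 = 0.5$ and $\theta_1 = 0.7$ (values chosen for illustration only), draws samples under $\theta_1$, and compares $Z_n$ with the closed-form divergence.

```python
import math
import random

def log_lr(x, theta1, theta0):
    """log P(x; theta1) / P(x; theta0) for a Bernoulli observation x in {0, 1}."""
    if x == 1:
        return math.log(theta1 / theta0)
    return math.log((1 - theta1) / (1 - theta0))

theta0, theta1 = 0.5, 0.7  # assumed parameter values

# Closed-form KL divergence D(Bernoulli(theta1) || Bernoulli(theta0)).
kl = (theta1 * math.log(theta1 / theta0)
      + (1 - theta1) * math.log((1 - theta1) / (1 - theta0)))

rng = random.Random(0)  # fixed seed so the run is reproducible

def z_n(n):
    """Normalized log-likelihood ratio of n i.i.d. draws from Bernoulli(theta1)."""
    draws = (1 if rng.random() < theta1 else 0 for _ in range(n))
    return sum(log_lr(x, theta1, theta0) for x in draws) / n

results = {n: z_n(n) for n in (100, 10_000, 100_000)}
for n, value in results.items():
    print(f"n = {n:>7}: Z_n = {value:.4f}, KL = {kl:.4f}")
```

Since $Z_n$ settles near a strictly positive limit under $\theta_1$, the likelihood ratio itself grows exponentially in $n$, which is why thresholding it separates the two hypotheses well for large samples.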
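Checking, we see that
\[ E_0[\delta_{t_0}(X)] = P_0\left[ \frac{P_1(X)}{P_0(X)} > t_0 \right] + \gamma\, P_0\left[ \frac{P_1(X)}{P_0(X)} = t_0 \right] = \alpha(t_0) + \bar\alpha - \alpha(t_0) = \bar\alpha, \]
where the second term is the contribution of randomizing on the boundary.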
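For a finite sample space, a threshold and randomization probability that achieve level exactly $\bar\alpha$ can be found by scanning the likelihood ratio values from largest to smallest. The sketch below is one standard way to do this and is not necessarily the construction used in the part of the notes omitted from the preview. The function name `np_threshold` and the example distributions are hypothetical, and exact rational arithmetic is used so the boundary comparison is reliable.

```python
from fractions import Fraction as F

def np_threshold(p0, p1, alpha_bar):
    """Return (t0, gamma) such that the LRT has level exactly alpha_bar.

    Assumes p0[x] > 0 for every x, so each likelihood ratio is finite.
    """
    ratios = sorted({b / a for a, b in zip(p0, p1)}, reverse=True)
    for t in ratios:
        above = sum(a for a, b in zip(p0, p1) if b / a > t)   # P_0[ratio > t]
        at = sum(a for a, b in zip(p0, p1) if b / a == t)     # P_0[ratio = t]
        if above + at >= alpha_bar:
            # Randomize on the boundary to use up the remaining level.
            return t, (alpha_bar - above) / at
    raise ValueError("alpha_bar must lie in [0, 1]")

# Hypothetical distributions on the sample space {0, 1, 2, 3}.
p0 = [F(1, 4)] * 4
p1 = [F(1, 8), F(1, 8), F(1, 4), F(1, 2)]
alpha_bar = F(3, 8)

t0, gamma = np_threshold(p0, p1, alpha_bar)
level = sum(a * (1 if b > t0 * a else gamma if b == t0 * a else 0)
            for a, b in zip(p0, p1))
print(t0, gamma, level)
```

Here the deterministic part of the test contributes 1/4 and the randomized boundary contributes the remaining 1/8.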
(ii) Say $\delta$ is a LRT of level $\bar\alpha$. Let $\phi$ be any other test of level $\bar\alpha$. We need to show that $E_1[\delta(X)] \ge E_1[\phi(X)]$. Define the following sets:
\[ S^+ = \{x \mid \delta(x) - \phi(x) > 0\}, \qquad S^- = \{x \mid \delta(x) - \phi(x) < 0\}. \]
On $S^+$, we must have $\delta(x) > 0 \Rightarrow \frac{P_1(x)}{P_0(x)} \ge t$. Similarly, on $S^-$, we must have $\delta(x) < 1 \Rightarrow P_1(x) \le t P_0(x)$. Hence,
\[ \int_{\mathcal{X}} (\delta(x) - \phi(x))(P_1(x) - t P_0(x))\,dx = \int_{S^+} (\delta(x) - \phi(x))(P_1(x) - t P_0(x))\,dx + \int_{S^-} (\delta(x) - \phi(x))(P_1(x) - t P_0(x))\,dx \ge 0 + 0 = 0. \]
On the other hand, we note that
\[ \int_{\mathcal{X}} (\delta(x) - \phi(x))(P_1(x) - t P_0(x))\,dx = E_1[\delta(X) - \phi(X)] - t E_0[\delta(X) - \phi(X)] \ge 0 \]
\[ \Rightarrow \quad E_1[\delta(X) - \phi(X)] \ge t(\bar\alpha - E_0[\phi(X)]) \ge 0, \]
with the last inequality following since both $\delta$ and $\phi$ are of level $\bar\alpha$ (i.e. $E_0[\delta(X)] = \bar\alpha$ and $E_0[\phi(X)] \le \bar\alpha$) by assumption.
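The inequality in (ii) can be sanity-checked by brute force on a small discrete example. The sketch below compares a LRT of level 3/8 against every test on a coarse grid that satisfies the same level constraint. The distributions, the threshold, and the grid are assumptions for illustration; a grid search only spot-checks the claim and proves nothing.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical distributions on the sample space {0, 1, 2, 3}.
p0 = [F(1, 4)] * 4
p1 = [F(1, 8), F(1, 8), F(1, 4), F(1, 2)]
alpha_bar = F(3, 8)

def expect(test, pmf):
    """E[test(X)] under the given pmf."""
    return sum(d * q for d, q in zip(test, pmf))

# LRT with threshold t = 1 and boundary probability gamma = 1/2.
t, gamma = F(1), F(1, 2)
lrt = [F(1) if b > t * a else gamma if b == t * a else F(0)
       for a, b in zip(p0, p1)]
lrt_level = expect(lrt, p0)
lrt_power = expect(lrt, p1)

# Every test taking values in {0, 1/2, 1} whose level is at most alpha_bar.
grid = [F(0), F(1, 2), F(1)]
feasible = [phi for phi in product(grid, repeat=4)
            if expect(phi, p0) <= alpha_bar]
best_other = max(expect(phi, p1) for phi in feasible)

print(lrt_level, lrt_power, best_other)
```

No feasible competitor on the grid exceeds the power of the LRT, as (ii) predicts.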
(iii) Say $\phi$ is MP at level $\bar\alpha$. From (ii), we can find a LRT $\delta_t$ that is also MP at level $\bar\alpha$. With $S^+$ and $S^-$ defined as in (ii) for $\delta = \delta_t$, define
\[ T = (S^+ \cup S^-) \cap \{x \mid P_1(x) \ne t P_0(x)\}. \]
We note that $\phi$ and $\delta_t$ differ on the set $S^+ \cup S^-$, and $P_1(x) - t P_0(x) \ne 0$ on the set $\{x \mid P_1(x) \ne t P_0(x)\}$. Let $f(x) = (\delta_t(x) - \phi(x))(P_1(x) - t P_0(x))$. On the set $T$, $f(x) > 0$ (since $P_1(x) > t P_0(x) \Rightarrow \delta_t(x) = 1 \Rightarrow \delta_t(x) - \phi(x) > 0$ wherever $\phi$ and $\delta_t$ differ; similarly for the case $P_1(x) < t P_0(x)$). Hence, if the set $T$ does not have zero measure, then $\int_T f(x)\,dx > 0$. Now, we have that
\[ \int_T f(x)\,dx = \int_{S^+ \cup S^-} f(x)\,dx, \]
since $f(x) = 0$ for $x \in \{x \mid P_1(x) = t P_0(x)\}$. But if $\int_{S^+ \cup S^-} f(x)\,dx > 0$, it would contradict the assumption that $\phi$ is MP: from the proof of (ii), this integral evaluates to $E_1[\delta_t(X) - \phi(X)] - t E_0[\delta_t(X) - \phi(X)] = E_1[\delta_t(X) - \phi(X)]$, since both $\delta_t$ and $\phi$ are of level $\bar\alpha$, so $\delta_t$ would have strictly greater power than $\phi$. Hence $T$ must have measure zero, and the desired result follows.
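The conclusion that the set $T$ is null can also be spot-checked on a discrete example: every most powerful test found on a coarse grid should coincide with the LRT wherever $P_1(x) \ne t P_0(x)$. The distributions, threshold, and grid in this sketch are assumptions for illustration only.

```python
from fractions import Fraction as F
from itertools import product

# Hypothetical distributions on the sample space {0, 1, 2, 3}.
p0 = [F(1, 4)] * 4
p1 = [F(1, 8), F(1, 8), F(1, 4), F(1, 2)]
alpha_bar = F(3, 8)
t, gamma = F(1), F(1, 2)

def expect(test, pmf):
    """E[test(X)] under the given pmf."""
    return sum(d * q for d, q in zip(test, pmf))

lrt = tuple(F(1) if b > t * a else gamma if b == t * a else F(0)
            for a, b in zip(p0, p1))

# Grid tests of level at most alpha_bar, then those matching the LRT's power.
grid = [F(0), F(1, 2), F(1)]
feasible = [phi for phi in product(grid, repeat=4)
            if expect(phi, p0) <= alpha_bar]
mp_tests = [phi for phi in feasible if expect(phi, p1) == expect(lrt, p1)]

# Points off the boundary {x : P_1(x) = t P_0(x)}.
off_boundary = [x for x in range(4) if p1[x] != t * p0[x]]
agree = all(phi[x] == lrt[x] for phi in mp_tests for x in off_boundary)
print(mp_tests, off_boundary, agree)
```

In this example the level constraint also pins down the value on the boundary, so the grid contains a single most powerful test.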