Statistical Aspects of Machine Learning I - Study Notes | STAT 616, Study notes of Statistics

Material Type: Notes; Professor: Sherman; Class: STAT ASPECTS OF MACH LEARN I; Subject: STATISTICS; University: Texas A&M University; Term: Unknown 1989. Uploaded on 02/10/2009.

Test for the MVN assumption:

We know that for MVN observations,
$$(y_i - \mu)'\Sigma^{-1}(y_i - \mu) \sim \chi^2_p.$$
The Law of Large Numbers and the CLT imply
$$(y_i - \bar{y})'S^{-1}(y_i - \bar{y}) \approx \chi^2_p, \quad \text{for } n \text{ ``large''}.$$

Graphic based on a $\chi^2$-plot:
1) Obtain $t_i = (y_i - \bar{y})'S^{-1}(y_i - \bar{y})$ and the ordered values $t_{(1)} \le \cdots \le t_{(n)}$.
2) Graph the pairs $(t_{(i)}, \chi^2_i)$, where $\chi^2_i$ is the $100(i - 0.5)/n$ percentile of the $\chi^2_p$ distribution.

The plot should show a straight line for Normal observations. More formal tests are based on the skewness of the variates $t_i$. More in Mardia (Sections 1.8, 5.7) and D'Agostino and Stephens (1986).

More Examples of the Multivariate General Linear Model

We did: 1) tests for $\mu$ with unknown $\Sigma$, and 2) tests for $\Sigma$ with unknown $\mu$.

MANOVA (Multivariate Analysis of Variance)

Why is it ANOVA, MANOVA? We're testing means, not variability. Should it be ANOME, MANOME?

Denote $I$ normal data matrices by
$$Y_i = \begin{pmatrix} y_{i1}' \\ y_{i2}' \\ \vdots \\ y_{i,n_i}' \end{pmatrix}, \quad \text{where } y_{ij} \sim N_p(\mu_i, \Sigma_i), \; j = 1, \ldots, n_i, \; i = 1, \ldots, I.$$

1) Main hypothesis to test: equality of means,
$$H_0: \mu_1 = \cdots = \mu_I \quad \text{given } \Sigma_1 = \cdots = \Sigma_I.$$
This is Multivariate ANOVA (MANOVA). How to test?

2) Special case of MANOVA: $I = 2$. The LRT gives the two-sample Hotelling $T^2$ test, and the UIT also reduces to the two-sample Hotelling $T^2$ test. So the LRT and UIT are the same for $I = 1, 2$, but not in general.

3) Test for equality of variances. Recall Bartlett's univariate test of $\sigma_1^2 = \cdots = \sigma_I^2$: reject the null if $n\ln(s^2) - \sum_{i=1}^I n_i \ln(s_i^2)$ is large.

Multivariate extension: $H_0: \Sigma_1 = \cdots = \Sigma_I$. Under $H_0$, the MLE of each $\Sigma_i$ is the pooled $S = \sum_{i=1}^I n_i S_i / n$. Under $H_1$, the MLEs are $\hat{\Sigma}_i = S_i$. We have
$$\lambda(Y) = |S|^{n/2} \big/ \big(|S_1|^{n_1/2} \cdots |S_I|^{n_I/2}\big),$$
so
$$2\ln[\lambda(Y)] = n\ln|S| - \sum_{i=1}^I n_i \ln|S_i| = \sum_{i=1}^I n_i \ln|S_i^{-1}S|.$$
Distribution? For large samples, $\chi^2_{p(p+1)(I-1)/2}$. Small-sample adjustment: "Box's M-test."

4) a) Linear constraints on the mean. In 1) we considered the test of equal means, $H_0: \mu_1 = \cdots = \mu_I$.
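The $\chi^2$-plot procedure for checking the MVN assumption can be sketched numerically. The sketch below uses simulated data (the sample size, dimension, and seed are illustrative, not from the notes): it computes the Mahalanobis distances $t_i$, sorts them, and pairs them with the $100(i-0.5)/n$ percentiles of $\chi^2_p$; for truly Normal data the paired values track each other closely.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n, p = 200, 3
# Simulated MVN sample (illustrative data, not from the notes)
Y = rng.multivariate_normal(np.zeros(p), np.eye(p), size=n)

ybar = Y.mean(axis=0)
S = np.cov(Y, rowvar=False)                 # sample covariance matrix
D = Y - ybar
# t_i = (y_i - ybar)' S^{-1} (y_i - ybar), one distance per observation
t = np.einsum("ij,jk,ik->i", D, np.linalg.inv(S), D)

t_sorted = np.sort(t)                       # ordered t_(1) <= ... <= t_(n)
# 100(i - 0.5)/n percentiles of the chi^2_p distribution
q = stats.chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)

# For Normal data the points (q_i, t_(i)) fall near a straight line,
# so their correlation should be close to 1.
corr = np.corrcoef(q, t_sorted)[0, 1]
```

Plotting `q` against `t_sorted` gives the $\chi^2$-plot described above; a markedly curved pattern suggests departure from multivariate normality.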
Often we want to test a relationship between means. For example, in the Iris data we saw that Versicolor ≠ Setosa. There is a third species of Iris, Virginica, and gene structure suggests $3\mu_1 = \mu_3 + 2\mu_2$.

In general, consider the hypothesis
$$H_0: \sum_{i=1}^q \beta_i \mu_i = \mu_0, \quad \Sigma \text{ unknown}.$$
How to test this hypothesis? Let $S_i$ denote the unbiased estimator of $\Sigma$ from sample $i$, $i = 1, \ldots, I$. Then the test statistic is the Hotelling $T^2$ based on $\sum_{i=1}^q \beta_i \bar{y}_i$:
$$T^2 = c\Big(\sum_{i=1}^q \beta_i \bar{y}_i - \mu_0\Big)' S^{-1} \Big(\sum_{i=1}^q \beta_i \bar{y}_i - \mu_0\Big),$$
where $S$ is the unbiased pooled estimator of $\Sigma$ and $c^{-1} = \sum_{i=1}^q \beta_i^2 / n_i$.

Distribution: $T^2 \sim T^2(p, n - q)$. Critical values come from the $F$-distribution.

b) Comparing components of the mean vector. E.g., compare several competing treatments to a control or placebo. This is testing $H_0: \mu_1 - \mu_p = \cdots = \mu_{p-1} - \mu_p = 0$. This is a special case of $H_0: A\mu = c$, where $A$ is an $r \times p$ matrix and $c$ is a given $r$-vector. In this case the test statistic is the Hotelling $T^2$ based on
$$[(n - r)/r]\,(A\bar{y} - c)'(ASA')^{-1}(A\bar{y} - c) \sim F(r, n - r).$$

Lemma: Assume $X_1$ and $X_2$ are full rank and that their columns are linearly independent. Then $X_2'RX_2$ is nonsingular, where $R = I - X_1(X_1'X_1)^{-1}X_1'$.

Proof: We need to show that $a'X_2'RX_2\,a = 0$ implies $a = 0$. But (using $R' R = R$, since $R$ is symmetric and idempotent)
$$a'X_2'R'RX_2a = a'X_2'RX_2a = 0 \;\Rightarrow\; RX_2a = 0.$$
So $X_2a = X_1(X_1'X_1)^{-1}X_1'X_2a = X_1b$, say, which by linear independence implies $a = 0$ (and $b = 0$).

Two-Stage Least Squares

Rewrite the model as $Y = XB + E$, where $X = (X_1, X_2)$ and $B = (B_1', B_2')'$. Let $\tilde{B}_1$ and $\tilde{B}_2$ denote the estimators of $B_1$ and $B_2$ in the full model; $\hat{B}_1$ denotes the estimator of $B_1$ in the submodel $B_2 = 0$.

Using geometry, the projection of $X_1\tilde{B}_1 + X_2\tilde{B}_2$ by $I - R$ onto the span of $X_1$ is $X_1\hat{B}_1$. In matrices:
$$X_1(X_1'X_1)^{-1}X_1'[X_1\tilde{B}_1 + X_2\tilde{B}_2] = X_1\hat{B}_1,$$
or
$$X_1\tilde{B}_1 + X_1(X_1'X_1)^{-1}X_1'X_2\tilde{B}_2 = X_1\hat{B}_1.$$
Thus we have
$$\tilde{B}_1 = \hat{B}_1 - (X_1'X_1)^{-1}X_1'X_2\tilde{B}_2.$$
Note:
$$X_2\tilde{B}_2 - X_1(X_1'X_1)^{-1}X_1'X_2\tilde{B}_2 + Y - X\tilde{B} = Y - X_1\hat{B}_1.$$
Now premultiply by $X_2'$ and note that $Y - X\tilde{B}$ is orthogonal to $X_2$, so that
$$[X_2'RX_2]\,\tilde{B}_2 = X_2'RY.$$
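The two-stage identity $[X_2'RX_2]\tilde{B}_2 = X_2'RY$ can be checked numerically. The sketch below (the simulated design, dimensions, and seed are illustrative assumptions) fits the full model in one least-squares step and then recovers the same coefficients via the two-stage route through $R$.

```python
import numpy as np

rng = np.random.default_rng(1)
n = 50
X1 = np.column_stack([np.ones(n), rng.normal(size=n)])  # n x 2 design block
X2 = rng.normal(size=(n, 3))                            # n x 3 design block
beta = rng.normal(size=5)
y = np.column_stack([X1, X2]) @ beta + 0.1 * rng.normal(size=n)

# Full-model least squares in one step: (B1', B2')'
X = np.column_stack([X1, X2])
b_full = np.linalg.lstsq(X, y, rcond=None)[0]

# Two-stage route: R projects onto the orthogonal complement of span(X1)
R = np.eye(n) - X1 @ np.linalg.inv(X1.T @ X1) @ X1.T
b2 = np.linalg.solve(X2.T @ R @ X2, X2.T @ R @ y)       # B~2 = [X2'RX2]^{-1} X2'Ry
b1_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)           # B^1 from submodel B2 = 0
b1 = b1_hat - np.linalg.solve(X1.T @ X1, X1.T @ X2 @ b2)  # B~1 via the identity
```

Both routes agree: `b1` and `b2` match the corresponding pieces of `b_full`, which is exactly the content of the derivation above.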
Now use the Lemma to obtain
$$\tilde{B}_2 = [X_2'RX_2]^{-1}X_2'RY.$$
We have $\hat{B}_1 = (X_1'X_1)^{-1}X_1'Y$ when $B_2 = 0$, and $\tilde{B}_1$ and $\tilde{B}_2$ as given above in the full model.

Testing. First: univariate responses (ANCOVA).

Setup: $y = X_1\beta_1 + X_2\beta_2 + e$, where the columns of $X_1$ and $X_2$ are independent.

A) Estimation. We have:
$$\hat{\beta}_1 = (X_1'X_1)^{-1}X_1'y,$$
$$\tilde{\beta}_2 = [X_2'RX_2]^{-1}X_2'Ry,$$
$$\tilde{\beta}_1 = \hat{\beta}_1 - (X_1'X_1)^{-1}X_1'X_2\tilde{\beta}_2,$$
$$\mathrm{RSS} := (y - X_1\tilde{\beta}_1 - X_2\tilde{\beta}_2)'(y - X_1\tilde{\beta}_1 - X_2\tilde{\beta}_2) = y'Ry - (X_2\tilde{\beta}_2)'Ry.$$

B) Tests of hypotheses. Consider $H_1: \beta_2 = 0$. Note that
$$\mathrm{RSS}_H = y'Ry = (y - X_1\hat{\beta}_1)'(y - X_1\hat{\beta}_1),$$
and the LRT rejects $H_1$ if $\mathrm{RSS}_H - \mathrm{RSS}$ is large.

An Application of Multivariate to Spatial Statistics

Observe $Z = (Z_{ij})$, $i = 1, \ldots, n$, $j = 1, \ldots, p$. Index $i$: locations $s_1, \ldots, s_n$. Index $j$: variables of interest. Note: the rows of $Z$ are not independent.

A main goal: predict (estimate) $Z_{0k}$, variable $k$ at a new location. The optimal estimator minimizes $E[Z_{0k} - \hat{Z}_{0k}]^2$; the solution is $\hat{Z}_{0k} = E[Z_{0k} \mid Z]$.

Simpler: assume a linear estimator,
$$\hat{Z}_{0k} = \sum_{i=1}^n \sum_{j=1}^p \lambda_{ij} Z_{ij}.$$
Restrictions on the $\lambda_{ij}$'s? Require unbiasedness. Taking expectations,
$$\mu_k = \sum_{i=1}^n \sum_{j=1}^p \lambda_{ij} \mu_j,$$
so $\sum_{i=1}^n \lambda_{ik} = 1$ and $\sum_{i=1}^n \lambda_{ij} = 0$ for $j \ne k$.

Now assume $\mu_j = 0$ for $j = 1, \ldots, p$. Then
$$E[Z_{0k} - \hat{Z}_{0k}]^2 = E\Big[Z_{0k}^2 - 2\sum_{i=1}^n \sum_{j=1}^p \lambda_{ij} Z_{0k} Z_{ij} + \sum_{i=1}^n \sum_{j=1}^p \sum_{i'=1}^n \sum_{j'=1}^p \lambda_{ij}\lambda_{i'j'} Z_{ij} Z_{i'j'}\Big]$$
$$= C_{kk}(0,0) - 2\sum_{i=1}^n \sum_{j=1}^p \lambda_{ij} C_{jk}(0,i) + \sum_{i=1}^n \sum_{j=1}^p \sum_{i'=1}^n \sum_{j'=1}^p \lambda_{ij}\lambda_{i'j'} C_{jj'}(i,i') = F(\lambda_{ij}), \text{ say}.$$

So: minimize $F(\lambda_{ij})$ subject to $\sum_{i=1}^n \lambda_{ik} = 1$ and $\sum_{i=1}^n \lambda_{ij} = 0$ for $j \ne k$. How? Lagrange multipliers. Define
$$G_k(\lambda_{ij}) = \sum_{i=1}^n \lambda_{ik} - 1, \qquad G_j(\lambda_{ij}) = \sum_{i=1}^n \lambda_{ij} \text{ for } j \ne k, \qquad G = (G_1, \ldots, G_p)',$$
and
$$H(\lambda_{ij}) = F(\lambda_{ij}) - m'G.$$
Take derivatives with respect to the $\lambda_{ij}$'s and $m$: we get $np + p$ equations in $np + p$ unknowns, hence a unique solution.

Simpler, $p = 1$: minimize
$$C(0,0) - 2\sum_{i=1}^n \lambda_i C(0,i) + \sum_{i=1}^n \sum_{i'=1}^n \lambda_i \lambda_{i'} C(i,i') = F(\lambda_i, i = 1, \ldots, n)$$
subject to
$$G(\lambda_i, i = 1, \ldots, n) := \sum_{i=1}^n \lambda_i - 1 = 0.$$
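The $p = 1$ (ordinary kriging) minimization above leads, via the Lagrange-multiplier conditions, to a linear system in the weights $\lambda_i$ and the multiplier. A minimal sketch follows; the exponential covariance $C(h) = e^{-|h|}$, the locations, and the prediction point are all illustrative assumptions, not from the notes.

```python
import numpy as np

rng = np.random.default_rng(2)
s = rng.uniform(0, 10, size=5)   # assumed observation locations s_1, ..., s_n
s0 = 5.0                         # assumed new prediction location

def C(h):
    # Assumed exponential covariance function (a valid choice, but illustrative)
    return np.exp(-np.abs(h))

Cmat = C(s[:, None] - s[None, :])   # C(i, i'): covariances among observations
c0 = C(s - s0)                      # C(0, i): covariances with the new location

# Setting dF/dlambda_i and dH/dm to zero gives the bordered linear system
#   [ C   1 ] [ lambda ]   [ c0 ]
#   [ 1'  0 ] [   m    ] = [ 1  ]
# where the last row enforces the unbiasedness constraint sum(lambda) = 1.
n = len(s)
A = np.zeros((n + 1, n + 1))
A[:n, :n] = Cmat
A[:n, n] = 1.0
A[n, :n] = 1.0
b = np.append(c0, 1.0)

sol = np.linalg.solve(A, b)
lam, m = sol[:n], sol[n]   # kriging weights and (rescaled) multiplier
```

The prediction is then $\hat{Z}_0 = \sum_i \lambda_i Z_i$; by construction the weights sum to one, so the predictor is unbiased for any constant mean.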