Test for MVN assumption:

We know that for MVN observations, $(y_i - \mu)'\Sigma^{-1}(y_i - \mu) \sim \chi^2_p$. The Law of Large Numbers and the CLT imply that $(y_i - \bar{y})'S^{-1}(y_i - \bar{y}) \sim \chi^2_p$, approximately, for $n$ "large".

Graphic based on the $\chi^2$-plot (a code sketch appears at the end of this section):
1) Obtain $t_i = (y_i - \bar{y})'S^{-1}(y_i - \bar{y})$ and the order statistics $t_{(1)} \le \dots \le t_{(n)}$.
2) Graph the pairs $(t_{(i)}, \chi^2_i)$, where $\chi^2_i$ is the $100(i - 0.5)/n$ percentile of the $\chi^2_p$ distribution.

The plot should show a straight line for Normal observations. More formal tests are based on the skewness of the variates $t_i$. More in: Mardia (Sections 1.8, 5.7) and D'Agostino and Stephens (1986).

More Examples of the Multivariate General Linear Model

We did: 1) tests for $\mu$ with unknown $\Sigma$, and 2) tests for $\Sigma$ with unknown $\mu$.

MANOVA (Multivariate Analysis of Variance)

Why is it ANOVA, MANOVA? We're testing means, not variability. Should it be ANOME, MANOME?

Denote the $I$ normal data matrices by $Y_i = (y_{i1}, y_{i2}, \dots, y_{i,n_i})'$, where $y_{ij} \sim N_p(\mu_i, \Sigma_i)$, $j = 1, \dots, n_i$, $i = 1, \dots, I$.

1) Main hypothesis to test: equality of means, $H_0: \mu_1 = \dots = \mu_I$ given $\Sigma_1 = \dots = \Sigma_I$. This is multivariate ANOVA (MANOVA). How to test?

2) Special case of MANOVA, $I = 2$: the LRT gives the two-sample Hotelling $T^2$ test. The UIT also reduces to the two-sample Hotelling $T^2$ test. So the LRT and the UIT coincide for $I = 1, 2$, but not in general.

3) Test for equality of variances. Recall Bartlett's univariate test of $\sigma^2_1 = \dots = \sigma^2_I$: reject the null if $n \ln(s^2) - \sum_{i=1}^I n_i \ln(s^2_i)$ is large.

Multivariate extension: $H_0: \Sigma_1 = \dots = \Sigma_I$. Under $H_0$, the MLE of each $\Sigma_i$ is the pooled $S = \sum_{i=1}^I n_i S_i / n$. Under $H_1$, the MLEs are $\hat\Sigma_i = S_i$. We have
$$\lambda(Y) = |S|^{n/2} \big/ \left(|S_1|^{n_1/2} \cdots |S_I|^{n_I/2}\right),$$
so
$$2 \ln \lambda(Y) = n \ln|S| - \sum_{i=1}^I n_i \ln|S_i| = \sum_{i=1}^I n_i \ln|S_i^{-1} S|.$$
Distribution? For large samples, $\chi^2_{p(p+1)(I-1)/2}$. Small-sample adjustment: "Box's M-test". (A sketch of the large-sample statistic appears at the end of this section.)

4) a) Linear constraints on the mean. In 1) we considered the test of equal means, $H_0: \mu_1 = \dots = \mu_I$. Often we want to test a relationship between the means. For example, in the Iris data we saw that Versicolor $\ne$ Setosa. There is a third species of Iris, Virginica, and gene structure suggests $3\mu_1 = \mu_3 + 2\mu_2$.

In general, consider the hypothesis
$$H_0: \sum_{i=1}^q \beta_i \mu_i = \mu_0, \quad \Sigma \text{ unknown}.$$
How to test this hypothesis? Let $S_i$ denote the unbiased estimator of $\Sigma$ from sample $i$, $i = 1, \dots, I$. Then the test statistic is the Hotelling $T^2$ based on $\sum_{i=1}^q \beta_i \bar{y}_i$:
$$T^2 = c \Big(\sum_{i=1}^q \beta_i \bar{y}_i - \mu_0\Big)' S^{-1} \Big(\sum_{i=1}^q \beta_i \bar{y}_i - \mu_0\Big),$$
where $S$ is the unbiased pooled estimator of $\Sigma$ and $c^{-1} = \sum_{i=1}^q \beta_i^2 / n_i$. Distribution: $T^2 \sim T^2(p, n - q)$; critical values from the $F$-distribution. (A computational sketch also appears at the end of this section.)

b) Comparing components of the mean vector. E.g., compare several competing treatments to a control or placebo. This is testing $H_0: \mu_1 - \mu_p = 0, \dots, \mu_{p-1} - \mu_p = 0$, a special case of $H_0: A\mu = c$, where $A$ is an $r \times p$ matrix and $c$ is a given $r$-vector. In this case the test statistic is the Hotelling $T^2$ based on
$$[(n - r)/r]\,(A\bar{y} - c)'(ASA')^{-1}(A\bar{y} - c) \sim F(r, n - r).$$

Lemma: Assume $X_1$ and $X_2$ are of full rank and their columns are linearly independent of each other. Then $X_2'RX_2$ is nonsingular, where $R = I - X_1(X_1'X_1)^{-1}X_1'$.

Proof: We need to show that if $a'X_2'RX_2 a = 0$ then $a = 0$. Since $R$ is symmetric and idempotent, $a'X_2'R'RX_2 a = a'X_2'RX_2 a = 0 \Rightarrow RX_2 a = 0$. So $X_2 a = X_1(X_1'X_1)^{-1}X_1'X_2 a = X_1 b$, say, which implies $a = 0$ (and $b = 0$) by the linear independence of the columns of $X_1$ and $X_2$.
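To make steps 1)–2) of the $\chi^2$-plot concrete, here is a minimal sketch in Python (assuming numpy, scipy, and matplotlib are available; the function name and the use of the unbiased sample covariance are illustrative choices, not from the notes):

```python
import numpy as np
from scipy.stats import chi2
import matplotlib.pyplot as plt

def chisq_plot(Y):
    """Chi-square plot for assessing multivariate normality.

    Y : (n, p) data matrix, rows are observations.
    Plots the ordered squared Mahalanobis distances t_(i) against
    the 100(i - 0.5)/n percentiles of chi^2_p.
    """
    n, p = Y.shape
    ybar = Y.mean(axis=0)
    Sinv = np.linalg.inv(np.cov(Y, rowvar=False))  # unbiased S
    D = Y - ybar
    t = np.einsum("ij,jk,ik->i", D, Sinv, D)  # t_i = (y_i - ybar)' S^{-1} (y_i - ybar)
    q = chi2.ppf((np.arange(1, n + 1) - 0.5) / n, df=p)  # chi^2_p percentiles
    plt.scatter(q, np.sort(t))
    plt.plot(q, q, linestyle="--")  # 45-degree reference line
    plt.xlabel(r"$\chi^2_p$ quantiles")
    plt.ylabel(r"ordered $t_{(i)}$")
    plt.show()

# Example: 200 bivariate normal draws should fall near the line.
rng = np.random.default_rng(0)
chisq_plot(rng.multivariate_normal([0, 0], [[1, 0.5], [0.5, 1]], size=200))
```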
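The large-sample statistic $2\ln\lambda(Y) = n\ln|S| - \sum_i n_i \ln|S_i|$ from 3) is also easy to compute. A sketch, assuming the $S_i$ are the MLE covariances (divisor $n_i$) and omitting Box's small-sample correction; the function name is illustrative:

```python
import numpy as np
from scipy.stats import chi2

def cov_equality_test(samples):
    """Large-sample LRT of H0: Sigma_1 = ... = Sigma_I.

    samples : list of I arrays, each of shape (n_i, p).
    Returns 2 ln lambda = n ln|S| - sum_i n_i ln|S_i| and its
    asymptotic chi^2 p-value with p(p+1)(I-1)/2 degrees of freedom.
    """
    I = len(samples)
    p = samples[0].shape[1]
    ns = np.array([Y.shape[0] for Y in samples])
    n = ns.sum()
    # MLE covariances S_i (divisor n_i) and the pooled S = sum n_i S_i / n
    Ss = [np.cov(Y, rowvar=False, bias=True) for Y in samples]
    S = sum(ni * Si for ni, Si in zip(ns, Ss)) / n
    stat = n * np.log(np.linalg.det(S)) - sum(
        ni * np.log(np.linalg.det(Si)) for ni, Si in zip(ns, Ss)
    )
    df = p * (p + 1) * (I - 1) / 2
    return stat, chi2.sf(stat, df)

rng = np.random.default_rng(1)
groups = [rng.multivariate_normal([0, 0], np.eye(2), size=m) for m in (60, 80, 70)]
print(cov_equality_test(groups))  # under H0 the p-value should not be small
```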
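For the linear-constraint test in 4) a), here is a sketch of the $T^2$ computation (all names are illustrative; the conversion uses the standard relation that $[(m - p + 1)/(mp)]\,T^2(p, m) \sim F(p, m - p + 1)$ with $m = n - q$):

```python
import numpy as np
from scipy.stats import f as f_dist

def hotelling_linear_combo(samples, beta, mu0):
    """T^2 test of H0: sum_i beta_i mu_i = mu0, common unknown Sigma.

    samples : list of q arrays, each of shape (n_i, p).
    beta    : (q,) weights; mu0 : (p,) hypothesized value.
    """
    q = len(samples)
    p = samples[0].shape[1]
    ns = np.array([Y.shape[0] for Y in samples])
    n = ns.sum()
    ybars = np.array([Y.mean(axis=0) for Y in samples])
    # unbiased pooled covariance with n - q degrees of freedom
    S = sum((ni - 1) * np.cov(Y, rowvar=False)
            for ni, Y in zip(ns, samples)) / (n - q)
    d = beta @ ybars - mu0              # sum_i beta_i ybar_i - mu0
    c = 1.0 / np.sum(beta**2 / ns)      # c^{-1} = sum_i beta_i^2 / n_i
    T2 = c * d @ np.linalg.solve(S, d)
    m = n - q                           # T^2 ~ T^2(p, n - q)
    F = (m - p + 1) / (m * p) * T2
    return T2, f_dist.sf(F, p, m - p + 1)

# Toy check: three groups with mean 0, so H0: 3 mu_1 - 2 mu_2 - mu_3 = 0 holds.
rng = np.random.default_rng(3)
samples = [rng.multivariate_normal([0, 0], np.eye(2), size=m) for m in (40, 50, 60)]
print(hotelling_linear_combo(samples, np.array([3.0, -2.0, -1.0]), np.zeros(2)))
```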
Two-Stage Least Squares:

Rewrite the model as $Y = XB + E$, where $X = (X_1, X_2)$ and $B = (B_1', B_2')'$. Let $\tilde{B}_1$ and $\tilde{B}_2$ denote the estimators of $B_1$ and $B_2$ in the full model; $\hat{B}_1$ denotes the estimator of $B_1$ in the submodel $B_2 = 0$.

Using geometry, the projection of $X_1\tilde{B}_1 + X_2\tilde{B}_2$ by $I - R$ onto the span of $X_1$ is $X_1\hat{B}_1$. In matrices:
$$X_1(X_1'X_1)^{-1}X_1'[X_1\tilde{B}_1 + X_2\tilde{B}_2] = X_1\hat{B}_1,$$
or $X_1\tilde{B}_1 + X_1(X_1'X_1)^{-1}X_1'X_2\tilde{B}_2 = X_1\hat{B}_1$. Thus we have
$$\tilde{B}_1 = \hat{B}_1 - (X_1'X_1)^{-1}X_1'X_2\tilde{B}_2.$$
Note that
$$X_2\tilde{B}_2 - X_1(X_1'X_1)^{-1}X_1'X_2\tilde{B}_2 + Y - X\tilde{B} = Y - X_1\hat{B}_1.$$
Now premultiply by $X_2'$ and note that $Y - X\tilde{B}$ is orthogonal to $X_2$, so that $[X_2'RX_2]\tilde{B}_2 = X_2'RY$. Now use the Lemma to obtain
$$\tilde{B}_2 = [X_2'RX_2]^{-1}X_2'RY.$$
We have $\hat{B}_1 = (X_1'X_1)^{-1}X_1'Y$ when $B_2 = 0$, and $\tilde{B}_1$ and $\tilde{B}_2$ as given above in the full model.

Testing. First: univariate responses, ANCOVA.

Setup: $y = X_1\beta_1 + X_2\beta_2 + e$, where the columns of $X_1$ and $X_2$ are linearly independent.

A) Estimation (a numerical check of these identities appears at the end of these notes):
$$\hat\beta_1 = (X_1'X_1)^{-1}X_1'y, \quad \tilde\beta_2 = [X_2'RX_2]^{-1}X_2'Ry, \quad \tilde\beta_1 = \hat\beta_1 - (X_1'X_1)^{-1}X_1'X_2\tilde\beta_2,$$
$$\mathrm{RSS} := (y - X_1\tilde\beta_1 - X_2\tilde\beta_2)'(y - X_1\tilde\beta_1 - X_2\tilde\beta_2) = y'Ry - (X_2\tilde\beta_2)'Ry.$$

B) Tests of hypotheses: $H_1: \beta_2 = 0$. Note that $\mathrm{RSS}_H = y'Ry = (y - X_1\hat\beta_1)'(y - X_1\hat\beta_1)$, and the LRT rejects $H_1$ if $\mathrm{RSS}_H - \mathrm{RSS}$ is large.

An Application of Multivariate Analysis to Spatial Statistics

Observe $Z = (Z_{ij})$, $i = 1, \dots, n$, $j = 1, \dots, p$. Index $i$: locations $s_1, \dots, s_n$. Index $j$: variables of interest. Note: the rows of $Z$ are not independent.

A main goal: predict (estimate) $Z_{0k}$, variable $k$ at a new location. The optimal estimator minimizes $E[Z_{0k} - \hat{Z}_{0k}]^2$; the solution is $\hat{Z}_{0k} = E[Z_{0k} \mid Z]$.

Simpler: assume a linear estimator
$$\hat{Z}_{0k} = \sum_{i=1}^n \sum_{j=1}^p \lambda_{ij} Z_{ij}.$$
Restrictions on the $\lambda_{ij}$? Require unbiasedness. Taking expectations, $\mu_k = \sum_{i=1}^n \sum_{j=1}^p \lambda_{ij}\mu_j$, so $\sum_{i=1}^n \lambda_{ik} = 1$ and $\sum_{i=1}^n \lambda_{ij} = 0$ for $j \ne k$.

Now assume $\mu_j = 0$ for $j = 1, \dots, p$. Then
$$E[Z_{0k} - \hat{Z}_{0k}]^2 = E\Big[Z_{0k}^2 - 2\sum_{i=1}^n\sum_{j=1}^p \lambda_{ij}Z_{0k}Z_{ij} + \sum_{i=1}^n\sum_{j=1}^p\sum_{i'=1}^n\sum_{j'=1}^p \lambda_{ij}\lambda_{i'j'}Z_{ij}Z_{i'j'}\Big]$$
$$= C_{kk}(0,0) - 2\sum_{i=1}^n\sum_{j=1}^p \lambda_{ij}C_{jk}(0,i) + \sum_{i=1}^n\sum_{j=1}^p\sum_{i'=1}^n\sum_{j'=1}^p \lambda_{ij}\lambda_{i'j'}C_{jj'}(i,i') = F(\lambda_{ij}), \text{ say}.$$

So: minimize $F(\lambda_{ij})$ subject to $\sum_{i=1}^n \lambda_{ik} = 1$ and $\sum_{i=1}^n \lambda_{ij} = 0$ for $j \ne k$. How? Lagrange multipliers. Define $G_k(\lambda_{ij}) = \sum_{i=1}^n \lambda_{ik} - 1$ and $G_j(\lambda_{ij}) = \sum_{i=1}^n \lambda_{ij}$ for $j \ne k$, $G = (G_1, \dots, G_p)'$, and
$$H(\lambda_{ij}) = F(\lambda_{ij}) - m'G.$$
Take derivatives with respect to the $\lambda_{ij}$ and $m$: we get $np + p$ equations in $np + p$ unknowns, hence a unique solution.

Simpler, $p = 1$: minimize
$$C(0,0) - 2\sum_{i=1}^n \lambda_i C(0,i) + \sum_{i=1}^n\sum_{i'=1}^n \lambda_i\lambda_{i'}C(i,i') = F(\lambda_i,\ i = 1, \dots, n)$$
subject to $G(\lambda_i,\ i = 1, \dots, n) := \sum_{i=1}^n \lambda_i - 1 = 0$.
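For this $p = 1$ case, setting $\partial H / \partial \lambda_i = 0$ gives $\sum_{i'} C(i,i')\lambda_{i'} - m/2 = C(0,i)$ together with the constraint $\sum_i \lambda_i = 1$, a linear system. A sketch of the solution (the exponential covariance, the locations, and all names are illustrative assumptions, not from the notes):

```python
import numpy as np

def kriging_weights(C, c0):
    """Weights for the p = 1 linear predictor above.

    C  : (n, n) matrix of C(i, i') among observed locations.
    c0 : (n,) vector of C(0, i) to the new location.
    Solves   sum_i' C(i, i') lambda_i' - m/2 = C(0, i)
    together with the unbiasedness constraint sum_i lambda_i = 1.
    Returns the weights lambda and the multiplier m/2.
    """
    n = len(c0)
    A = np.zeros((n + 1, n + 1))
    A[:n, :n] = C
    A[:n, n] = -1.0   # coefficient of the Lagrange multiplier m/2
    A[n, :n] = 1.0    # constraint row: sum_i lambda_i = 1
    b = np.concatenate([c0, [1.0]])
    sol = np.linalg.solve(A, b)
    return sol[:n], sol[n]

# Toy example: exponential covariance on 1-D locations.
s = np.array([0.0, 1.0, 2.5, 4.0])   # observed locations
s0 = 1.8                              # prediction location
cov = lambda h: np.exp(-np.abs(h))
C = cov(s[:, None] - s[None, :])
c0 = cov(s0 - s)
lam, mult = kriging_weights(C, c0)
print(lam, lam.sum())                 # weights sum to 1
```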
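Finally, returning to the estimation identities in A) above: a numerical check (simulated data; all names illustrative) that the partitioned formulas reproduce the full-model least squares fit and the RSS identity:

```python
import numpy as np

rng = np.random.default_rng(2)
n, p1, p2 = 50, 3, 2
X1 = rng.standard_normal((n, p1))
X2 = rng.standard_normal((n, p2))
y = X1 @ np.ones(p1) + X2 @ np.array([2.0, -1.0]) + rng.standard_normal(n)

# R projects onto the orthogonal complement of span(X1)
R = np.eye(n) - X1 @ np.linalg.solve(X1.T @ X1, X1.T)

# Partitioned (two-stage) estimators from the notes
beta2_tilde = np.linalg.solve(X2.T @ R @ X2, X2.T @ R @ y)
beta1_hat = np.linalg.solve(X1.T @ X1, X1.T @ y)   # submodel beta2 = 0
beta1_tilde = beta1_hat - np.linalg.solve(X1.T @ X1, X1.T @ X2 @ beta2_tilde)

# Full-model OLS on X = (X1, X2) for comparison
X = np.hstack([X1, X2])
beta_full = np.linalg.lstsq(X, y, rcond=None)[0]
print(np.allclose(beta_full, np.concatenate([beta1_tilde, beta2_tilde])))  # True

# RSS identity: RSS = y'Ry - (X2 beta2_tilde)' R y
rss = np.sum((y - X @ beta_full) ** 2)
print(np.isclose(rss, y @ R @ y - (X2 @ beta2_tilde) @ R @ y))  # True
```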