Hypothesis Testing Overview

Steps involved:

1. Pick H0 and HA. H0 always contains =. HA matches the question asked; it is what we’re trying to prove.

Remember we assume H0 is true and we're trying to prove it false, meaning prove HA is true. Either H0 or HA must be true, making the other false.

2. Pick an α-level by determining which is more critical, a Type I error (then use α = 0.01) or a Type II error (use α = 0.10). If all else fails, use α = 0.05. α indicates how much evidence we must have to prove HA true.

3. Pick which test to use (see Flowchart and details below) and calculate the p-value.

4. State the conclusion: If the p-value < α, reject H0 and conclude HA is true. If the p-value is NOT < α, do NOT reject H0 (fail to reject), and we can NOT conclude HA is true (there is insufficient evidence to prove HA is true).
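Steps 2 and 4 boil down to a small decision rule. The Python sketch below assumes the p-value from step 3 is already in hand. The function names `choose_alpha` and `conclusion` and the `"type1"`/`"type2"` labels are hypothetical, chosen only for illustration.

```python
# Hypothetical helpers for steps 2 and 4 (names are illustrative only).

def choose_alpha(more_critical_error=None):
    """Step 2: pick alpha from whichever error is more critical."""
    if more_critical_error == "type1":
        return 0.01  # a false rejection is costly, so demand more evidence
    if more_critical_error == "type2":
        return 0.10  # missing a real effect is costly, so demand less evidence
    return 0.05      # default when neither error clearly dominates


def conclusion(p_value, alpha):
    """Step 4: reject H0 only when the p-value is strictly below alpha."""
    if p_value < alpha:
        return "reject H0; conclude HA is true"
    return "fail to reject H0; insufficient evidence to conclude HA"


print(conclusion(0.03, choose_alpha()))         # reject H0; conclude HA is true
print(conclusion(0.03, choose_alpha("type1")))  # fail to reject H0; insufficient evidence to conclude HA
```

Note the strict `<`: a p-value exactly equal to α is NOT < α, so H0 is not rejected.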

Numeric data: means, averages or expected values

One mean: H0: μ = #; HA can be >, < or ≠ depending on what we want to prove.

• 1 sample z-test (1 on flowchart): must have normal data and KNOWN σ².

• 1 sample t-test (2 on flowchart): must have normal data, but σ² is unknown and we must use s².

• If n > 30, we don't have to have normal data: x̄ will be approximately normal (Central Limit Theorem), so we can use a t-test (3 on flowchart).

• If n < 30 and the data are not normal, we can't use a z or a t; we would have to use a nonparametric test.
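A minimal sketch of the one-mean tests using only Python's standard library (`statistics.NormalDist` requires Python 3.8+). The function names and the `alternative` labels are hypothetical. The z-test returns a p-value from the standard normal. The t-test stops at the statistic and its df because the standard library has no t distribution, so the p-value would come from a t table or a package such as SciPy (an assumption about your tooling).

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def normal_p_value(z, alternative="two-sided"):
    """P-value from the standard normal for HA: 'greater', 'less' or two-sided."""
    cdf = NormalDist().cdf  # standard normal: mean 0, sd 1
    if alternative == "greater":   # HA: mu > mu0
        return 1 - cdf(z)
    if alternative == "less":      # HA: mu < mu0
        return cdf(z)
    return 2 * (1 - cdf(abs(z)))   # any other label: HA: mu != mu0


def one_sample_z(data, mu0, sigma, alternative="two-sided"):
    """1 sample z-test: requires the population SD sigma to be KNOWN."""
    z = (mean(data) - mu0) / (sigma / sqrt(len(data)))
    return z, normal_p_value(z, alternative)


def one_sample_t(data, mu0):
    """1 sample t-test statistic: sigma unknown, so the sample SD s replaces it."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    return t, n - 1  # look up the p-value with n - 1 degrees of freedom


sample = [5.1, 4.9, 5.3, 5.0, 5.2]
print(one_sample_z(sample, mu0=5.0, sigma=0.2))
print(one_sample_t(sample, mu0=5.0))
```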

Two means: H0: μ1 = μ2 (μ1 − μ2 = 0); HA is again >, < or ≠.

• (Matched) paired t-test (10 on flowchart): IF the samples are dependent (there is some link between the 2 samples by units), we create a sample of differences, which must be normal unless n > 30, and use a 1 sample t-test on the mean, standard deviation and number of differences. This is the most powerful of the 3 tests here because pairing eliminates a source of variability (the differences between units). df = n_diff − 1, the number of differences minus 1.

• 2 sample t-test (9 on the flowchart): we must have normal data or n > 30 for both samples, and the samples must be independent. We use s1² and s2² for σ1² and σ2². df = min(n1 − 1, n2 − 1), a conservative choice.

• Pooled t-test (8 on flowchart): we must have normal data or n > 30 for both samples, independent samples, and equal variances. We pool s1² and s2² to get a better estimate of σ², called sp². df = (n1 − 1) + (n2 − 1); these larger degrees of freedom give the pooled test more power than the 2 sample t-test.

• If n1 or n2 is NOT > 30 and the data are not normal, we must use a nonparametric procedure.
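The three two-mean statistics can be sketched as follows, using the standard library only. The function names are hypothetical. Each function returns the t statistic and its df, and the p-value is again read from a t table. The 2 sample version uses the conservative df = min(n1 − 1, n2 − 1). Statistical software usually reports the Welch–Satterthwaite df instead, which is never smaller.

```python
from math import sqrt
from statistics import mean, stdev, variance


def paired_t(before, after):
    """Paired t-test: a 1 sample t-test on the differences, H0: mean difference = 0."""
    diffs = [a - b for a, b in zip(after, before)]
    n = len(diffs)
    t = mean(diffs) / (stdev(diffs) / sqrt(n))
    return t, n - 1  # df = number of differences - 1


def two_sample_t(x1, x2):
    """2 sample (unpooled) t-test with the conservative df = min(n1 - 1, n2 - 1)."""
    n1, n2 = len(x1), len(x2)
    se = sqrt(variance(x1) / n1 + variance(x2) / n2)  # s1^2 and s2^2 stand in for sigma^2
    return (mean(x1) - mean(x2)) / se, min(n1 - 1, n2 - 1)


def pooled_t(x1, x2):
    """Pooled t-test: assumes equal variances and pools s1^2, s2^2 into sp^2."""
    n1, n2 = len(x1), len(x2)
    df = (n1 - 1) + (n2 - 1)
    sp2 = ((n1 - 1) * variance(x1) + (n2 - 1) * variance(x2)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x1) - mean(x2)) / se, df


print(paired_t(before=[10, 12, 9, 11], after=[12, 13, 11, 12]))
print(two_sample_t([1, 2, 3, 4, 5], [2, 4, 6]))
print(pooled_t([1, 2, 3, 4, 5], [2, 4, 6]))
```

On the same two samples the pooled test has df = 6 against the conservative df = 2 of the unpooled test, which is where its extra power comes from when the equal-variance assumption holds.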

Multiple means: H0: μ1 = μ2 = . . . = μk (k different populations); HA: not all the means are equal.

• ANOVA F-test: normal data, independent samples and equal σ²'s (same as the pooled t-test). We compare the variation between the means, the x̄'s, with the variation within the data (sp² estimates σ², the true variance of the data).

F = s²means / sp², where s²means is the between-group variance estimate (mean square between groups); the larger F is, the further apart the means are relative to the spread within the groups. df_num = # of groups − 1, df_denom = total sample size − # of groups.
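Here is a sketch of the one-way ANOVA F statistic. It assumes the groups are given as plain lists of numbers, and the function name is hypothetical. The between-group mean square plays the role of s²means. The pooled within-group variance is sp², pooled the same way as in the pooled t-test.

```python
from statistics import mean, variance


def one_way_anova_f(*groups):
    """One-way ANOVA: returns (F, df_num, df_denom)."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    # Between groups: how far each group mean (x-bar) sits from the grand mean
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Within groups: (n_i - 1) * s_i^2 summed, as when pooling for sp^2
    ss_within = sum((len(g) - 1) * variance(g) for g in groups)
    df_num, df_denom = k - 1, n_total - k
    f = (ss_between / df_num) / (ss_within / df_denom)
    return f, df_num, df_denom


print(one_way_anova_f([1, 2, 3], [4, 5, 6], [7, 8, 9]))  # (27.0, 2, 6)
```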

Categorical data: proportions, percents and fractions

One proportion: H0: π = #; HA can be >, < or ≠ depending on what we want to prove.

• 1 sample z-test (6 on flowchart): must have nπ ≥ 10 and n(1 − π) ≥ 10.

• Exact binomial test (5 on flowchart): if nπ OR n(1 − π) < 10, we have to do a binomial test.
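Here is a sketch of the one-proportion decision. For brevity it assumes a one-sided HA: π > π0; the function name and that choice of HA are illustrative. It checks nπ0 and n(1 − π0) against 10. When the normal approximation is not justified, it falls back to the exact binomial tail probability, computed with `math.comb` (Python 3.8+).

```python
from math import comb, sqrt
from statistics import NormalDist


def one_proportion_test(successes, n, pi0):
    """Test H0: pi = pi0 against the one-sided HA: pi > pi0.

    Uses the 1 sample z-test when n*pi0 >= 10 and n*(1 - pi0) >= 10,
    otherwise the exact binomial test. Returns (method, p-value).
    """
    if n * pi0 >= 10 and n * (1 - pi0) >= 10:
        p_hat = successes / n
        z = (p_hat - pi0) / sqrt(pi0 * (1 - pi0) / n)
        return "z-test", 1 - NormalDist().cdf(z)
    # Exact binomial: P(X >= successes) when X ~ Binomial(n, pi0) under H0
    p_value = sum(comb(n, x) * pi0 ** x * (1 - pi0) ** (n - x)
                  for x in range(successes, n + 1))
    return "exact binomial", p_value


print(one_proportion_test(60, 100, 0.5))  # large sample: z-test
print(one_proportion_test(9, 10, 0.5))    # ('exact binomial', 0.0107421875)
```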

Two proportions: H0: π1 = π2 (π1 − π2 = 0); HA is again >, < or ≠.

• 2 sample z-test (11 on flowchart): must have n1π1, n1(1 − π1), n2π2 and n2(1 − π2) ≥ 10, although this is often relaxed to 5.
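Here is a sketch of the two-proportion z-test with a two-sided HA. The standard error uses the pooled estimate of the common proportion under H0. That is the usual choice for this test, but it is an assumption here because the notes do not give the formula. The count check uses the observed successes and failures in each sample. The threshold of 10 can be lowered to 5 through the hypothetical `min_count` parameter.

```python
from math import sqrt
from statistics import NormalDist


def two_proportion_z(x1, n1, x2, n2, min_count=10):
    """2 sample z-test of H0: pi1 = pi2 against the two-sided HA: pi1 != pi2."""
    # Observed successes and failures, i.e. n1*p1, n1*(1 - p1), n2*p2, n2*(1 - p2)
    if min(x1, n1 - x1, x2, n2 - x2) < min_count:
        raise ValueError("counts too small for the normal approximation")
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)  # estimate of the common pi if H0 is true
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    return z, 2 * (1 - NormalDist().cdf(abs(z)))


print(two_proportion_z(60, 100, 40, 100))
```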

Multiple proportions: H0: π1 = π2 = … = πk; HA: not all proportions are equal (test for homogeneity)

OR

H0: row and column variables are independent; HA: row and column variables are related (test for independence)

• χ² test: all cells (row/column combinations) must have an expected count of at least 5. (For tables larger than 2 × 2, we can use the approximation whenever the average of the expected counts is 5 or more and the smallest is at least 1, IPS p.626.) The expected count within a cell, Eij = P(row i) · P(column j) · n = (row i total)(column j total)/n, is based on the rows and columns being independent, so the further the observed counts are from the expected counts, the larger the χ² test statistic is and the less we believe the null. df = (r − 1)(c − 1), where r is the number of rows and c is the number of columns.
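Here is a sketch of the χ² computation for an r × c table given as a list of rows. The function name is hypothetical. It builds the expected counts from the row and column totals and sums (observed − expected)²/expected. It also reports df and whether every expected count is at least 5. The p-value would come from a χ² table with that df, since Python's standard library has no χ² distribution.

```python
def chi_square_test(table):
    """Return (chi2, df, expected counts, expected-count check) for an r x c table."""
    n = sum(sum(row) for row in table)
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    # E_ij = P(row i) * P(column j) * n = row_i total * column_j total / n,
    # i.e. the count we would expect if rows and columns were independent
    expected = [[r * c / n for c in col_totals] for r in row_totals]
    chi2 = sum((obs - exp) ** 2 / exp
               for obs_row, exp_row in zip(table, expected)
               for obs, exp in zip(obs_row, exp_row))
    df = (len(table) - 1) * (len(table[0]) - 1)
    # Common rule of thumb: every expected count should be at least 5
    large_enough = all(e >= 5 for row in expected for e in row)
    return chi2, df, expected, large_enough


observed = [[10, 20],
            [20, 10]]
print(chi_square_test(observed))
```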

Hypothesis Testing: Steps, Numeric and Categorical Data, and Different Tests - Prof. J. Ca, Study notes of Data Analysis & Statistical Methods
