Hypothesis Testing: Steps, Numeric and Categorical Data, and Different Tests - Prof. J. Ca, Study notes of Data Analysis & Statistical Methods

These notes outline the steps involved in hypothesis testing: choosing H0 and HA, determining the significance level, and selecting the appropriate test based on the type and distribution of the data. They cover tests for one and two means, proportions, and homogeneity or independence of proportions. Tests include z-tests, t-tests, the ANOVA F-test, and χ² tests.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

koofers-user-p8v

Hypothesis Testing Overview
Steps involved:
1. Pick H0 and HA. H0 always contains =. HA matches the question asked; it is what we’re trying to prove. Remember we assume H0 is true and we’re trying to prove it false, meaning prove HA is true. Either H0 or HA must be true, making the other false.
2. Pick an α-level by determining which is more critical, a Type I error (then use α = 0.01) or a Type II error (use α = 0.10). If all else fails, use α = 0.05. α indicates how much evidence we must have to prove HA true: the smaller α is, the more evidence we need.
3. Pick which test to use (see Flowchart and details below) and calculate the p-value.
4. State the conclusion: If the p-value < α, reject H0 and conclude HA is true. If the p-value is NOT < α, do NOT reject H0 (fail to reject), and we can NOT conclude HA is true (there is insufficient evidence to prove HA is true).
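
The decision rule in step 4 can be sketched in a few lines of Python. The function name `decide` and the wording of the returned strings are illustrative assumptions, not part of the notes; the only rule taken from the notes is the strict comparison of the p-value with the chosen level.

```python
def decide(p_value: float, alpha: float = 0.05) -> str:
    """Step 4: reject H0 only when the p-value is strictly below alpha."""
    if not 0.0 <= p_value <= 1.0:
        raise ValueError("p_value must be between 0 and 1")
    if p_value < alpha:
        return "reject H0; conclude HA"
    # A p-value equal to alpha is NOT < alpha, so we fail to reject.
    return "fail to reject H0; insufficient evidence for HA"


# Hypothetical p-values, chosen only to show both outcomes.
print(decide(0.003, alpha=0.01))
print(decide(0.07))
```

Note that a p-value exactly equal to the level falls in the "fail to reject" branch, matching the "NOT <" wording of step 4.
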
Numeric data: means, averages or expected values
One mean: H0: μ = # HA can be >, < or ≠ depending on what we want to prove.
1 sample z-test (1 on flowchart): must have normal data and KNOWN σ².
1 sample t-test (2 on flowchart): must have normal data but σ² is unknown and we must use s².
If n > 30, we don’t have to have normal data; x̄ will be approx normal, so we can use a t-test (3 on flowchart).
If n ≤ 30 and the data are not normal, we can’t use a z or a t; we would have to use a nonparametric test.
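
As a rough illustration of the two one-mean statistics, the sketch below uses only the Python standard library. The function names and the `alternative` argument are assumptions made for this example. The z-test returns a p-value from the standard normal distribution; the t version returns only the statistic and its degrees of freedom (n − 1), because the standard library has no t distribution, so the p-value would come from a t table or statistical software.

```python
from math import sqrt
from statistics import NormalDist, mean, stdev


def one_sample_z(xbar, mu0, sigma, n, alternative="two-sided"):
    """z statistic and p-value for H0: mu = mu0 when sigma is KNOWN."""
    z = (xbar - mu0) / (sigma / sqrt(n))
    cdf = NormalDist().cdf(z)
    if alternative == "greater":    # HA: mu > mu0
        p = 1.0 - cdf
    elif alternative == "less":     # HA: mu < mu0
        p = cdf
    else:                           # HA: mu != mu0
        p = 2.0 * min(cdf, 1.0 - cdf)
    return z, p


def one_sample_t_stat(data, mu0):
    """t statistic and df for H0: mu = mu0 when sigma is unknown (s is used)."""
    n = len(data)
    t = (mean(data) - mu0) / (stdev(data) / sqrt(n))
    return t, n - 1


# Hypothetical numbers, for illustration only.
z, p = one_sample_z(xbar=52, mu0=50, sigma=4, n=16)
t, df = one_sample_t_stat([1, 2, 3, 4, 5], mu0=2)
print(z, p, t, df)
```
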
Two means: H0: μ₁ = μ₂ (μ₁ − μ₂ = 0) HA is again >, < or ≠.
(Matched) paired t-test (10 on flowchart): IF the samples are dependent (there is some link between the 2 samples by units), we create a sample of differences, which must be normal or have n > 30, and use a 1 sample t-test using the mean, standard deviation and number of differences. This is the most powerful of the 3 tests here because we eliminate a source of variability. df = ndiff − 1, the number of differences minus 1.
2 sample t-test (9 on the flowchart): we must have normal data or n > 30 for both samples and independent samples. We use s₁² and s₂² for σ₁² and σ₂². df = min(n₁ − 1, n₂ − 1).
pooled t-test (8 on flowchart): we must have normal data or n > 30 for both samples, independent samples and the variances must be equal. We pool s₁² and s₂² to get a better estimate of σ², the pooled variance sp². df = (n₁ − 1) + (n₂ − 1), a larger number of degrees of freedom, so more power than the 2 sample t.
If the data are not normal and n₁ or n₂ is NOT > 30, we must use a nonparametric procedure.
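
The three two-mean statistics and their degrees of freedom can be compared side by side. This is a minimal sketch using only the standard library; the function names are assumptions, and each function returns the t statistic with its df rather than a p-value, which would come from a t table or statistical software.

```python
from math import sqrt
from statistics import mean, variance


def paired_t(x, y):
    """Paired t: a 1 sample t-test on the differences, df = n_diff - 1."""
    if len(x) != len(y):
        raise ValueError("paired samples must have the same length")
    diffs = [a - b for a, b in zip(x, y)]
    n = len(diffs)
    return mean(diffs) / sqrt(variance(diffs) / n), n - 1


def two_sample_t(x, y):
    """2 sample t with separate variances, df = min(n1 - 1, n2 - 1)."""
    n1, n2 = len(x), len(y)
    se = sqrt(variance(x) / n1 + variance(y) / n2)
    return (mean(x) - mean(y)) / se, min(n1 - 1, n2 - 1)


def pooled_t(x, y):
    """Pooled t assuming equal variances, df = (n1 - 1) + (n2 - 1)."""
    n1, n2 = len(x), len(y)
    df = (n1 - 1) + (n2 - 1)
    sp2 = ((n1 - 1) * variance(x) + (n2 - 1) * variance(y)) / df
    se = sqrt(sp2 * (1 / n1 + 1 / n2))
    return (mean(x) - mean(y)) / se, df


# Hypothetical data, for illustration only.
x = [1, 2, 3, 4, 5]
y = [2, 4, 6, 8, 10]
print(paired_t(x, y), two_sample_t(x, y), pooled_t(x, y))
```

With equal sample sizes the 2 sample and pooled statistics coincide and only the degrees of freedom differ, which is where the pooled test gets its extra power.
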
Multiple means: H0: μ₁ = μ₂ = . . . = μk (k different populations) HA: not all the means are equal
ANOVA F-test: normal data, independent samples and equal σ²’s (same as pooled t-test). We compare the variation between the means, the x̄’s, with the variation within the data (sp² estimates σ², the true variance of the data).
F = smeans²/sp²; the larger it is, the further apart the means are. df num = # of groups − 1, df denom = total # of observations − # of groups.
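
A one-way ANOVA F statistic can be sketched as follows. This example assumes the numerator is the usual between-groups mean square (the sum of squares between group means divided by # of groups − 1) and the denominator is the pooled within-groups variance; reading the numerator of the notes' F ratio this way is an assumption. The function name `anova_f` is hypothetical, and the p-value is left to an F table or statistical software.

```python
from statistics import mean


def anova_f(groups):
    """Return (F, df_num, df_denom) for a one-way ANOVA."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = mean([v for g in groups for v in g])
    group_means = [mean(g) for g in groups]

    # Variation between the group means, weighted by group size.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation within the groups, around each group's own mean.
    ss_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    )

    df_num = k - 1            # number of groups minus 1
    df_denom = n_total - k    # total observations minus number of groups
    f = (ss_between / df_num) / (ss_within / df_denom)
    return f, df_num, df_denom


# Hypothetical data: three groups of three observations each.
print(anova_f([[1, 2, 3], [2, 3, 4], [3, 4, 5]]))
```
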
Categorical data: proportions, percents and fractions
One proportion: H0: π = # HA can be >, < or ≠ depending on what we want to prove.
1 sample z-test (6 on flowchart): must have nπ and n(1 − π) ≥ 10.
exact binomial test (5 on flowchart): if nπ OR n(1 − π) < 10, we have to do a binomial test.
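
The choice between the z-test and the exact binomial test can be sketched as below. To keep the example short it handles only the upper-tailed alternative (HA: π > π₀) and checks the sample-size condition with the hypothesized value π₀; both choices, and the function name, are assumptions of this sketch.

```python
from math import comb, sqrt
from statistics import NormalDist


def one_proportion_upper(x, n, pi0):
    """Test H0: pi = pi0 against HA: pi > pi0. Returns (method, p_value)."""
    if n * pi0 >= 10 and n * (1 - pi0) >= 10:
        # Large-sample case: normal approximation to the sample proportion.
        z = (x / n - pi0) / sqrt(pi0 * (1 - pi0) / n)
        return "z", 1.0 - NormalDist().cdf(z)
    # Small-sample case: exact binomial tail probability P(X >= x) under H0.
    p = sum(
        comb(n, k) * pi0**k * (1 - pi0) ** (n - k) for k in range(x, n + 1)
    )
    return "exact binomial", p


# Hypothetical counts, for illustration only.
print(one_proportion_upper(60, 100, 0.5))   # condition met: z-test
print(one_proportion_upper(9, 10, 0.5))     # condition fails: exact test
```
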
Two proportions: H0: π₁ = π₂ (π₁ − π₂ = 0) HA is again >, < or ≠.
2 sample z-test (11 on flowchart): must have n₁π₁, n₁(1 − π₁), n₂π₂, n₂(1 − π₂) ≥ 10, although it’s often relaxed to 5.
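
A two-proportion z-test might look like the sketch below. It uses the pooled sample proportion in the standard error, which is the usual choice under H0 of equal proportions but is not spelled out in the notes. Because the true proportions are unknown, the helper `counts_ok` substitutes the observed success and failure counts for the condition; that substitution, the function names, and the `minimum` parameter are assumptions.

```python
from math import sqrt
from statistics import NormalDist


def counts_ok(x1, n1, x2, n2, minimum=10):
    """Check the sample-size condition using observed successes and failures."""
    return min(x1, n1 - x1, x2, n2 - x2) >= minimum


def two_proportion_z(x1, n1, x2, n2):
    """z statistic and two-sided p-value for H0: the two proportions are equal."""
    p1, p2 = x1 / n1, x2 / n2
    pooled = (x1 + x2) / (n1 + n2)   # common proportion estimated under H0
    se = sqrt(pooled * (1 - pooled) * (1 / n1 + 1 / n2))
    z = (p1 - p2) / se
    cdf = NormalDist().cdf(z)
    return z, 2.0 * min(cdf, 1.0 - cdf)


# Hypothetical counts: 60 of 100 versus 40 of 100.
if counts_ok(60, 100, 40, 100):
    print(two_proportion_z(60, 100, 40, 100))
```

Passing `minimum=5` gives the relaxed version of the condition.
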
Multiple proportions: H0: π₁ = π₂ = … = πk HA: not all proportions are equal (test for homogeneity) OR
H0: row and column variables are independent HA: row and column variables are related (test for independence)
χ² test: all cells (row/column combinations) must have an expected count of at least 5 (for tables larger than 2 × 2, we can use the approximation whenever the average of the expected counts is 5 or more and the smallest is at least 1, IPS p.626). The expected count within a cell, Eij = P(row i)*P(column j)*n = (row i total)*(column j total)/n, is based on the rows and columns being independent, so the further the expected is from the observed count, the larger the χ² test statistic is and the less we believe the null. df = (r − 1)*(c − 1), where r is the number of rows and c is the number of columns.
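
The expected counts, the χ² statistic and its degrees of freedom can be computed directly from a table of observed counts. This is a minimal sketch with an assumed function name; the table is a list of rows, and the p-value is left to a χ² table or statistical software.

```python
def chi_square(table):
    """Return (expected counts, chi-square statistic, df) for a two-way table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)

    # Under independence: E_ij = P(row i) * P(column j) * n
    #                          = row total * column total / n.
    expected = [[r * c / n for c in col_totals] for r in row_totals]

    stat = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(row_totals) - 1) * (len(col_totals) - 1)
    return expected, stat, df


# Hypothetical 2 x 2 table of observed counts.
expected, stat, df = chi_square([[10, 20], [30, 40]])
print(expected, stat, df)
```

The returned expected counts can also be used to check the cell-count condition before trusting the approximation.
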
