





An introduction to contingency tables, which are used to analyze the relationship between two categorical variables. The document covers the concepts of independence and homogeneity and includes examples and instructions for performing a chi-square test of independence. It also discusses the assumptions and interpretation of the test results.
Today: Sections 11-3; 10-1/10-
Assignment: 11-3 {1, 3, 7, 9, 13, 17} 10-2 {1, 3, 5, 7, 9, 13, 17, 19, 23}
Next: Sections 10-2/10-
Instructor: John Burke
E-mail: [email protected]
Web Page: http://math.sierracollege.edu/Staff/JohnBurke/
Telephone: 916 337-
Office hours: (V-307) MW 2:35-5:00; M 2:45-3:45 (official)
A contingency table (or two-way frequency table) is a table in which frequencies correspond to two variables. (One variable is used to categorize rows, and a second variable is used to categorize columns.)
In these cases we want to determine whether the row variable and the column variable are dependent, though it is important to note that dependency does NOT establish causality.
A test of independence tests the null hypothesis
H0: There is no association between the row variable and the column variable; i.e., the row and column variables are independent.
H1: The variables in question are not independent.
χ^2 Test for Independence (assumptions)
Test Statistic :
Critical Values :
Relationships Among Components in Independence Hypothesis Test
Compare the observed (O) values to the corresponding expected (E) values.
O's and E's are close: small χ^2 value, large p-value, fail to reject independence.
O's and E's are far apart: large χ^2 value, small p-value, reject independence.
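The comparison of O and E is usually quantified with the Pearson chi-square statistic. The formulas below are the standard textbook forms, given here as an assumption because the slide's own expressions for the test statistic and critical values are not shown. In the degrees-of-freedom formula, r and c denote the number of rows and columns of the table (not the correlation coefficient), and the test is right-tailed.

```latex
\chi^2 = \sum \frac{(O - E)^2}{E},
\qquad
E = \frac{(\text{row total})(\text{column total})}{\text{grand total}},
\qquad
\text{df} = (r - 1)(c - 1)
```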
In a test of homogeneity, we test the claim that different populations have the same proportions of some characteristic.
Example: Responses of male survey respondents to an abortion rights question, categorized by response (Agree, Disagree) and by interviewer (Male Interviewer, Female Interviewer).
                    Male interviewer   Female interviewer
Men who agree             560                 308
Men who disagree          240                  92
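As a rough illustration of the test on the example counts, the sketch below computes the expected frequencies and the chi-square statistic in plain Python. It assumes the table reads 560 and 308 for men who agree and 240 and 92 for men who disagree (male and female interviewer, in that order). The function name `chi_square_statistic` is made up for this sketch, and the closed-form p-value used here is valid only for 1 degree of freedom.

```python
import math

# Observed counts (assumed layout of the example table):
# rows = response (agree, disagree), columns = interviewer (male, female).
observed = [
    [560, 308],  # men who agree
    [240, 92],   # men who disagree
]


def chi_square_statistic(table):
    """Return (chi2, expected) for a two-way table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)
    # Expected count under the null hypothesis:
    # (row total * column total) / grand total.
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(table, expected)
        for o, e in zip(obs_row, exp_row)
    )
    return chi2, expected


chi2, expected = chi_square_statistic(observed)
df = (len(observed) - 1) * (len(observed[0]) - 1)

# Right-tail chi-square probability; this closed form holds for df = 1 only.
p_value = math.erfc(math.sqrt(chi2 / 2))

print(f"chi2 = {chi2:.3f}, df = {df}, p = {p_value:.4f}")
```

With these counts the statistic comes out to about 6.53, which is larger than 3.841, the usual critical value for 1 degree of freedom at α = .05, so the claim of equal proportions would be rejected at that level.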
In Chapter 10, we examine relationships between paired quantitative data.
We use collected data to
Can We Predict the Time of the Next Eruption of Old Faithful?
Is there a relationship between any two variables?
Can we predict how long it will be to the next eruption based upon duration, interval before, or height?
Eruptions of the Old Faithful Geyser
Duration (L1)                   240  120  178  234  235  269  255  220
Interval Before Eruption (L2)    98   90   92   98   93  105   81  108
Interval After Eruption (L3)     92   65   72   94   83   94  101   87
Height (L4)                     140  110  125  120  140  120  125  150
Paired sample data is sometimes called bivariate data.
A correlation exists between two variables when one of them is related to the other in some way.
We can often see if a relationship exists by using a scatterplot (or scatter diagram), a graph in which the paired (x, y) sample data are plotted with each pair represented as a single point.
Assumptions: we will consider only linear relationships, which means that when graphed, the points approximate a straight line. (Recall slope and direction of line.)
[Figure: scatterplots of y against x showing (a) positive, (b) strong positive, and (c) perfect positive linear correlation]
StatDisk: Analysis → Correlation and Regression
Interval After (L3) vs. Duration (L1)
Interval After (L3) vs. Height (L4)
Interval After (L3) vs. Interval Before (L2)
The (Pearson) correlation coefficient r measures the strength of the linear relationship between the paired quantitative x- and y-values in a sample.
Assumptions:
The sample of paired data is a random sample.
The pairs of (x, y) data have a bivariate normal distribution.
$$ r = \frac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}} $$
n = number of pairs of data presented.
Σ denotes the addition of the items indicated.
Σx denotes the sum of all x values.
Σx^2 indicates that each x score should be squared and then those squares added.
(Σx)^2 indicates that the x scores should be added and the total then squared.
Σxy indicates that each x score should be first multiplied by its corresponding y score. After obtaining all such products, find their sum.
r represents the linear correlation coefficient for a sample.
ρ (rho) represents the linear correlation coefficient for a population.
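The sums defined above can be turned into a short calculation. The following is a minimal sketch that computes r for Interval After (L3) against Duration (L1) from the Old Faithful data. The helper name `pearson_r` is an illustrative choice, not something taken from the course materials or from StatDisk.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient computed from the raw sums."""
    if len(xs) != len(ys):
        raise ValueError("x and y must contain the same number of values")
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_x2 = sum(x * x for x in xs)   # square each x, then add
    sum_y2 = sum(y * y for y in ys)   # square each y, then add
    sum_xy = sum(x * y for x, y in zip(xs, ys))  # multiply pairs, then add
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (
        math.sqrt(n * sum_x2 - sum_x ** 2)
        * math.sqrt(n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


duration = [240, 120, 178, 234, 235, 269, 255, 220]   # L1
interval_after = [92, 65, 72, 94, 83, 94, 101, 87]    # L3

r = pearson_r(duration, interval_after)
print(f"r = {r:.3f}")
```

For these eight pairs the result is approximately r = 0.926.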
The value of r does not change if all values of either variable are converted to a different scale.
The value of r is not affected by the choice of x or y. Interchange all x- and y-values and the value of r will not change.
r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.
r^2 is the proportion of the variation in y that is explained by the linear relationship between x and y.
The value of r is always between -1 and +1 inclusive.
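These properties can be checked numerically. The sketch below reuses the Old Faithful duration and interval-after values. The rescaling `2.5 * x + 10` is an arbitrary conversion with a positive factor, chosen only for illustration, and `pearson_r` is again a hypothetical helper name.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (
        math.sqrt(n * sum_x2 - sum_x ** 2)
        * math.sqrt(n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


xs = [240, 120, 178, 234, 235, 269, 255, 220]   # Duration (L1)
ys = [92, 65, 72, 94, 83, 94, 101, 87]          # Interval After (L3)

r_original = pearson_r(xs, ys)

# Property: converting one variable to a different scale leaves r unchanged.
# (Arbitrary conversion with a positive factor, for illustration only.)
r_rescaled = pearson_r([2.5 * x + 10 for x in xs], ys)

# Property: interchanging x and y leaves r unchanged.
r_swapped = pearson_r(ys, xs)

# r^2: proportion of the variation in y explained by the linear relationship.
r_squared = r_original ** 2

print(f"original r = {r_original:.4f}")
print(f"rescaled r = {r_rescaled:.4f}")
print(f"swapped  r = {r_swapped:.4f}")
print(f"r^2        = {r_squared:.3f}")
```

All three values of r agree up to floating-point rounding, and each lies between -1 and +1.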
Interpreting r using Table A-6:
If the absolute value of the computed value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation.
Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation.
Table A-6 (excerpt): critical values of r
n     α = .05   α = .01
5      .878      .959
7      .754      .875
10     .632      .765
12     .576      .708
15     .514      .641
17     .482      .606
20     .444      .561
30     .361      .463
45     .294      .378
60     .254      .330
90     .207      .269
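The decision rule can be written as a small lookup. This sketch includes only the critical values that are legible in the table here, and it takes the second column to be α = .01, which is an assumption. The names `CRITICAL_R` and `is_significant` and the sample inputs (r = 0.70 with n = 10) are hypothetical.

```python
# Critical values of r, limited to the entries legible in the table excerpt.
# The 0.01 column label is an assumption; the source shows it truncated.
CRITICAL_R = {
    0.05: {5: 0.878, 7: 0.754, 10: 0.632, 12: 0.576, 15: 0.514, 17: 0.482,
           20: 0.444, 30: 0.361, 45: 0.294, 60: 0.254, 90: 0.207},
    0.01: {5: 0.959, 7: 0.875, 10: 0.765, 12: 0.708, 15: 0.641, 17: 0.606,
           20: 0.561, 30: 0.463, 45: 0.378, 60: 0.330, 90: 0.269},
}


def is_significant(r, n, alpha=0.05):
    """True if |r| exceeds the tabled critical value for this n and alpha."""
    try:
        critical = CRITICAL_R[alpha][n]
    except KeyError:
        raise ValueError(f"no tabled critical value for n={n}, alpha={alpha}") from None
    return abs(r) > critical


# Hypothetical sample: r = 0.70 computed from n = 10 pairs.
print(is_significant(0.70, 10, alpha=0.05))   # compares 0.70 with 0.632
print(is_significant(0.70, 10, alpha=0.01))   # compares 0.70 with 0.765
print(is_significant(-0.70, 10, alpha=0.05))  # the absolute value is used
```

The comparison uses the absolute value of r, so a strong negative correlation is judged in the same way as a strong positive one.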
Causation: It is wrong to conclude that correlation implies causality (Remember eating lobster and its “effect” on pregnancy).
Averages: Averages suppress individual variation and may inflate the correlation coefficient.
Linearity: There may be some relationship between x and y even when there is no significant linear correlation.
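The linearity point can be illustrated with made-up data in which y is completely determined by x but the relationship is not linear. The values below are hypothetical (y = x^2 on a symmetric range of x), and `pearson_r` is an illustrative helper.

```python
import math


def pearson_r(xs, ys):
    """Linear correlation coefficient computed from the raw sums."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    numerator = n * sum_xy - sum_x * sum_y
    denominator = (
        math.sqrt(n * sum_x2 - sum_x ** 2)
        * math.sqrt(n * sum_y2 - sum_y ** 2)
    )
    return numerator / denominator


# Hypothetical data: y depends exactly on x, but along a parabola.
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x ** 2 for x in xs]

r = pearson_r(xs, ys)
print(f"r = {r:.3f}")
```

Here r is 0 even though y is an exact function of x, which is why a scatterplot should be examined alongside r.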