
Contingency Tables: Independence and Homogeneity - Prof. John Burke, Study notes of Statistics

Sierra College Statistics

Prof. John Burke

An introduction to contingency tables, which are used to analyze the relationship between two categorical variables. The notes cover the concepts of independence and homogeneity, include examples and instructions for performing a chi-square test, and discuss the assumptions and interpretation of the test results. They also introduce linear correlation for paired quantitative data.


Sierra College – Math 13

Spring 2009 – Class 30/32

Today: Sections 11-3; 10-1/10-2

Assignment: 11-3 {1, 3, 7, 9, 13, 17}

10-2 {1, 3, 5, 7, 9, 13, 17, 19, 23}

Next: Sections 10-2/10-3

Instructor: John Burke

E-mail: [email protected]

Web Page: http://math.sierracollege.edu/Staff/JohnBurke/

Telephone: 916 337-0425

Office hours: (V-307) MW 2:35-5:00; M 2:45-3:45 (official)


11-3 Contingency Tables:

Independence and Homogeneity

A contingency table (or two-way frequency table) is a table in which frequencies correspond to two variables. (One variable is used to categorize rows, and a second variable is used to categorize columns.)

– Example: Titanic passengers categorized as (Survived, Died) by (Men, Women, Boys, and Girls).
– Example: Male survey respondents to an abortion rights question categorized as (Agree, Disagree) by (Male Interviewer and Female Interviewer).
– Example: Students categorized as (Male and Female) by (Non-Smoker, Smoker).
– Example: Respondents to a question about use of the TV remote categorized as (Male and Female) by (Often, Sometimes, or Almost Never).

In these cases we are looking to determine a dependency relationship between the row variable and the column variable, though it is important to note that dependency does NOT establish causality.
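As an illustration only, the Python sketch below shows how raw categorical observations are tallied into the cells of a contingency table. The observations and the helper name `contingency_table` are hypothetical; they mirror the (Male, Female) by (Non-Smoker, Smoker) example but are not data from these notes.

```python
from collections import Counter


def contingency_table(pairs, rows, cols):
    """Count the observations that fall in each (row, column) cell."""
    counts = Counter(pairs)  # cells with no observations count as 0
    return [[counts[(r, c)] for c in cols] for r in rows]


# Hypothetical raw data: one (gender, smoking status) pair per student
observations = [
    ("Male", "Non-Smoker"), ("Female", "Smoker"), ("Male", "Smoker"),
    ("Female", "Non-Smoker"), ("Female", "Non-Smoker"), ("Male", "Non-Smoker"),
]
rows = ["Male", "Female"]          # row variable
cols = ["Non-Smoker", "Smoker"]    # column variable
table = contingency_table(observations, rows, cols)
print(table)  # [[2, 1], [2, 1]]
```

Each row of the result corresponds to one category of the row variable and each column to one category of the column variable, which is exactly the two-way layout described above.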


11-3 Contingency Tables:

Independence

A test of independence tests the following hypotheses:

H0: There is no association between the row variable and the column variable; i.e., the row and column variables are independent.
H1: The variables in question are not independent.

χ^2 Test for Independence (assumptions)

The sample data are randomly selected.
For every cell, the expected frequency is at least 5.


χ^2 Test for Independence

Test Statistic: $\chi^2 = \sum \frac{(O - E)^2}{E}$

Critical Values: The critical values are found in Table A-4 by using degrees of freedom = (r – 1)(c – 1), where r is the number of rows and c is the number of columns. In a test of independence with a contingency table, the critical region is located in the right tail only.
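A minimal Python sketch of the χ^2 statistic for an r × c table, assuming the usual expected-frequency rule E = (row total)(column total)/(grand total). The counts are hypothetical, and `chi_square_test` and `p_value_df1` are illustrative helper names, not functions from a calculator or StatDisk. The p-value helper covers only df = 1, using the fact that a χ^2 variable with one degree of freedom is the square of a standard normal variable.

```python
import math


def chi_square_test(observed):
    """Return (chi2, df, expected) for an r x c table of observed counts."""
    row_totals = [sum(row) for row in observed]
    col_totals = [sum(col) for col in zip(*observed)]
    grand_total = sum(row_totals)
    # Expected frequency: (row total)(column total) / (grand total)
    expected = [[r * c / grand_total for c in col_totals] for r in row_totals]
    # Assumption check: every expected frequency must be at least 5
    if min(min(row) for row in expected) < 5:
        raise ValueError("some expected frequency is below 5")
    # Sum (O - E)^2 / E over every cell
    chi2 = sum(
        (o - e) ** 2 / e
        for obs_row, exp_row in zip(observed, expected)
        for o, e in zip(obs_row, exp_row)
    )
    df = (len(observed) - 1) * (len(observed[0]) - 1)  # (r - 1)(c - 1)
    return chi2, df, expected


def p_value_df1(chi2):
    """Right-tail area for df = 1: P(Z^2 >= chi2) = erfc(sqrt(chi2 / 2))."""
    return math.erfc(math.sqrt(chi2 / 2))


observed = [[20, 30], [30, 20]]  # hypothetical 2 x 2 counts
chi2, df, expected = chi_square_test(observed)
print(chi2, df)                    # 4.0 1
print(round(p_value_df1(chi2), 4)) # 0.0455
```

Here χ^2 = 4.0 with df = 1, which exceeds the df = 1 right-tail critical value of 3.841 at α = 0.05, so this hypothetical table would lead to rejecting independence; the p-value of about 0.046 agrees.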

Relationships Among Components in Independence Hypothesis Test

Compare the observed (O) values to the corresponding expected (E) values.

- Os and Es are close → small χ^2 value → large P-value → fail to reject independence.
- Os and Es are far apart → large χ^2 value → small P-value → reject independence.

χ^2 Test for Homogeneity

In a test of homogeneity, we test the claim that different populations have the same proportions of some characteristic.

Example: Male survey respondents to an abortion rights question categorized as (Agree, Disagree) by (Male Interviewer and Female Interviewer).

Use a 0.05 significance level.
H0: The proportions of agree/disagree responses are the same for the subjects interviewed by men and the subjects interviewed by women.
H1: The proportions are different.

Interviewer:         Man   Woman
Men who agree        560     308
Men who disagree     240      92
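The arithmetic for this example can be worked by hand. The sketch below assumes the counts read as 560 (man interviewer) and 308 (woman interviewer) men who agree, and 240 and 92 men who disagree. It uses the expected-frequency rule E = (row total)(column total)/(grand total), with row totals 868 and 332, column totals 800 and 400, and grand total 1200.

```latex
\begin{aligned}
E_{\text{agree, man}} &= \frac{868 \cdot 800}{1200} \approx 578.667, &
E_{\text{agree, woman}} &= \frac{868 \cdot 400}{1200} \approx 289.333,\\
E_{\text{disagree, man}} &= \frac{332 \cdot 800}{1200} \approx 221.333, &
E_{\text{disagree, woman}} &= \frac{332 \cdot 400}{1200} \approx 110.667,\\
\chi^2 &= \sum \frac{(O - E)^2}{E} \approx 0.602 + 1.204 + 1.574 + 3.149 \approx 6.529, &
\text{df} &= (2 - 1)(2 - 1) = 1.
\end{aligned}
```

Every expected frequency is at least 5, so the assumption holds. With df = 1 and α = 0.05, the right-tail critical value is 3.841; since 6.529 > 3.841, these counts lead to rejecting H0, i.e., the agree/disagree proportions appear to differ by interviewer gender.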

10-1 / 10-3 Correlation and Regression

In Chapter 10, we examine relationships between paired quantitative data.

We use collected data to:

- Observe a pattern (correlation – 10-2).
- Mathematically model the pattern (regression – 10-3).
- When appropriate, use the mathematical model to make predictions.


Chapter 10 Problem: Can We Predict the Time of the Next Eruption of Old Faithful?

Is there a relationship between any two variables?

Can we predict how long it will be until the next eruption based upon duration, interval before, or height?

Eruptions of the Old Faithful Geyser

Duration (L1)*                   240  120  178  234  235  269  255  220
Interval Before Eruption (L2)*    98   90   92   98   93  105   81  108
Interval After Eruption (L3)*     92   65   72   94   83   94  101   87
Height (L4)*                     140  110  125  120  140  120  125  150

Enter the data in your calculator/StatDisk.

10-2 Correlation

Paired sample data are sometimes called bivariate data.

A correlation exists between two variables when one of them is related to the other in some way.

We can often see if a relationship exists by using a scatterplot (or scatter diagram), a graph in which the paired (x, y) sample data are plotted with each pair represented as a single point.

Assumption: we will consider only linear relationships, which means that when graphed, the points approximate a straight line. (Recall the slope and direction of a line.)

Positive Linear Correlation

(Scatterplots of y vs. x: (a) Positive, (b) Strong positive, (c) Perfect positive)

Scatter Plots for the Chapter Problem

StatDisk: Analysis → Correlation and Regression

- Interval After (L3) vs. Duration (L1)
- Interval After (L3) vs. Height (L4)
- Interval After (L3) vs. Interval Before (L2)

Linear Correlation Coefficient

The (Pearson) correlation coefficient r measures the strength of the linear relationship between the paired x- and y-quantitative values in a sample.

Assumptions: The sample of paired data is a random sample. The pairs of (x, y) data have a bivariate normal distribution.

$r = \dfrac{n\sum xy - (\sum x)(\sum y)}{\sqrt{n\sum x^2 - (\sum x)^2}\,\sqrt{n\sum y^2 - (\sum y)^2}}$

Notation for r

- n = number of pairs of data presented.
- Σ denotes the addition of the items indicated.
- Σx denotes the sum of all x values.
- Σx^2 indicates that each x score should be squared and then those squares added.
- (Σx)^2 indicates that the x scores should be added and the total then squared.
- Σxy indicates that each x score should first be multiplied by its corresponding y score; after obtaining all such products, find their sum.
- r represents the linear correlation coefficient for a sample.
- ρ (rho) represents the linear correlation coefficient for a population.
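A minimal Python sketch of the computational formula for r, with each running sum named after the notation list. It uses the Duration (L1) and Interval After (L3) columns of the Old Faithful table as x and y; `pearson_r` is an illustrative name, not a calculator or StatDisk function.

```python
import math


def pearson_r(x, y):
    """Linear correlation coefficient r from the computational formula."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same number of values")
    n = len(x)                                  # number of pairs
    sum_x, sum_y = sum(x), sum(y)               # Σx and Σy
    sum_x2 = sum(v * v for v in x)              # Σx^2: square each x, then add
    sum_y2 = sum(v * v for v in y)              # Σy^2
    sum_xy = sum(a * b for a, b in zip(x, y))   # Σxy: multiply each pair, then add
    numerator = n * sum_xy - sum_x * sum_y      # nΣxy - (Σx)(Σy)
    # sum_x ** 2 is (Σx)^2: add first, then square
    denominator = (math.sqrt(n * sum_x2 - sum_x ** 2)
                   * math.sqrt(n * sum_y2 - sum_y ** 2))
    return numerator / denominator


duration = [240, 120, 178, 234, 235, 269, 255, 220]   # x: Duration (L1)
interval_after = [92, 65, 72, 94, 83, 94, 101, 87]    # y: Interval After (L3)
print(round(pearson_r(duration, interval_after), 3))  # 0.926
```

The result, r ≈ 0.926 for n = 8 pairs, would then be compared against Table A-6 to decide whether the linear correlation is significant.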


Properties of r

- The value of r is always between -1 and +1 inclusive.
- The value of r does not change if all values of either variable are converted to a different scale.
- The value of r is not affected by the choice of x or y. Interchange all x- and y-values and the value of r will not change.
- r measures the strength of a linear relationship. It is not designed to measure the strength of a relationship that is not linear.
- r^2 is the proportion of the variation in y that is explained by the linear relationship between x and y.


Table A-6

Interpreting r using Table A-6:

If the absolute value of the computed value of r exceeds the value in Table A-6, conclude that there is a significant linear correlation.

Otherwise, there is not sufficient evidence to support the conclusion of a significant linear correlation.

Critical values of r (selected n):

n     α = .05   α = .01
5     .878      .959
7     .754      .875
10    .632      .765
12    .576      .708
15    .514      .641
17    .482      .606
20    .444      .561
30    .361      .463
45    .294      .378
60    .254      .330
90    .207      .269
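The decision rule can be expressed as a small lookup. The sketch stores only the α = .05 entries recoverable from the table above; the dictionary `CRITICAL_R_05` and the function `significant_linear_correlation` are hypothetical helpers, and any n outside this excerpt raises an error rather than being guessed.

```python
# alpha = .05 critical values of r, limited to the rows listed in the table above
CRITICAL_R_05 = {5: 0.878, 7: 0.754, 10: 0.632, 12: 0.576, 15: 0.514,
                 17: 0.482, 20: 0.444, 30: 0.361, 45: 0.294, 60: 0.254,
                 90: 0.207}


def significant_linear_correlation(r, n, table=CRITICAL_R_05):
    """True if |r| exceeds the tabled critical value for n pairs."""
    if n not in table:
        raise KeyError(f"n = {n} is not in this table excerpt")
    return abs(r) > table[n]


print(significant_linear_correlation(-0.70, 10))  # True  (0.70 > 0.632)
print(significant_linear_correlation(0.50, 12))   # False (0.50 <= 0.576)
```

Using the absolute value means a strong negative correlation is judged the same way as a strong positive one.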

Common Errors Involving Correlation

Causation: It is wrong to conclude that correlation implies causality (remember eating lobster and its “effect” on pregnancy).

Averages: Averages suppress individual variation and may inflate the correlation coefficient.

Linearity: There may be some relationship between x and y even when there is no significant linear correlation.
