Linear Regression: Health Spending vs. GDP in OECD Countries - Prof. Mary Kathryn Cowles, Study notes of Statistics

University of Iowa (UI), Statistics

Prof. Mary Kathryn Cowles

A lecture note from a Bayesian statistics course (22S:138), focusing on linear regression analysis. It introduces the topic, reviews the frequentist approach, and presents the relationship between per capita health spending and per capita gross domestic product (GDP) in 24 OECD countries in 1989. It covers concepts such as scatterplots, linear functions, error terms, and calculating predicted values and residuals.

Typology: Study notes

Pre 2010

Uploaded on 09/17/2009


Introduction to Linear Regression

22S:138 Bayesian Statistics

Lecture 14

October 16, 2006

Kate Cowles, Ph.D.


Review of Frequentist Approach to Linear Regression

Per Capita Health Spending and Per Capita Gross Domestic Product (GDP) in 24 OECD Countries, 1989

Schieber, Poullier, and Greenwald, Health Affairs, 1991

     Country          Per capita health spending   Per capita GDP
 1.  United States    2051                         18.1429
 2.  Canada           1483                         17.2857
 3.  Iceland          1241                         15.5714
 4.  Sweden           1233                         13.8571
 5.  Switzerland      1225                         15.8571
 6.  Norway           1149                         15.5714
 7.  France           1105                         12.2857
 8.  Germany          1093                         13.4286
 9.  Luxembourg       1050                         14.8571
10.  Netherlands      1041                         13.0000
11.  Austria           982                         11.8571
12.  Finland           949                         12.8571
13.  Australia         939                         12.2857
14.  Japan             915                         13.4286
15.  Belgium           879                         11.8571
16.  Italy             841                         12.4286
17.  Denmark           792                         13.5714
18.  United Kingdom    758                         12.4286
19.  New Zealand       733                         10.8571
20.  Ireland           561                          7.8571
21.  Spain             521                          8.8571
22.  Portugal          386                          6.5714
23.  Greece            337                          6.4286
24.  Turkey            148                          4.4286

In regression analysis, we look at the conditional distribution of the response variable at different levels of a predictor variable.

• Response variable
– also called "dependent" or "outcome" variable
– what we want to explain or predict
– in simple linear regression, response variable is continuous

• Predictor variables
– also called "independent" variables or "covariates"
– in simple linear regression, predictor variable usually is also continuous
– How we define which variable is response and which is predictor depends on our research question.

[Figure: scatterplot of Per Capita Health Spending and Per Capita Gross Domestic Product (GDP) in 24 OECD Countries, 1989]

Scatterplots

• response variable on Y axis
• predictor variable on X axis
• Relationship in this scatterplot looks roughly linear.
– Makes sense to try to summarize the relationship between these two variables with a straight line.

Quick review of linear functions

$$Y = \beta_0 + \beta_1 X$$

Y is a response variable that is a linear function of the predictor variable X.

• $\beta_0$: intercept; the value of Y when X = 0
• $\beta_1$: slope; how much Y changes when X increases by 1 unit

Linear regression

In linear regression analysis, $\beta_0 + \beta_1 X$ represents the mean value of all the Y's for a given value of X.

$$E(Y \mid X) = \beta_0 + \beta_1 X$$

There is an entire distribution of Y values for each value of X (a conditional distribution).

Example: for any given value of per capita GDP, there is a distribution of values of per capita health spending among OECD countries.

We say the relationship between X and Y is linear if the means of the conditional distributions of $Y \mid X$ lie on a straight line.

Error terms

In regression, we represent factors other than $X_i$ that affect $Y_i$ with an error term, $\epsilon_i$.

Population model:

$$\epsilon_i = Y_i - (\beta_0 + \beta_1 X_i)$$

$$\epsilon_i = Y_i - E[Y_i]$$

or, equivalently,

$$Y_i = (\beta_0 + \beta_1 X_i) + \epsilon_i$$

$$Y_i = E[Y_i] + \epsilon_i$$

Calculating predicted values and residuals

Per capita health expenditures and per capita GDP

Obs  NAME            Dep Var PCH  Predict Value  Std Err Predict  Lower95% Mean  Upper95% Mean
  1  UnitedStates         2051.0         1558.9           63.075         1428.1          1689.
  2  Canada               1483.0         1467.0           56.251         1350.3          1583.
  3  Iceland              1241.0         1283.1           43.858         1192.1          1374.
  4  Sweden               1233.0         1099.2           34.641         1027.4          1171.
  5  Switzerland          1225.0         1313.7           45.765         1218.8          1408.
  6  Norway               1149.0         1283.1           43.858         1192.1          1374.
  7  France               1105.0          930.6           31.480          865.4           995.
  8  Germany              1093.0         1053.2           33.165          984.5          1122.
  9  Luxemborg            1050.0         1206.5           39.487         1124.6          1288.
 10  Netherland           1041.0         1007.3           32.127          940.6          1073.
 11  Austria               982.0          884.7           31.771          818.8           950.
 12  Finland               949.0          991.9           31.886          925.8          1058.
 13  Australia             939.0          930.6           31.480          865.4           995.
 14  Japan                 915.0         1053.2           33.165          984.5          1122.
 15  Belgium               879.0          884.7           31.771          818.8           950.
 16  Italy                 841.0          946.0           31.497          880.6          1011.
 17  Denmark               792.0         1068.5           33.611          998.8          1138.
 18  UnitedKingdom         758.0          946.0           31.497          880.6          1011.
 19  NewZealand            733.0          777.4           34.322          706.2           848.
 20  Ireland               561.0          455.6           52.341          347.1           564.
 21  Spain                 521.0          562.9           45.201          469.1           656.
 22  Portugal              386.0          317.7           62.399          188.3           447.
 23  Greece                337.0          302.4           63.559          170.6           434.
 24  Turkey                148.0        87.8628           80.394       -78.8646           254.

Obs  NAME            Residual
  1  UnitedStates        492.
  2  Canada               16.
  3  Iceland             -42.
  4  Sweden              133.
  5  Switzerland         -88.
  6  Norway             -134.
  7  France              174.
  8  Germany              39.
  9  Luxemborg          -156.
 10  Netherland           33.
 11  Austria              97.
 12  Finland             -42.
 13  Australia             8.
 14  Japan              -138.
 15  Belgium              -5.
 16  Italy              -105.
 17  Denmark            -276.
 18  UnitedKingdom      -188.
 19  NewZealand          -44.
 20  Ireland             105.
 21  Spain               -41.
 22  Portugal             68.
 23  Greece               34.
 24  Turkey               60.
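The predicted values and residuals in these tables can be reproduced, at least approximately, by an ordinary least-squares fit. The following is a minimal sketch, assuming the 24 data pairs transcribed from the country table earlier in the notes; the names `gdp`, `health`, and `fit_line` are illustrative and do not come from the lecture, and the closed-form least-squares formulas are the standard ones rather than anything stated in the surviving slides.

```python
# Data transcribed from the table of 24 OECD countries (1989), in table order.
gdp = [18.1429, 17.2857, 15.5714, 13.8571, 15.8571, 15.5714, 12.2857, 13.4286,
       14.8571, 13.0000, 11.8571, 12.8571, 12.2857, 13.4286, 11.8571, 12.4286,
       13.5714, 12.4286, 10.8571, 7.8571, 8.8571, 6.5714, 6.4286, 4.4286]
health = [2051, 1483, 1241, 1233, 1225, 1149, 1105, 1093, 1050, 1041, 982, 949,
          939, 915, 879, 841, 792, 758, 733, 561, 521, 386, 337, 148]


def fit_line(x, y):
    """Return the least-squares (intercept, slope) for regressing y on x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    return intercept, slope


b0, b1 = fit_line(gdp, health)

# Predicted value: estimated mean of Y at each observed X.
predicted = [b0 + b1 * xi for xi in gdp]
# Residual: observed Y minus predicted Y.
residuals = [yi - pi for yi, pi in zip(health, predicted)]

# Show the first three countries for comparison with the tables.
for xi, yi, pi, ri in list(zip(gdp, health, predicted, residuals))[:3]:
    print(f"GDP={xi:8.4f}  observed={yi:7.1f}  predicted={pi:7.1f}  residual={ri:7.1f}")
```

A useful sanity check on any least-squares fit with an intercept is that the residuals sum to zero up to floating-point error.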

Estimating the common variance

One of the assumptions of linear regression is that the variance of the conditional distribution of $Y \mid X$ is the same at all values of X.

The estimate of this common variance is

$$\hat\sigma^2 = \frac{SSE}{n - 2}$$

where SSE is the sum of squared residuals.

• analogous to the estimate of variance in a normal sample
• $n - 2$ in the denominator is the "degrees of freedom"
– number of observations minus number of estimated regression coefficients
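As a rough illustration of this estimate, the sketch below refits the line to the 24 OECD data pairs and divides the sum of squared residuals by n - 2. It assumes SSE means the sum of squared residuals from the least-squares fit; the variable names are illustrative only.

```python
import math

# Data transcribed from the table of 24 OECD countries (1989), in table order.
gdp = [18.1429, 17.2857, 15.5714, 13.8571, 15.8571, 15.5714, 12.2857, 13.4286,
       14.8571, 13.0000, 11.8571, 12.8571, 12.2857, 13.4286, 11.8571, 12.4286,
       13.5714, 12.4286, 10.8571, 7.8571, 8.8571, 6.5714, 6.4286, 4.4286]
health = [2051, 1483, 1241, 1233, 1225, 1149, 1105, 1093, 1050, 1041, 982, 949,
          939, 915, 879, 841, 792, 758, 733, 561, 521, 386, 337, 148]

n = len(gdp)
x_bar = sum(gdp) / n
y_bar = sum(health) / n
sxx = sum((x - x_bar) ** 2 for x in gdp)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gdp, health))
b1 = sxy / sxx              # estimated slope
b0 = y_bar - b1 * x_bar     # estimated intercept

# SSE: sum of squared residuals around the fitted line.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(gdp, health))

# Two regression coefficients were estimated, so df = n - 2.
df = n - 2
sigma2_hat = sse / df
sigma_hat = math.sqrt(sigma2_hat)

print(f"SSE = {sse:.1f}, df = {df}, variance estimate = {sigma2_hat:.1f}, "
      f"sigma-hat = {sigma_hat:.2f}")
```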

Inferences for the Slope

So far, we've been describing the relationship between two continuous variables.

Now we want to perform a hypothesis test to determine whether there is a linear relationship between the two variables.

• depends on assumptions of linear regression

Question: Does the value of Y depend linearly on X?

$$E[Y_i] = \beta_0 + \beta_1 X_i$$

Answer: Yes, unless $\beta_1 = 0$, in which case

$$E[Y_i] = \beta_0$$

Hypotheses for test for linear relationship between Y and X

$$H_0: \beta_1 = 0$$

$$H_A: \beta_1 \neq 0$$

• Test statistic

$$t = \frac{\hat\beta_1}{\hat\sigma_{\beta_1}}, \quad \text{where } \hat\sigma_{\beta_1} = \frac{\hat\sigma}{\sqrt{\sum_i (X_i - \bar X)^2}}$$

– standard form of test statistic: estimate divided by its standard error
– standard error of $\hat\beta_1$ depends on
∗ variability of Ys
∗ how closely clustered the Xs are
– under $H_0$, follows a t distribution with $n - 2$ degrees of freedom
∗ because we have to estimate 2 parameters ($\beta_0$ and $\beta_1$) to compute $\hat\sigma$
– p-value: the probability of obtaining a t statistic as extreme as, or more extreme than, what we got, if $H_0$ is true
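The test statistic can be sketched for the OECD data as follows. This is an illustration under stated assumptions, not the lecture's own computation: the data are transcribed from the country table, the variable names are invented here, and instead of computing a p-value the sketch compares |t| with 2.074, the standard tabulated 0.975 quantile of the t distribution with 22 degrees of freedom (a two-sided test at the 5% level).

```python
import math

# Data transcribed from the table of 24 OECD countries (1989), in table order.
gdp = [18.1429, 17.2857, 15.5714, 13.8571, 15.8571, 15.5714, 12.2857, 13.4286,
       14.8571, 13.0000, 11.8571, 12.8571, 12.2857, 13.4286, 11.8571, 12.4286,
       13.5714, 12.4286, 10.8571, 7.8571, 8.8571, 6.5714, 6.4286, 4.4286]
health = [2051, 1483, 1241, 1233, 1225, 1149, 1105, 1093, 1050, 1041, 982, 949,
          939, 915, 879, 841, 792, 758, 733, 561, 521, 386, 337, 148]

n = len(gdp)
x_bar = sum(gdp) / n
y_bar = sum(health) / n
sxx = sum((x - x_bar) ** 2 for x in gdp)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gdp, health))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar

# Estimate of the common standard deviation, with n - 2 degrees of freedom.
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(gdp, health))
sigma_hat = math.sqrt(sse / (n - 2))

# Standard error of the slope: small when the Ys vary little around the
# line and when the Xs are widely spread (large sxx).
se_b1 = sigma_hat / math.sqrt(sxx)

# Test statistic for H0: beta_1 = 0 (estimate divided by its standard error).
t_stat = b1 / se_b1

# Tabulated 0.975 quantile of t with 22 df (assumes n = 24, alpha = 0.05).
T_CRIT_22 = 2.074
reject_h0 = abs(t_stat) > T_CRIT_22

print(f"slope = {b1:.2f}, SE = {se_b1:.3f}, t = {t_stat:.2f}, reject H0: {reject_h0}")
```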

• Confidence interval for the slope:

– A $100(1 - \alpha)\%$ confidence interval for the true slope $\beta_1$ is given by:

$$\hat\beta_1 \pm \left(t_{1-(\alpha/2),\, df=n-2}\right)\left(\hat\sigma_{\beta_1}\right)$$

– If this C.I. includes the value 0, we cannot reject the null hypothesis at significance level $\alpha$.
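A minimal sketch of the interval for the OECD data, assuming a 95% level and the tabulated critical value 2.074 for 22 degrees of freedom; the data are transcribed from the country table and the function name `slope_confidence_interval` is hypothetical.

```python
import math

# Data transcribed from the table of 24 OECD countries (1989), in table order.
gdp = [18.1429, 17.2857, 15.5714, 13.8571, 15.8571, 15.5714, 12.2857, 13.4286,
       14.8571, 13.0000, 11.8571, 12.8571, 12.2857, 13.4286, 11.8571, 12.4286,
       13.5714, 12.4286, 10.8571, 7.8571, 8.8571, 6.5714, 6.4286, 4.4286]
health = [2051, 1483, 1241, 1233, 1225, 1149, 1105, 1093, 1050, 1041, 982, 949,
          939, 915, 879, 841, 792, 758, 733, 561, 521, 386, 337, 148]


def slope_confidence_interval(x, y, t_crit):
    """Return (lower, upper) for the slope, given the t critical value
    for the desired level and n - 2 degrees of freedom."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    sxx = sum((xi - x_bar) ** 2 for xi in x)
    sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = y_bar - b1 * x_bar
    sse = sum((yi - (b0 + b1 * xi)) ** 2 for xi, yi in zip(x, y))
    se_b1 = math.sqrt(sse / (n - 2)) / math.sqrt(sxx)
    margin = t_crit * se_b1
    return b1 - margin, b1 + margin


# 2.074 is the tabulated 0.975 quantile of t with 22 df (alpha = 0.05, n = 24).
lower, upper = slope_confidence_interval(gdp, health, t_crit=2.074)
contains_zero = lower <= 0.0 <= upper

print(f"95% CI for slope: ({lower:.1f}, {upper:.1f}); contains 0: {contains_zero}")
```

Checking whether the interval contains 0 gives the same decision as the two-sided t test at the same significance level.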

Interpreting the test for zero slope

• Failure to reject $H_0: \beta_1 = 0$ could mean:
– Type II error
– X and Y related in a nonlinear way
– X provides little help in predicting Y

• Rejecting $H_0: \beta_1 = 0$ could mean:
– X provides significant information for predicting Y
– Although the data fit a linear model, some nonlinear model may do even better

• Important caveat regarding inferences on $\beta_1$: the best straight line may be terrible!

Inferences concerning the regression line

• Estimating the mean of the Y's for a particular value of X, say $X_0$
– Example: what is the average per capita health spending for a country with per capita gross domestic product 10 PPP?

$$\hat{E}[Y \mid X_0] = \hat{Y}_{X_0} = \hat\beta_0 + \hat\beta_1 X_0$$

• estimated standard error of the estimate of $E[Y \mid X_0]$

$$\hat\sigma_{\hat{Y}_{X_0}} = \hat\sigma \sqrt{\frac{1}{n} + \frac{(X_0 - \bar X)^2}{\sum_i (X_i - \bar X)^2}}$$
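The example question (per capita GDP of 10) can be worked through with a short sketch. It assumes the standard formula for the standard error of the estimated mean response and uses the data transcribed from the country table; `mean_response` is a hypothetical helper name. Evaluating it at an observed X, such as the United States value 18.1429, gives numbers that can be compared with the "Predict Value" and "Std Err Predict" columns of the output table.

```python
import math

# Data transcribed from the table of 24 OECD countries (1989), in table order.
gdp = [18.1429, 17.2857, 15.5714, 13.8571, 15.8571, 15.5714, 12.2857, 13.4286,
       14.8571, 13.0000, 11.8571, 12.8571, 12.2857, 13.4286, 11.8571, 12.4286,
       13.5714, 12.4286, 10.8571, 7.8571, 8.8571, 6.5714, 6.4286, 4.4286]
health = [2051, 1483, 1241, 1233, 1225, 1149, 1105, 1093, 1050, 1041, 982, 949,
          939, 915, 879, 841, 792, 758, 733, 561, 521, 386, 337, 148]

n = len(gdp)
x_bar = sum(gdp) / n
y_bar = sum(health) / n
sxx = sum((x - x_bar) ** 2 for x in gdp)
sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(gdp, health))
b1 = sxy / sxx
b0 = y_bar - b1 * x_bar
sse = sum((y - (b0 + b1 * x)) ** 2 for x, y in zip(gdp, health))
sigma_hat = math.sqrt(sse / (n - 2))


def mean_response(x0):
    """Return (estimated mean of Y at x0, its estimated standard error)."""
    estimate = b0 + b1 * x0
    # The standard error grows as x0 moves away from the mean of the Xs.
    std_err = sigma_hat * math.sqrt(1.0 / n + (x0 - x_bar) ** 2 / sxx)
    return estimate, std_err


est_10, se_10 = mean_response(10.0)
print(f"X0 = 10: estimated mean = {est_10:.1f}, standard error = {se_10:.2f}")

est_us, se_us = mean_response(18.1429)
print(f"X0 = 18.1429: estimated mean = {est_us:.1f}, standard error = {se_us:.3f}")
```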
