Linear Regression Analysis: Blood Pressure and Temperature Correlation - Prof. Mary Hartz, Study notes of Data Analysis & Statistical Methods

Lecture notes from a university course, STA100, on regression and correlation analysis. They cover linear regression, straight line regression, and the use of regression to predict men's body temperature from their systolic blood pressure. They include example data from observational studies and designed experiments, as well as formulas for finding the best slope and intercept for a given set of data. The lecture also discusses errors and the least squares approach for minimizing them.

Uploaded on 08/09/2009

STA100 Lecture 22
Regression and Correlation
Text sections 10.1, 10.2, 10.3
Here’s an example to get us started. It is adapted from a classic textbook (Moore & McCabe, 1993)
and contains data about the earliest fossil bird, Archaeopteryx lithographica (Houck, Gauthier, &
Strauss, 1990).
Archaeopteryx is an extinct beast having flight feathers like a bird but teeth and a long bony tail like a
reptile. Only six fossil specimens are known. Because these specimens differ greatly in size, they
have sometimes been classified as different species, rather than as individuals from the same
species. Correlation can help decide the question. If the specimens belong to the same species and
differ in size because they are at different stages of growth, there should be a strong straight line
relationship between the lengths of a pair of bones from all individuals. Outliers from this
relationship would suggest a different species. Below are the lengths in millimeters of the femur (the
thigh bone) and the humerus (the upper arm bone, running from shoulder to elbow) for the five
specimens that preserve both bones.
It makes sense that if I know how long your humerus is I might be able to predict your femur length.
Similarly, if I know how large your fist is I can decently predict how large your heart is (most sources
I’ve come across say that it is the same size as your fist, but some say it is the size of a fist for a kid,
twice the fist size for an adult). The point is that there is a pretty regular relationship between fist
size and heart size. Likewise, if I know how well you did on the SAT college entrance exam, I can
probably predict with reasonable accuracy how well you would have done on the ACT college entrance
exam.
Let’s get back to our birds. Here’s the data. Lengths are in millimeters.

Femur     38   56   59   64   74
Humerus   41   63   70   72   84

My first instinct is to plot the data to get a quick visual feel. I’ve superimposed a straight line.

Recall that we plot (𝑥, 𝑦), or (left/right, up/down). As you can see, the ordered pairs (femur, humerus), when plotted point by point, fall pretty nearly along a straight line. When this happens our data can be handled by what is called “linear regression”, in particular “straight line regression”.
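One way to put a number on "pretty nearly along a straight line" is the correlation coefficient. The sketch below is not part of the original lecture, which has not yet defined correlation at this point; it assumes the usual Pearson coefficient r and uses only the five (femur, humerus) pairs listed above. The function name `correlation` is made up for illustration.

```python
import math

# Bone lengths in millimeters for the five specimens, as listed in the notes.
femur = [38, 56, 59, 64, 74]
humerus = [41, 63, 70, 72, 84]


def correlation(xs, ys):
    """Pearson correlation coefficient r of two equal-length lists."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    ss_yy = sum((y - y_bar) ** 2 for y in ys)
    return ss_xy / math.sqrt(ss_xx * ss_yy)


r = correlation(femur, humerus)
print(f"r = {r:.3f}")  # a value near +1 means the points hug a rising line
```

A value of r close to 1 is what we would expect if all five specimens follow the same straight line relationship.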

We have three tasks in this chapter.

  1. Given data in the form of ordered pairs, can we learn how to fit a straight line to the data?
  2. Can we decide whether this is a smart thing to do?
  3. Can we perform tests about our regressions?

There are two contexts in which regression often occurs: designed experiments and observational studies. Recall that Mary sent us some observational data on blood pressure and body temperature.

Men                                             Women
 #   systolic   diastolic   temperature          #   systolic   diastolic   temperature
 1     109         71          36.6              1     138         70          36.
 2     115         77          36.7              2     145         81          36.
 3     113         53          36.2              3     121         83          36.
 4     124         74          36.7              4     134         85          37.
 5     123         74          36.3              5     144         78          36.
 6     130         78          36.7              6     109         72          36
 7     113         66          36.5              7     146         84          36.
 8     111         73          36.5              8     108         62          37.
 9     160         97          36.9              9     120         68          37.
10     139         99          37.1             10     123         75          36.

[Figure: Archaeopteryx data, humerus (vertical axis) plotted against femur (horizontal axis), with the superimposed line y = 1.2*x - 3.]

A couple of comments are in order at this point. You’ll notice that all of the “action” on the graph of the men’s BP data (diastolic against systolic) occurs in the upper right. I would usually restrict the area of the graph to the area where the data are located. Instead I’m showing explicitly where the 𝑦-intercept is. Make sure that you see that when the 𝑥 value is zero, the line intersects the 𝑦 axis down at -9.9. Also, think this through to observe that these values are not physical. Even if you can conceive of a zero systolic BP for a corpse, you should realize that -9.9 on the diastolic is just not reasonable. The 𝑦-intercept helps us to draw the graph; it is not necessarily within the realm of what is possible.
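To see where the -9.9 comes from, here is a minimal sketch in Python (the lecture itself works in Excel). It fits a line to the men's diastolic readings as a function of systolic, using the least squares formulas that these notes introduce later. The helper name `fit_line` and the intercept formula (mean of y minus slope times mean of x) are assumptions added here, not taken from the notes.

```python
# Men's blood pressure readings from the table in these notes.
systolic = [109, 115, 113, 124, 123, 130, 113, 111, 160, 139]
diastolic = [71, 77, 53, 74, 74, 78, 66, 73, 97, 99]


def fit_line(xs, ys):
    """Return (slope, intercept) of the least squares line y = slope*x + intercept."""
    n = len(xs)
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    ss_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    ss_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = ss_xy / ss_xx
    intercept = y_bar - slope * x_bar  # the fitted line passes through (x_bar, y_bar)
    return slope, intercept


slope, intercept = fit_line(systolic, diastolic)
print(f"diastolic = {slope:.3f} * systolic {intercept:+.1f}")

# Extrapolating far outside the observed systolic range (109 to 160)
# gives values with no physical meaning, such as the prediction at x = 0.
print(f"predicted diastolic at systolic = 0: {intercept:.1f}")
```

The intercept is simply the line's prediction at x = 0, which lies far below the smallest systolic reading in the data. That is why it need not be a plausible blood pressure.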

Take a moment and plot the BP data for the women on the axes below. Try to draw in a straight line “by hand” that captures the data as best you can. Note that no straight line can pass through all of the data points since they do not line up nicely. From your line, estimate your slope and your intercept.

[Figure: blank axes titled “Women's BP Data”, labeled systolic BP and diastolic BP]

As a different type of example, consider the following data from a designed experiment.

“Graphitized carbon was impregnated with potassium carbonate, then heated in a stream of 15% carbon dioxide in nitrogen. In part of the experiment, the total amount of carbon monoxide desorbed was found. The data give the amount 𝑦 of CO (carbon monoxide) desorbed”

Ratio: The initial potassium:carbon ratio (in percent)
Desorbed: The amount of carbon monoxide desorbed (in percent (moles per mole of C))

The data are found at the website (Datasets for Statistical Analysis: Regression) and are from a textbook (Atkinson & Donev, 1992). The first row is labeled “Ratio” and the second row is labeled “Desorbed”.

Ratio      0.05  0.05  0.25  0.25  0.5   0.5   0.5   1.25  1.25  1.25  1.25  1.25  2.1   2.1   2.1   2.1   2.1   2.1   2.5   2.5   2.5   2.
Desorbed   0.05  0.1   0.25  0.35  0.75  0.85  0.95  1.42  1.75  1.82  1.95  2.45  3.05  3.19  3.25  3.43  3.5   3.93  3.75  3.93  3.99  4.

These data are plotted below. What is the first qualitative difference you observe? You should see that the 𝑥 values (the ratios) are all fixed whereas our previous examples let them be random. We “took them as we found them”. In this case we are setting the 𝑥 values and seeing what happens to the 𝑦 values. Our goal is to create a line by “hooking up” all the average 𝑦 values associated with a given 𝑥 value.
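The idea of an average 𝑦 value at each fixed 𝑥 can be made concrete with a short Python sketch (not part of the original lecture). It groups the desorbed values by ratio and averages each group. The last (ratio, desorbed) pair is cut off in this copy of the notes, so it is left out here rather than guessed. The variable names are made up for illustration.

```python
from collections import defaultdict

# (ratio, desorbed) pairs as listed in the notes. The final pair is cut off
# in this copy of the notes, so it is left out here rather than guessed.
ratio = [0.05, 0.05, 0.25, 0.25, 0.5, 0.5, 0.5,
         1.25, 1.25, 1.25, 1.25, 1.25,
         2.1, 2.1, 2.1, 2.1, 2.1, 2.1,
         2.5, 2.5, 2.5]
desorbed = [0.05, 0.1, 0.25, 0.35, 0.75, 0.85, 0.95,
            1.42, 1.75, 1.82, 1.95, 2.45,
            3.05, 3.19, 3.25, 3.43, 3.5, 3.93,
            3.75, 3.93, 3.99]

# Collect every measured y value under the x value that was set for it.
groups = defaultdict(list)
for x, y in zip(ratio, desorbed):
    groups[x].append(y)

# Average the y values within each group.
group_means = {x: sum(ys) / len(ys) for x, ys in groups.items()}
for x in sorted(group_means):
    print(f"ratio {x}: n = {len(groups[x])}, mean desorbed = {group_means[x]:.3f}")
```

If a straight line is a sensible model, these group means should rise roughly in step with the ratio.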

[Figure: desorbed (vertical axis) plotted against ratio (horizontal axis), with fitted line y = 1.6*x - 0.]

Notice the hat!!! We use hats for predicted values. The ordered pairs $(x, \hat{y})$ will be on the fitted regression line, while the ordered pairs $(x, y)$ are the values you have measured.

Since, in general, your data values will not lie on the regression line, you should think about the errors, or vertical distances from the data point you measured to the corresponding point on the line. We write our errors as

$$\text{error} = y - \hat{y}$$

Our text calls this the “unexplained deviation”.

A common way to find the best slope and intercept for a given set of data is to make our errors as small as possible. To avoid cancellation we make all the terms positive (just as we did for the standard deviation) and make the sum of the squared errors as small as possible.

$$\text{Minimize } \sum \text{errors}^2 = \sum (y - \hat{y})^2$$
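To make the criterion concrete, the Python sketch below (not part of the original lecture) computes the sum of squared errors for two candidate lines through the women's BP data. One is the least squares line, computed with the standard slope formula that the notes state next. The other is a hypothetical hand-drawn guess with slope 0.5 and intercept 10, chosen here only for comparison. The function name and the intercept formula are likewise assumptions.

```python
# Women's blood pressure readings from the table in these notes.
systolic = [138, 145, 121, 134, 144, 109, 146, 108, 120, 123]
diastolic = [70, 81, 83, 85, 78, 72, 84, 62, 68, 75]


def sum_squared_errors(slope, intercept, xs, ys):
    """Sum of (y - y_hat)^2, where y_hat = slope*x + intercept."""
    return sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))


# Least squares slope and intercept from the deviation-based formulas.
n = len(systolic)
x_bar = sum(systolic) / n
y_bar = sum(diastolic) / n
best_slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(systolic, diastolic))
              / sum((x - x_bar) ** 2 for x in systolic))
best_intercept = y_bar - best_slope * x_bar

# A hypothetical hand-drawn line, chosen only for comparison.
guess_slope, guess_intercept = 0.5, 10.0

sse_best = sum_squared_errors(best_slope, best_intercept, systolic, diastolic)
sse_guess = sum_squared_errors(guess_slope, guess_intercept, systolic, diastolic)
print(f"least squares line: SSE = {sse_best:.1f}")
print(f"hand-drawn guess:   SSE = {sse_guess:.1f}")
```

Try replacing the guess with the slope and intercept from your own hand-drawn line. No choice should give a smaller sum than the least squares line.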

You usually learn the mathematics to find the best slope and intercept in Calculus III. For now I’ll just state that you wind up with

$$\text{best slope} = b = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})(x_i - \bar{x})}$$

Many people call the term in the numerator

$$SS_{xy} = \sum (x_i - \bar{x})(y_i - \bar{y})$$

We have already seen the term in the denominator (when we calculated the sample standard deviation) and we call it

$$SS_{xx} = \sum (x_i - \bar{x})(x_i - \bar{x})$$
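For readers who want to see where the slope formula comes from, here is a sketch of the usual calculus argument. It is not part of the original notes. It assumes the fitted line is written $\hat{y}_i = a + b x_i$; the symbol $a$ for the intercept is introduced here only for illustration, since the notes name just the slope $b$.

$$
\begin{aligned}
Q(a, b) &= \sum (y_i - \hat{y}_i)^2 = \sum (y_i - a - b x_i)^2 \\
\frac{\partial Q}{\partial a} &= -2 \sum (y_i - a - b x_i) = 0 \quad\Longrightarrow\quad a = \bar{y} - b \bar{x} \\
\frac{\partial Q}{\partial b} &= -2 \sum x_i (y_i - a - b x_i) = 0
\end{aligned}
$$

Substituting $a = \bar{y} - b \bar{x}$ into the last equation gives $\sum x_i \left[ (y_i - \bar{y}) - b (x_i - \bar{x}) \right] = 0$. Because deviations from a mean sum to zero, replacing the leading $x_i$ by $(x_i - \bar{x})$ does not change the sum, so

$$
\sum (x_i - \bar{x})(y_i - \bar{y}) = b \sum (x_i - \bar{x})(x_i - \bar{x})
\quad\Longrightarrow\quad
b = \frac{SS_{xy}}{SS_{xx}}
$$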

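There are shortcut formulas here. We call the formulas above the defining formulas and call the ones on the following lines the computational formulas.

$$SS_{xy} = \sum x_i y_i - \frac{1}{n} \left( \sum x_i \right) \left( \sum y_i \right)$$

And

$$SS_{xx} = \sum x_i^2 - \frac{1}{n} \left( \sum x_i \right) \left( \sum x_i \right)$$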
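As a quick check that the two kinds of formula agree, the Python sketch below (not part of the original lecture) evaluates both on the women's BP data. It assumes the usual shortcut for the cross term, namely the sum of the products minus the product of the sums divided by n. The variable names are made up for illustration.

```python
# Women's blood pressure readings from the table in these notes.
systolic = [138, 145, 121, 134, 144, 109, 146, 108, 120, 123]
diastolic = [70, 81, 83, 85, 78, 72, 84, 62, 68, 75]
n = len(systolic)

# Defining formulas: sums of products of deviations from the means.
x_bar = sum(systolic) / n
y_bar = sum(diastolic) / n
ss_xy_def = sum((x - x_bar) * (y - y_bar) for x, y in zip(systolic, diastolic))
ss_xx_def = sum((x - x_bar) ** 2 for x in systolic)

# Computational formulas: only the four column sums are needed.
sum_x = sum(systolic)
sum_y = sum(diastolic)
sum_xy = sum(x * y for x, y in zip(systolic, diastolic))
sum_xx = sum(x * x for x in systolic)
ss_xy_comp = sum_xy - sum_x * sum_y / n
ss_xx_comp = sum_xx - sum_x * sum_x / n

print(f"SSxy: defining = {ss_xy_def:.1f}, computational = {ss_xy_comp:.1f}")
print(f"SSxx: defining = {ss_xx_def:.1f}, computational = {ss_xx_comp:.1f}")
```

The computational versions are the ones suited to a spreadsheet, since each needs only column totals rather than a column of deviations.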

Excel can help us out here. We need to add the 𝑥 terms, we need to add the 𝑦 terms, we need to add the 𝑥^2 terms, and we need to add the 𝑥 𝑦 terms.

        x, systolic   y, diastolic        xx        xy
            138            70          19044      9660
            145            81          21025     11745
            121            83          14641     10043
            134            85          17956     11390
            144            78          20736     11232
            109            72          11881      7848
            146            84          21316     12264
            108            62          11664      6696
            120            68          14400      8160
            123            75          15129      9225
sums       1288           758         167792     98263

This gives us

$$\sum x_i = 1288 \qquad \sum y_i = 758 \qquad \sum x_i y_i = 98263 \qquad \sum x_i x_i = 167792$$
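Carrying the arithmetic one step further, here is a worked sketch for the $n = 10$ women. It uses the computational formulas with the four sums above. The intercept formula $a = \bar{y} - b\bar{x}$ and the symbol $a$ are assumptions added here, since the notes state only the slope formula.

$$
\begin{aligned}
SS_{xy} &= 98263 - \frac{(1288)(758)}{10} = 98263 - 97630.4 = 632.6 \\
SS_{xx} &= 167792 - \frac{(1288)(1288)}{10} = 167792 - 165894.4 = 1897.6 \\
b &= \frac{SS_{xy}}{SS_{xx}} = \frac{632.6}{1897.6} \approx 0.3334 \\
a &= \bar{y} - b \bar{x} = 75.8 - (0.3334)(128.8) \approx 32.86
\end{aligned}
$$

The slope agrees with the 0.3334 shown on the fitted line in the notes' plot of the women's blood pressure data.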

And your first regression presentation topic, should you choose to take it, is to find the best slope and intercept in a model which seeks to predict Men’s Body Temperature from Men’s Systolic Blood Pressure. That is, $\text{temp} = \text{slope} \times \text{systolic} + \text{intercept}$. Please show us your spreadsheet results.

Bibliography

Atkinson, A. C., & Donev, A. (1992). Optimum Experimental Designs. Oxford: Clarendon Press.

Datasets for Statistical Analysis: Regression. (n.d.). Retrieved from http://www.sci.usq.edu.au/staff/dunn/Datasets/tech-regression.html

Houck, M. A., Gauthier, J. A., & Strauss, R. E. (1990). Allometric Scaling in the Earliest Fossil Bird, Archaeopteryx lithographica. Science, 247(4939), 195-198.

Moore, D. S., & McCabe, G. P. (1993). Introduction to the Practice of Statistics (2nd ed.). New York, NY: W.H. Freeman and Company.

[Figure: Women's Blood Pressure Data, with fitted line y = 0.3334*x + 32.]