SAS Practice Lab 2 - Fitting Multiple Linear Regression Model | STAT 231B, Lab Reports of Statistics

Material Type: Lab; Professor: Cui; Class: STATISTCS FOR BIOLOGICL SCIENCES; Subject: Statistics; University: University of California-Riverside; Term: Spring 2006;


Uploaded on 03/28/2010

Statistics 231B SAS Practice Lab #2
Spring 2006
This lab gives students practice in fitting a multiple linear regression model and
testing for a regression relation; obtaining a scatter plot matrix, a correlation matrix,
and box plots for diagnostic purposes; and calculating the coefficient of multiple
determination R^2 and the coefficient of simple determination.
Example: In a small-scale experimental study of the relation between degree of
brand liking (Y) and moisture content (X1) and sweetness (X2) of the product, the
data were obtained from the experiment based on a completely randomized
design, see CH06PR05.txt.
To study this relationship, we can first set up
A Multiple Regression Model: First Order Model with Two Predictor Variables
$$Y_i = \beta_0 + \beta_1 X_{i1} + \beta_2 X_{i2} + \varepsilon_i$$
with response function
$$E\{Y\} = \beta_0 + \beta_1 X_1 + \beta_2 X_2$$
Meaning of Regression Coefficients
$\beta_0$ is the intercept; it is the mean response $E\{Y\}$ only if $X_1 = 0, X_2 = 0$ is in the range of the model.
$\beta_1$ is the change in $E\{Y\}$ when $X_1$ increases by 1 and $X_2$ is held constant.
$\beta_2$ is the change in $E\{Y\}$ when $X_2$ increases by 1 and $X_1$ is held constant.
$\beta_1, \beta_2$ are called partial regression coefficients.
In this example, when X1 changes, the change in the mean response is the same no matter
what level X2 is held at, and vice versa. Such a model is called an additive effects model:
the predictors do not interact in their effects on Y.
Is it reasonable to assume this first order regression model? We can use the
scatter plot matrix and the correlation matrix to get a sense of the nature and
strength of the bivariate relationship between each predictor variable and the
response variable, and to identify gaps in the data as well as outlying data
points. A correlation matrix contains the coefficient of simple correlation between Y and
each of the predictor variables, as well as all of the coefficients of simple correlation
among the predictor variables. It is a complement to the scatter plot matrix.
(1) Obtain the scatter plot matrix and the correlation matrix.
SAS CODE:

    data Brandpreference;
       infile 'Z:\ch06pr05.txt';
       input y x1 x2;
    proc insight data=Brandpreference;          /* draw scatter plot matrix */
       scatter y x1 x2 * y x1 x2;
    run;
    proc corr data=Brandpreference outp=out1;   /* calculate correlation matrix */
       var y x1 x2;
    run;

SAS OUTPUT:
Scatter plot matrix:
[Scatter plot matrix of y, x1, and x2. Axis ranges shown: y 61 to 100, x1 4 to 10, x2 2 to 4.]
The correlation matrix:
The CORR Procedure
3 Variables: y x1 x2
Simple Statistics

Coefficient of Multiple Determination
$$R^2 = \frac{SSR}{SS_{TOT}} = 1 - \frac{SSE}{SS_{TOT}}$$
the proportionate reduction of total variation in Y associated with the use of the
variables $X_1, \ldots, X_{p-1}$.
The closer R^2 is to 1, the greater is said to be the degree of linear association
between Y and $X_1, \ldots, X_{p-1}$.
Coefficient of Simple Determination between Y and $\hat{Y}$ is the square of the
correlation coefficient between Y and $\hat{Y}$.
The closer the coefficient of simple determination is to 1, the greater is said to be the
degree of linear association between Y and $\hat{Y}$.
SAS CODE:
    data Brandpreference;
       infile 'Z:\ch06pr05.txt';
       input y x1 x2;
    proc reg data=Brandpreference;
       model y = x1 x2 / r p;
       output out=results r=residual p=yhat;
    run;
    proc corr data=results;   /* calculate correlation between y and yhat */
       var y yhat;
    run;
    quit;

SAS OUTPUT:
The REG Procedure
Model: MODEL
Dependent Variable: y
Number of Observations Read    16
Number of Observations Used    16

                         Analysis of Variance
                               Sum of          Mean
Source             DF         Squares        Square    F Value    Pr > F
Model               2      1872.70000     936.35000     129.08    <.
Error              13        94.30000       7.
Corrected Total    15      1967.

Root MSE           2.69330    R-Square    0.
Dependent Mean    81.75000    Adj R-Sq    0.
Coeff Var          3.

                      Parameter Estimates
                   Parameter     Standard
Variable     DF     Estimate        Error    t Value    Pr > |t|
Intercept     1     37.65000      2.99610      12.57    <.
x1            1      4.42500      0.30112      14.70    <.
x2            1      4.37500      0.67332       6.50    <.

The CORR Procedure
2 Variables: Y yhat

                          Simple Statistics
Variable     N        Mean     Std Dev     Sum     Minimum    Maximum
Y           16    81.75000    11.45135    1308    61.00000    100.
yhat        16    81.75000    11.17348    1308    64.10000    99.

Simple Statistics
Variable    Label
Y
yhat        Predicted Value of Y

Pearson Correlation Coefficients, N = 16
Prob > |r| under H0: Rho=
                              Y       yhat
Y                       1.00000       0.
                                      <.
yhat                    0.97574       1.
Predicted Value of Y    <.

(a) Please write down the estimated regression function. How is b1 interpreted here?
(b) Test whether there is a regression relation, using $\alpha = 0.01$. Write down the null
hypothesis, alternative hypothesis, and decision rule. What does your test imply about
$\beta_1$ and $\beta_2$?
(c) What is the P-value of the test in part (b)?
(d) How is the coefficient of multiple determination R^2 interpreted here? Does it equal
the coefficient of simple determination between Y and $\hat{Y}$?
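As a consistency check on the output above, the printed quantities can be tied together with a little arithmetic. This is only a sketch: it uses values that appear in the listing and assumes $SS_{TOT} = SSR + SSE$, as in the Analysis of Variance table.

$$SS_{TOT} = SSR + SSE = 1872.7 + 94.3 = 1967.0$$

$$R^2 = \frac{SSR}{SS_{TOT}} = \frac{1872.7}{1967.0} \approx 0.9521, \qquad r_{Y\hat{Y}}^2 = (0.97574)^2 \approx 0.9521$$

$$F^* = \frac{MSR}{MSE} = \frac{936.35}{94.3/13} \approx 129.08$$

To the precision shown, the ratio of sums of squares and the squared Pearson correlation give the same number, and $F^*$ reproduces the F Value in the Analysis of Variance table.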

[Plot of residual against predicted value of y. Vertical axis: Residual, -6 to 6. Horizontal axis: Predicted Value of y, 60 to 100.]

[Plot of residual*x1. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis: Residual, -6 to 6. Horizontal axis: x1, 4 to 10.]

[Basic scatter plot. Plot of residual*x1x2. Legend: A = 1 obs, B = 2 obs, etc. Vertical axis: Residual, -6 to 6. Horizontal axis: x1x2, 8 to 40.]

The UNIVARIATE Procedure
Variable: residual (Residual)

                            Moments
N                         16    Sum Weights         16
Mean                       0    Sum Observations     0
Std Deviation     2.50732261    Variance            6.
Skewness          0.05459543    Kurtosis            -0.
Uncorrected SS          94.3    Corrected SS        94.
Coeff Variation            .    Std Error Mean      0.

              Basic Statistical Measures
    Location                 Variability
Mean      0.000000    Std Deviation          2.
Median    0.025000    Variance               6.
Mode      .           Range                  8.
                      Interquartile Range    3.

            Tests for Location: Mu0=
Test           -Statistic-    -----p Value------
Student's t    t    0         Pr > |t|     1.
Sign           M    0         Pr >= |M|    1.
Signed Rank    S    0         Pr >= |S|    1.

                  Tests for Normality
Test                  --Statistic---    -----p Value------
Shapiro-Wilk          W     0.975851    Pr < W       0.
Kolmogorov-Smirnov    D     0.106775    Pr > D       >0.
Cramer-von Mises      W-Sq  0.022652    Pr > W-Sq    >0.
Anderson-Darling      A-Sq  0.161747    Pr > A-Sq    >0.

Quantiles (Definition 5)
Quantile      Estimate
100% Max      4.
99%           4.
95%           4.
90%           3.
75% Q3        1.
50% Median    0.
25% Q1        -1.
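The standard deviation of the residuals reported by PROC UNIVARIATE is smaller than the Root MSE in the regression output because the two use different divisors. A short check, assuming SSE = 94.3 (the Uncorrected SS of the residuals), n = 16 observations, and p = 3 estimated regression parameters:

$$s_{\text{residual}} = \sqrt{\frac{SSE}{n-1}} = \sqrt{\frac{94.3}{15}} \approx 2.5073, \qquad \sqrt{MSE} = \sqrt{\frac{SSE}{n-p}} = \sqrt{\frac{94.3}{13}} \approx 2.6933$$

The first value matches the Std Deviation in the Moments table and the second matches Root MSE, so the residual data set is consistent with the fitted model.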