Regression Analysis: y and x Relationship, Response Variable Transformation in Hawaii, Assignments of Statistics

This document presents the output of a regression analysis for the relationship between two variables y and x. The analysis includes the plot of y versus x, estimation of the regression model, computation of residuals, and assessment of assumptions such as normality and constant error variance. The document also suggests a square root transformation for the response variable y to correct the violation of constant error variance.

Typology: Assignments

Pre 2010

Uploaded on 10/12/2009

koofers-user-20c
STAT 4230/6230 Homework #6
April 11, 2008

1. Problem 8.2 (10 Points)

We first plot the values of y versus x to see the relationship between the two variables:

[Figure: Plot of y versus x, Problem 8.2, with the fitted line y = -3.1792 + 2.491x]

We see that a linear relationship appears to be appropriate, so let us try fitting a regression model to the data.

a. From the output below, we see that the estimated model is

$\hat{y} = -3.17919 + 2.49098x$.

The REG Procedure
Model: MODEL1
Dependent Variable: y

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               1        3303.54328     3303.54328     191.43    <.0001
Error               8         138.05672       17.25709
Corrected Total     9        3441.60000

Root MSE           4.15417    R-Square    0.9599
Dependent Mean    30.20000    Adj R-Sq    0.9549
Coeff Var         13.75551

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              -3.17919           2.74698      -1.16      0.2805
x             1               2.49098           0.18004      13.84      <.0001

b. We now compute the residuals for the data. The residuals are computed as the difference between the actual value of y and the predicted value of y, known as $\hat{y}$. For instance, we see that for observation 1, $\hat{\epsilon} = y - \hat{y} = 5.0000 - 1.8028 = 3.1972$.

       Dependent    Predicted
Obs     Variable        Value    Residual
  1       5.0000       1.8028      3.1972
  2      10.0000       6.7847      3.2153
  3      12.0000      14.2577     -2.2577
  4      22.0000      21.7307      0.2693
  5      25.0000      26.7126     -1.7126
  6      27.0000      34.1856     -7.1856
  7      39.0000      41.6585     -2.6585
  8      50.0000      46.6405      3.3595
  9      47.0000      49.1315     -2.1315
 10      65.0000      59.0954      5.9046
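As a rough cross-check of parts a and b, the fit can be recomputed by hand. The sketch below is in Python (the homework itself used SAS). The y values come from the residual listing; the x values are not printed in the surviving text and are back-solved from the fitted values, so they are an assumption rather than the original data file.

```python
# Least-squares fit for Problem 8.2, written out by hand.
# y: response values from the residual listing.
# x: NOT printed in the surviving text; back-solved from the fitted values
#    (assumption), e.g. (1.8028 + 3.17919) / 2.49098 is about 2.
x = [2, 4, 7, 10, 12, 15, 18, 20, 21, 25]
y = [5, 10, 12, 22, 25, 27, 39, 50, 47, 65]

n = len(x)
x_bar = sum(x) / n
y_bar = sum(y) / n

# Corrected sums of squares and cross-products.
s_xx = sum((xi - x_bar) ** 2 for xi in x)
s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))

slope = s_xy / s_xx
intercept = y_bar - slope * x_bar

fitted = [intercept + slope * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

sse = sum(e ** 2 for e in residuals)        # error sum of squares
sst = sum((yi - y_bar) ** 2 for yi in y)    # corrected total sum of squares
mse = sse / (n - 2)                         # two estimated parameters
f_value = (sst - sse) / mse                 # the model has 1 degree of freedom

print(f"y-hat = {intercept:.5f} + {slope:.5f} x")
print(f"SSE = {sse:.5f}, MSE = {mse:.5f}, F = {f_value:.2f}")
print(f"residual for observation 1 = {residuals[0]:.4f}")
```

With these inputs the recomputed coefficients, error sum of squares, and F value can be compared directly against the REG output.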

c. We now plot the residuals versus x:

[Figure: Residuals versus x, Problem 8.2]

There appears to be a curvature trend to the residual plot. This implies that a quadratic term needs to be added.

[Figure: Plot of Residuals versus Pressure (x)]

c. An alternative model is

$E(y) = \beta_0 + \beta_1 x + \beta_2 x^2$.

d. The regression output for this model appears below.

The REG Procedure
Model: MODEL1
Dependent Variable: volume

Number of Observations Read    11
Number of Observations Used    11

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2         370.24771      185.12386    1694.30    <.0001
Error               8           0.87410        0.10926
Corrected Total    10         371.12181

Root MSE           0.33055    R-Square    0.9976
Dependent Mean    88.17273    Adj R-Sq    0.9971
Coeff Var          0.37489

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error    t Value    Pr > |t|
Intercept     1              99.50283           0.26829     370.88      <.0001
pressure      1              -0.34727           0.01831     -18.97      <.0001
pressure2     1               0.00131        0.00025400       5.16

The estimated quadratic model is

$\hat{y} = 99.50283 - 0.34727x + 0.00131x^2$,

where y is the volume and x is the pressure. To determine if the quadratic model is useful, we test the hypotheses

$H_0: \beta_1 = \beta_2 = 0$ versus $H_a$: at least one $\beta_i \neq 0$.

The test statistic is F = 1694.30, which is very large and corresponds to a p-value of less than 0.0001. This p-value falls below $\alpha = 0.05$, which means we reject the null hypothesis. Thus, the quadratic model is a useful predictor of compressed volume.
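The F statistic quoted above can be recovered from the sums of squares in the ANOVA table. The following Python sketch only redoes that arithmetic; it does not compute the p-value, and the variable names are chosen here for illustration.

```python
# Overall F test for the quadratic model, recomputed from the ANOVA table.
ss_model = 370.24771   # model sum of squares (from the output)
ss_error = 0.87410     # error sum of squares (from the output)
n_obs = 11             # observations used
k = 2                  # predictors: pressure and pressure squared

df_model = k
df_error = n_obs - k - 1

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_stat = ms_model / ms_error

print(f"MS(model) = {ms_model:.5f}")
print(f"MS(error) = {ms_error:.5f}")
print(f"F = {f_stat:.1f} on ({df_model}, {df_error}) degrees of freedom")
```

A value this far above 1 is what drives the rejection of the null hypothesis.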

3. Problem 8.5 (10 Points)

a. The residuals for the model are given below:

       Dependent    Predicted
Obs     Variable        Value     Residual
  1         3137         3526        -389.
  2         3590         3768      -177.71
  3         4526         4030       496.25
  4        10825        11886
  5         4023         3693
  6         7606        6537.         1069
  7         3748         3295
  8         2972         3295        -322.
  9         3163         3295        -131.
 10         4065
 11         2048         2643
 12         6500         6402
 13         5651         6402        -751.
 14         6565         6402
 15         6387         6402
 16         6454         6733
 17         6928         6537
 18         4268         4030
 19        14791        12022         2769
 20         2680         4293
 21         2974         1433    1541.5881
 22         1965
 23         2566         1936
 24         1515         1455
 25         2000         1936
 26         2735         1599         1136
 27         3698         4443    -745.1793
 28         2635         2644          -9.
 29         1206         549.
 30         3775         4309    -533.9910
 31         3120         4309
 32         4206         4309        -102.
 33         4006         4309        -302.
 34         3728         4640        -912.
 35         3211         4443
 36         1200     549.3971

b. We now plot the residuals versus $x_1$, which appears in the graph below. We do not observe any major trends in this plot.

Parameter Estimates

Variable     DF    Parameter Estimate    Standard Error
Intercept     1           -3783.43295             1205.
x1            1               0.00875
x2            1               1.92648
x3            1            3444.25464            911.72
x4            1            2093.35356              305.

Below is the plot for the partial residuals of $x_1$ against $x_1$. As stated in the book, when we regress the partial residuals for $x_1$ on $x_1$, the estimated coefficient of $x_1$ is the same as in the full model we fit. We do not see any major deviations from this line. The partial residuals measure the influence of $x_1$ on y after removing (or accounting for) the effects of $x_2$, $x_3$, and $x_4$. We do not have an indication of lack of fit for the $x_1$ variable, as seen in the plot.

[Figure: Partial Residuals Plot for x1]

e. The plot of the partial residuals for $x_2$ versus $x_2$ appears below. Here, we are removing (accounting for) the effects of $x_1$, $x_3$, and $x_4$ on y. As before, we see that regressing the partial residuals for $x_2$ on $x_2$ produces the same coefficient for $x_2$ as the full model fit above gives ($\hat{\beta}_2 = 1.92648$). Once again, we observe a generally linear pattern, with the points scattered closely about the line. The point in the uppermost right corner may warrant closer inspection, but overall, we do not see a reason to suspect lack of fit for $x_2$.

[Figure: Partial Residuals Plot for x2]
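The property used in both partial residual plots (the slope of the partial residuals on $x_j$ equals the full-model coefficient of $x_j$) can be checked numerically. The Python sketch below uses a small made-up data set with two predictors, not the homework data, and a hand-written normal-equations solver; all names in it are illustrative.

```python
# Partial residuals: regressing them on x_j reproduces the full-model slope.
# The data below are made up for illustration (not the homework data).

def solve(a, b):
    """Solve the linear system a * beta = b by Gauss-Jordan elimination."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]   # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [v - factor * p for v, p in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def ols(columns, y):
    """Least-squares coefficients (intercept first) via the normal equations."""
    rows = [[1.0] + list(vals) for vals in zip(*columns)]
    p = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(p)] for i in range(p)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(p)]
    return solve(xtx, xty)

x1 = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
x2 = [2.0, 1.0, 4.0, 3.0, 6.0, 5.0, 8.0, 7.0]
y = [4.1, 4.9, 9.2, 9.8, 14.1, 14.8, 19.3, 19.9]

b0, b1, b2 = ols([x1, x2], y)
residuals = [yi - (b0 + b1 * u + b2 * v) for yi, u, v in zip(y, x1, x2)]

# Partial residuals for x1: ordinary residuals with the x1 term added back.
partial_x1 = [e + b1 * u for e, u in zip(residuals, x1)]

# Simple regression of the partial residuals on x1 alone.
_, slope_partial = ols([x1], partial_x1)

print(f"full-model coefficient of x1:     {b1:.6f}")
print(f"slope of partial residuals on x1: {slope_partial:.6f}")
```

The two printed slopes agree because the ordinary residuals are uncorrelated with every predictor in the fitted model, so adding back the $x_1$ term leaves only that term's slope.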

4. Problem 8.9 (10 Points)

In the plot of the residuals versus the predicted values $\hat{y}$, we do not see the problematic trend of fanning out. In this sense, the constant error variance assumption appears to be satisfied. In other words, the variance does not start out as very small and then get larger with increasing $\hat{y}$. However, we do see a quadratic trend, so we may wish to incorporate this curvature in the model.

[Figure: Plot of Residuals versus Predicted Values]

b. The plot of the residuals versus predicted values for the Hawaii data set appears below. We can see that the residuals become more spread out as $\hat{y}$ increases. This is essentially a fanning out trend, which means that our assumption of constant error variance may be violated.

[Figure: Residuals versus Predicted Values]

c. Skipped

d. Based on the results from part b, we would transform the y variable and refit the model to correct the variance assumption problem.
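Part d does not name the transformation, but the summary at the top of this document mentions a square root. As a hedged illustration of why that choice can help, the Python sketch below uses hypothetical pairs of responses whose spread grows with their level (it is not the Hawaii data) and compares the spread before and after taking square roots.

```python
import math

# Hypothetical responses whose spread grows with their level:
# each pair is (level - sqrt(level), level + sqrt(level)).
levels = [4.0, 25.0, 100.0]
pairs = [(m - math.sqrt(m), m + math.sqrt(m)) for m in levels]

def spread(pair):
    """Range of a pair of observations."""
    low, high = pair
    return high - low

raw_spreads = [spread(p) for p in pairs]
sqrt_spreads = [spread((math.sqrt(lo), math.sqrt(hi))) for lo, hi in pairs]

for m, raw, root in zip(levels, raw_spreads, sqrt_spreads):
    print(f"level {m:6.1f}: spread of y = {raw:5.1f}, spread of sqrt(y) = {root:.3f}")
```

On the raw scale the spread grows fivefold from the lowest to the highest level, while on the square root scale it stays close to 1 throughout, which is the behavior a variance-stabilizing transformation is meant to produce.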

5. Problem 8.16 (10 Points)

We can check the normality assumption with the normal probability plot for regression residuals. This plot appears below, and the points generally appear in a linear trend without drastic deviation. Thus, we would conclude that the normality assumption is satisfied.

[Figure: Normal Probability Plot of the Residuals]

Another method of assessing normality is examining a histogram of the residuals. We see a roughly mound-shaped histogram, which indicates that the normality assumption is not violated.

[Figure: Histogram of Residuals]
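The "linear trend" judged by eye in a normal probability plot can also be summarized as the correlation between the ordered residuals and their normal scores. The Python sketch below is only an illustration: it reuses the ten Problem 8.2 residuals (each taken as y minus the listed predicted value) because the data for this problem are not reproduced here, and the Blom plotting positions are an assumed convention.

```python
from statistics import NormalDist, mean

# Residuals from the Problem 8.2 fit, reused here purely as example input.
residuals = [3.1972, 3.2153, -2.2577, 0.2693, -1.7126,
             -7.1856, -2.6585, 3.3595, -2.1315, 5.9046]

n = len(residuals)
ordered = sorted(residuals)

# Blom plotting positions (an assumed convention; other offsets are also used).
normal_scores = [NormalDist().inv_cdf((i - 0.375) / (n + 0.25))
                 for i in range(1, n + 1)]

def correlation(u, v):
    """Pearson correlation of two equal-length lists."""
    u_bar, v_bar = mean(u), mean(v)
    s_uv = sum((a - u_bar) * (b - v_bar) for a, b in zip(u, v))
    s_uu = sum((a - u_bar) ** 2 for a in u)
    s_vv = sum((b - v_bar) ** 2 for b in v)
    return s_uv / (s_uu * s_vv) ** 0.5

r = correlation(normal_scores, ordered)
print(f"correlation between ordered residuals and normal scores: {r:.3f}")
```

A correlation close to 1 corresponds to points lying near a straight line in the plot; how close is "close enough" depends on the sample size.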

6. Problem 8.17 (10 Points)

From the normal probability plot for regression residuals, we see a generally linear trend with no major deviations. Thus, it appears that the normality assumption for the errors is satisfied.

a. Using the formula $y - \hat{y}$, we can compute the residuals:

The REG Procedure
Model: MODEL1
Dependent Variable: y

Output Statistics

       Dependent    Predicted
Obs     Variable        Value     Residual
  1         1235         1396        -161.
  2         1080         1158         -77.
  3     845.0000         880.         -35.
  4         1522         1346         176.
  5         1047                     -117.
  6         1979         1925
  7         1822         1680
  8         1253         1202
  9         1297         1180
 10     946.0000         874.
 11         1713         1696         17.1
 12         1024         1097
 13         1147         1094
 14         1092
 15         1152         1269
 16         1336         1126         210.
 17         2131         2030       100.71
 18         1550         1667        -117.
 19         1884         1677
 20         2041         1865
 21     845.0000         998.        -153.
 22         1483         1460
 23         1055         1240        -185.
 24         1545         1578         -33.
 25     729.0000         552.       176.25
 26         1792         1775        16.81
 27         1175         1365        -189.
 28         1593         1731    -138.1599
 29     785.0000         676.
 30     744.0000         727.        16.11
 31         1356         1562        -206.
 32         1262         1403    -141.1360

b. Storing the residuals as a data set, we can obtain the summary statistics of the residuals. For comparison, we determine the MSE to be 17818.

Analysis of Variance

Source             DF    Sum of Squares    Mean Square    F Value    Pr > F
Model               2           4283063        2141531     120.19    <.0001
Error              29            516727          17818
Corrected Total    31           4799790

Root MSE           133.48467    R-Square    0.8923
Dependent Mean    1326.87500    Adj R-Sq    0.8849
Coeff Var           10.06008

We determine the mean of the residuals to be rather close to 0 and the variance of the residuals to be $129.1069249^2 = 16668.5981$. The MSE is 17818, which is somewhat close to the variance of the residuals. The prices are in the approximate range of 700 to 2200. We have a difference of 1149.4019 between the MSE and variance of the residuals.

The MEANS Procedure

Analysis Variable : raw Residual

 N             Mean        Std Dev         Minimum    Maximum
32    -4.707358E-14    129.1069249    -206.4849562       213.
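The gap between the MSE and the variance of the residuals comes from the divisor: both start from the same error sum of squares, but the MSE divides by the error degrees of freedom while the sample variance divides by n - 1. A short Python sketch of that arithmetic, using the sums of squares from the table above (variable names are illustrative):

```python
# MSE versus the sample variance of the residuals.
sse = 516727   # error sum of squares from the ANOVA table
n = 32         # number of observations
k = 2          # model degrees of freedom

mse = sse / (n - k - 1)             # divides by the error degrees of freedom (29)
residual_variance = sse / (n - 1)   # sample variance of residuals whose mean is 0

print(f"MSE               = {mse:.2f}")
print(f"residual variance = {residual_variance:.2f}")
print(f"ratio             = {mse / residual_variance:.4f}  (equals 31/29)")
```

The residual variance computed this way is about 16668.6, which matches the squared standard deviation from the MEANS output up to rounding in the printed sum of squares.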

c. From the previous parts, we see that $s = \sqrt{MSE} = 133.48467$. Thus, $2s = 266.9693$, so we are interested in the proportion of residuals outside the range of (-266.9693, 266.9693). However, the minimum residual (-206.485) and the maximum residual (213.496) still fall inside this range. Thus, none of the 32 residuals fall outside 2s from 0. Likewise, none of the residuals would fall outside 3s of 0.
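Because only the extreme residuals matter for this check, it can be reproduced from the minimum, the maximum, and s alone. A minimal Python sketch, using the values quoted in the text:

```python
# Check whether every residual lies within 2s and 3s of zero.
s = 133.48467          # root MSE from the regression output
res_min = -206.485     # smallest residual
res_max = 213.496      # largest residual

within = {}
for multiple in (2, 3):
    bound = multiple * s
    # If both extremes are inside the band, every residual is inside it.
    within[multiple] = (-bound < res_min) and (res_max < bound)
    print(f"{multiple}s = {bound:.4f}: all residuals inside = {within[multiple]}")
```

Since both extremes lie inside the 2s band, the same holds automatically for the wider 3s band.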

d. Skipped
