Regression Analysis: Variable Subset Selection Techniques using SAS - Prof. Mervyn G. Mara, Study notes of Statistics

This document demonstrates the use of SAS software (PROC REG) to perform regression analysis with variable subset selection techniques: forward selection, backward elimination, stepwise selection, and R-square (all-subsets) selection. The data set has 13 observations on five variables: four independent variables (x1 to x4) and the dependent variable y. The goal is to identify the subset of independent variables that best explains the variation in the dependent variable.

Typology: Study notes

Pre 2010

Uploaded on 09/02/2009

EXAMPLE D9
SAS Program
data cement;
input x1-x4 y;
datalines;
7 26 6 60 78.5
1 29 15 52 74.3
11 56 8 20 104.3
11 31 8 47 87.6
7 52 6 33 95.9
11 55 9 22 109.2
3 71 17 6 102.7
1 31 22 44 72.5
2 54 18 22 93.1
21 47 4 26 115.9
1 40 23 34 83.8
11 66 9 12 113.3
10 68 8 12 109.4
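;
run;

*title c=darkpurple h=2 'Regression : Variable Subset Selection Techniques';
*symbol v=dot h=1 c=red;

/* CORR prints the correlation matrix of the variables in the MODEL statements */
proc reg data=cement corr;
  /* MODEL1: forward selection, significance level for entry 0.05 */
  model y = x1-x4 / selection=f sle=.05;
  /* MODEL2: backward elimination, significance level for staying 0.10 */
  model y = x1-x4 / selection=b sls=.1;
  /* MODEL3: stepwise selection, entry and stay levels both 0.15 */
  model y = x1-x4 / selection=stepwise sle=.15 sls=.15;
  /* MODEL4: all subsets ranked by R-square, with SSE and Mallows' C(p) */
  model y = x1-x4 / selection=rsquare sse cp;
  /* MODEL5: best 2 subsets of sizes 1 to 3, with SSE, MSE, AIC and C(p) */
  model y = x1-x4 / selection=rsquare start=1 stop=3 best=2
                    sse mse aic cp;
  *plot cp.*np./cmallows=blue vaxis=0 to 8 by 1
        ctext=blue caxes=darkred;
  title 'Regression : Variable Subset Selection Techniques';
run;
quit;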


SAS Log

2    data cement;
3    input x1-x4 y;
4    datalines;

NOTE: The data set WORK.CEMENT has 13 observations and 5 variables.
NOTE: DATA statement used (Total process time):
      real time           0.01 seconds
      cpu time            0.00 seconds

18   ; run;
19
20   proc reg corr ;
21   model y = x1-x4/selection=f sle=.05;
22   model y = x1-x4/selection=b sls=.1;
23   model y = x1-x4/selection=stepwise sle=.15 sls=.15;
24   model y = x1-x4/selection=rsquare sse cp;
25   model y = x1-x4/selection=rsquare start=1 stop=3 best=
     sse mse aic cp;
27   title 'Regression : Variable Subset Selection Techniques';
28   run;

The REG Procedure
Model: MODEL1
Dependent Variable: y

Number of Observations Read    13
Number of Observations Used    13

Forward Selection: Step 1

Variable x4 Entered: R-Square = 0.6745 and C(p) = 138.7308

                         Analysis of Variance

                              Sum of          Mean
Source            DF         Squares        Square    F Value    Pr > F
Model              1      1831.89616         1831.      22.80        0.
Error             11       883.86692           80.
Corrected Total   12      2715.76308

            Parameter    Standard
Variable     Estimate       Error    Type II SS    F Value    Pr > F
Intercept   117.56793          5.         40108     499.16        <.
x4           -0.73816          0.    1831.89616      22.80    0.0006

Bounds on condition number: 1, 1
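The R-Square and F Value reported for Step 1 follow directly from the sums of squares and degrees of freedom in the analysis-of-variance table. The short Python sketch below redoes that arithmetic; Python is used only for illustration and is not part of the SAS program, and the variable names are hypothetical.

```python
# Sums of squares and degrees of freedom copied from the Step 1
# analysis-of-variance table (model: y = x4).
ss_model, df_model = 1831.89616, 1
ss_error, df_error = 883.86692, 11
ss_total = 2715.76308

r_square = ss_model / ss_total    # share of total variation explained
ms_model = ss_model / df_model    # mean square for the model
ms_error = ss_error / df_error    # mean square error
f_value = ms_model / ms_error     # overall F statistic on (1, 11) df

print(f"R-Square = {r_square:.4f}")
print(f"MSE      = {ms_error:.5f}")
print(f"F Value  = {f_value:.2f}")
```

With a single predictor in the model, the overall F statistic and the Type II F statistic for x4 are the same quantity, which is why 22.80 appears in both tables.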

Forward Selection: Step 2

Variable x1 Entered: R-Square = 0.9725 and C(p) = 5.4959

                         Analysis of Variance

                              Sum of          Mean
Source            DF         Squares        Square    F Value    Pr > F
Model              2      2641.00096         1320.     176.63        <.
Error             10        74.76211            7.
Corrected Total   12      2715.76308

            Parameter    Standard
Variable     Estimate       Error    Type II SS    F Value    Pr > F
Intercept   103.09738          2.         17615    2356.10        <.
x1            1.43996          0.     809.10480     108.22        <.
x4           -0.61395          0.    1190.92464     159.30    <.0001

Bounds on condition number: 1.0641, 4.
--------------------------------------------------------------------------------
No other variable met the 0.0500 significance level for entry into the model.

                      Summary of Forward Selection

        Variable    Number     Partial       Model
Step    Entered    Vars In    R-Square    R-Square      C(p)    F Value    Pr > F
   1    x4               1      0.6745      0.6745      138.      22.80        0.
   2    x1               2      0.2979      0.9725        5.     108.22        <.
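The forward-selection logic behind this summary can be sketched outside SAS. The Python example below is a minimal illustration, not the PROC REG implementation: it fits each candidate model by ordinary least squares on the cement data and adds the variable with the largest partial F as long as that F exceeds a tabulated 5% critical value. The helper names (`sse`, `forward_select`) and the table `F_CRIT_05` are assumptions made for this sketch; SAS itself compares p-values with SLE=.05.

```python
# Cement data from the DATA step: each row is (x1, x2, x3, x4, y).
ROWS = [
    (7, 26, 6, 60, 78.5), (1, 29, 15, 52, 74.3), (11, 56, 8, 20, 104.3),
    (11, 31, 8, 47, 87.6), (7, 52, 6, 33, 95.9), (11, 55, 9, 22, 109.2),
    (3, 71, 17, 6, 102.7), (1, 31, 22, 44, 72.5), (2, 54, 18, 22, 93.1),
    (21, 47, 4, 26, 115.9), (1, 40, 23, 34, 83.8), (11, 66, 9, 12, 113.3),
    (10, 68, 8, 12, 109.4),
]

# Upper 5% points of F(1, df) from standard tables (assumed stand-in for
# the p-value comparison that SAS performs with SLE=.05).
F_CRIT_05 = {11: 4.844, 10: 4.965, 9: 5.117, 8: 5.318}


def sse(cols):
    """Error sum of squares for y on an intercept plus the columns in cols."""
    X = [[1.0] + [float(r[c]) for c in cols] for r in ROWS]
    y = [r[4] for r in ROWS]
    k = len(X[0])
    # Augmented normal equations [X'X | X'y].
    A = [[sum(row[i] * row[j] for row in X) for j in range(k)]
         + [sum(row[i] * yi for row, yi in zip(X, y))] for i in range(k)]
    # Gauss-Jordan elimination with partial pivoting.
    for i in range(k):
        p = max(range(i, k), key=lambda r: abs(A[r][i]))
        A[i], A[p] = A[p], A[i]
        for r in range(k):
            if r != i:
                f = A[r][i] / A[i][i]
                A[r] = [a - f * b for a, b in zip(A[r], A[i])]
    beta = [A[i][k] / A[i][i] for i in range(k)]
    return sum((yi - sum(b * x for b, x in zip(beta, row))) ** 2
               for row, yi in zip(X, y))


def forward_select():
    """Add the predictor with the largest partial F until none passes."""
    n = len(ROWS)
    chosen, remaining, steps = [], [0, 1, 2, 3], []
    current = sse(chosen)
    while remaining:
        # Smallest SSE among candidates means largest partial F.
        best = min(remaining, key=lambda c: sse(chosen + [c]))
        new = sse(chosen + [best])
        df_error = n - len(chosen) - 2   # error df after adding `best`
        f_enter = (current - new) / (new / df_error)
        if f_enter < F_CRIT_05[df_error]:
            break
        chosen.append(best)
        remaining.remove(best)
        current = new
        steps.append((f"x{best + 1}", round(f_enter, 2)))
    return steps


print(forward_select())
```

Run as written, the sketch should enter x4 and then x1, with partial F values that agree with the 22.80 and 108.22 in the summary table, and then stop because the best remaining candidate falls short of the 5% critical value.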

Backward Elimination: Step 1

Variable x3 Removed: R-Square = 0.9823 and C(p) = 3.0182

                         Analysis of Variance

                              Sum of          Mean
Source            DF         Squares        Square    F Value    Pr > F
Model              3      2667.79035          889.     166.83        <.
Error              9        47.97273            5.
Corrected Total   12      2715.76308

            Parameter    Standard
Variable     Estimate       Error    Type II SS    F Value    Pr > F
Intercept    71.64831         14.     136.81003      25.67        0.
x1            1.45194          0.     820.90740     154.01        <.
x2            0.41611          0.      26.78938       5.03        0.
x4           -0.23654          0.       9.93175       1.86    0.2054

Bounds on condition number: 18.94, 116.

Backward Elimination: Step 2

Variable x4 Removed: R-Square = 0.9787 and C(p) = 2.6782

                         Analysis of Variance

                              Sum of          Mean
Source            DF         Squares        Square    F Value    Pr > F
Model              2      2657.85859         1328.     229.50        <.
Error             10        57.90448            5.
Corrected Total   12      2715.76308

            Parameter    Standard
Variable     Estimate       Error    Type II SS    F Value    Pr > F
Intercept    52.57735          2.    3062.60416     528.91        <.
x1            1.46831          0.     848.43186     146.52        <.
x2            0.66225          0.    1207.78227     208.58    <.0001

Bounds on condition number: 1.0551, 4.
--------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1000 level.

                     Summary of Backward Elimination

        Variable    Number     Partial       Model
Step    Removed    Vars In    R-Square    R-Square      C(p)    F Value    Pr > F
   1    x3               3      0.0000      0.9823        3.       0.02        0.
   2    x4               2      0.0037      0.9787        2.       1.86        0.
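The F value and partial R-Square reported for removing x4 can be recovered from the error sums of squares of the two backward-elimination steps. The Python sketch below works through that arithmetic; it is an illustration only, with hypothetical variable names, and uses nothing beyond the figures printed in the output above.

```python
# Error sums of squares from the two backward-elimination steps.
sse_x1_x2_x4 = 47.97273   # Step 1 model (x1, x2, x4), 9 error df
sse_x1_x2 = 57.90448      # Step 2 model (x1, x2), after dropping x4
ss_total = 2715.76308     # corrected total sum of squares

# Type II SS for x4: the increase in error SS when x4 is dropped.
type2_ss_x4 = sse_x1_x2 - sse_x1_x2_x4

# Partial F: Type II SS (1 df) over the MSE of the model that contains x4.
mse_before = sse_x1_x2_x4 / 9
f_remove = type2_ss_x4 / mse_before

# Partial R-Square: the part of the total variation lost by the removal.
partial_r2 = type2_ss_x4 / ss_total

print(f"Type II SS = {type2_ss_x4:.5f}")
print(f"F Value    = {f_remove:.2f}")
print(f"Partial R2 = {partial_r2:.4f}")
```

The F value of 1.86 has the p-value 0.2054 shown in the Step 1 parameter table, which is above the SLS=.1 threshold, so x4 is dropped.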

Stepwise Selection: Step 2

Variable x1 Entered: R-Square = 0.9725 and C(p) = 5.4959

                         Analysis of Variance

                              Sum of          Mean
Source            DF         Squares        Square    F Value    Pr > F
Model              2      2641.00096         1320.     176.63        <.
Error             10        74.76211            7.
Corrected Total   12      2715.76308

            Parameter    Standard
Variable     Estimate       Error    Type II SS    F Value    Pr > F
Intercept   103.09738          2.         17615    2356.10        <.
x1            1.43996          0.     809.10480     108.22        <.
x4           -0.61395          0.    1190.92464     159.30    <.0001

Bounds on condition number: 1.0641, 4.

Stepwise Selection: Step 3

Variable x2 Entered: R-Square = 0.9823 and C(p) = 3.0182

                         Analysis of Variance

                              Sum of          Mean
Source            DF         Squares        Square    F Value    Pr > F
Model              3      2667.79035          889.     166.83        <.
Error              9        47.97273            5.
Corrected Total   12

            Parameter    Standard
Variable     Estimate       Error    Type II SS    F Value    Pr > F
Intercept    71.64831         14.     136.81003      25.67        0.
x1            1.45194          0.     820.90740     154.01        <.
x2            0.41611          0.      26.78938       5.03        0.
x4           -0.23654          0.       9.93175       1.86    0.2054

Bounds on condition number: 18.94, 116.

Stepwise Selection: Step 4

Variable x4 Removed: R-Square = 0.9787 and C(p) = 2.6782

                         Analysis of Variance

                              Sum of          Mean
Source            DF         Squares        Square    F Value    Pr > F
Model              2      2657.85859         1328.     229.50        <.
Error             10        57.90448            5.
Corrected Total   12      2715.76308

            Parameter    Standard
Variable     Estimate       Error    Type II SS    F Value    Pr > F
Intercept    52.57735          2.    3062.60416     528.91        <.
x1            1.46831          0.     848.43186     146.52        <.
x2            0.66225          0.    1207.78227     208.58    <.0001

Bounds on condition number: 1.0551, 4.
--------------------------------------------------------------------------------
All variables left in the model are significant at the 0.1500 level.
No other variable met the 0. significance level for entry into the model.

Regression : Variable Subset Selection Techniques

The REG Procedure
Model: MODEL4
Dependent Variable: y

R-Square Selection Method

Number of Observations Read    13
Number of Observations Used    13

Number in
    Model    R-Square        C(p)           SSE    Variables in Model
        1      0.6745        138.     883.86692    x
        1      0.6663        142.     906.33634    x
        1      0.5339        202.    1265.68675    x
        1      0.2859        315.    1939.40047    x
----------------------------------------------------------------------
        2      0.9787          2.      57.90448    x1 x2
        2      0.9725      5.4959      74.76211    x1 x4
        2      0.9353     22.3731     175.73800    x3 x4
        2      0.8470     62.4377     415.44273    x2 x3
        2      0.6801    138.2259     868.88013    x2 x4
        2      0.5482    198.0947    1227.07206    x1 x
----------------------------------------------------------------------
        3      0.9823          3.      47.97273    x1 x2 x4
        3      0.9823      3.0413      48.11061    x1 x2 x3
        3      0.9813      3.4968      50.83612    x1 x3 x4
        3      0.9728      7.3375      73.81455    x2 x3 x
----------------------------------------------------------------------
        4      0.9824          5.      47.86364    x1 x2 x3 x
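The C(p) column of the R-Square Selection listing can be checked from the SSE column. The Python sketch below assumes the usual definition of Mallows' statistic, C(p) = SSE_p / s^2 - (n - 2p), where s^2 is the mean square error of the full four-variable model and p counts the intercept along with the predictors. The function name `mallows_cp` is hypothetical, and the SSE values are copied from the MODEL4 listing.

```python
n = 13                      # observations
sse_full = 47.86364         # SSE of the model with x1, x2, x3 and x4
mse_full = sse_full / (n - 5)   # full-model MSE on 13 - 5 = 8 error df


def mallows_cp(sse_subset, n_vars):
    """Mallows' C(p) for a subset with n_vars predictors plus an intercept."""
    p = n_vars + 1
    return sse_subset / mse_full - (n - 2 * p)


# (SSE, number of predictors) for a few subsets in the MODEL4 listing.
subsets = {
    "x1 x2": (57.90448, 2),
    "x1 x4": (74.76211, 2),
    "x3 x4": (175.73800, 2),
    "x1 x2 x4": (47.97273, 3),
    "x1 x2 x3": (48.11061, 3),
}
for name, (sse_subset, n_vars) in subsets.items():
    print(f"{name:10s} C(p) = {mallows_cp(sse_subset, n_vars):.4f}")
```

For the full model the statistic reduces to p = 5 by construction, so C(p) is informative only for the proper subsets; values close to or below p, such as those of the x1 x2 model, indicate little bias from the omitted variables.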

The REG Procedure
Model: MODEL5
Dependent Variable: y

R-Square Selection Method

Number of Observations Read    13
Number of Observations Used

Number in
    Model    R-Square      C(p)        AIC         MSE          SSE    Variables in Model
        1      0.6745      138.    58.8516    80.35154         883.    x
        1      0.6663      142.    59.1780    82.39421         906.    x
---------------------------------------------------------------------------------------------
        2      0.9787        2.    25.4200          5.     57.90448    x1 x
        2      0.9725        5.    28.7417     7.47621          74.    x1 x
---------------------------------------------------------------------------------------------
        3      0.9823        3.    24.9739          5.     47.97273    x1 x2 x
        3      0.9823        3.    25.0112     5.34562          48.    x1 x2 x
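The AIC and MSE columns requested for MODEL5 can be recomputed from SSE alone. The Python sketch below assumes the least-squares form AIC = n ln(SSE / n) + 2p and MSE = SSE / (n - p), with p counting the intercept. The function names are hypothetical, and the SSE values are taken from the MODEL4 R-Square Selection listing for the same subsets.

```python
import math

n = 13  # observations


def aic(sse, n_vars):
    """AIC for a least-squares fit with n_vars predictors plus an intercept."""
    p = n_vars + 1
    return n * math.log(sse / n) + 2 * p


def mse(sse, n_vars):
    """Mean square error on n - p error degrees of freedom."""
    return sse / (n - (n_vars + 1))


# (SSE, number of predictors) for the six subsets kept by BEST=2.
models = [
    (883.86692, 1), (906.33634, 1),
    (57.90448, 2), (74.76211, 2),
    (47.97273, 3), (48.11061, 3),
]
for sse_value, n_vars in models:
    print(f"{n_vars} vars  AIC = {aic(sse_value, n_vars):8.4f}"
          f"  MSE = {mse(sse_value, n_vars):9.5f}")
```

Smaller AIC is better. The two best three-variable subsets have slightly lower AIC than the x1 x2 model, whereas C(p) is lowest for x1 x2, so the two criteria need not agree on a single best subset.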