Regression Analysis Notation and Methods, Study notes of Mathematical Statistics

Alliance University Mathematical Statistics

An overview of the notation used in regression analysis, including definitions of key terms such as dependent variable, independent variables, sample mean, leverage, and regression weights. It also discusses methods for variable entry and removal, including f-to-enter, f-to-remove, and selection criteria such as akaike information criterion (aic), amemiya's prediction criterion (pc), and mallow's cp. The document also covers collinearity and the calculation of variance inflation factors (vif), eigenvalues, and statistics for variables in the equation and not in the equation.

Typology: Study notes

2011/2012

Uploaded on 10/31/2012

sangawar 🇮🇳

REGRESSION

This procedure performs multiple linear regression with five methods for entry and removal of variables. It also provides extensive analysis of residual and influential cases. Caseweight (CASEWEIGHT) and regression weight (REGWGT) can be specified in the model fitting.

Notation

The following notation is used throughout this chapter unless otherwise stated:

$y_i$: Dependent variable for case $i$, with variance $\sigma^2 / g_i$

$c_i$: Caseweight for case $i$; $c_i = 1$ if CASEWEIGHT is not specified

$g_i$: Regression weight for case $i$; $g_i = 1$ if REGWGT is not specified

$l$: Number of distinct cases

$w_i$: $c_i g_i$

$W$: $\sum_{i=1}^{l} w_i$

$p$: Number of independent variables

$C$: Sum of caseweights: $\sum_{i=1}^{l} c_i$

$x_{ki}$: The $k$th independent variable for case $i$

$\bar{X}_k$: Sample mean for the $k$th independent variable: $\bar{X}_k = \left( \sum_{i=1}^{l} w_i x_{ki} \right) / W$

$\bar{Y}$: Sample mean for the dependent variable: $\bar{Y} = \left( \sum_{i=1}^{l} w_i y_i \right) / W$

$h_i$: Leverage for case $i$

$\tilde{h}_i$: $g_i / W + h_i$

$S_{kj}$: Sample covariance for $X_k$ and $X_j$

$S_{yy}$: Sample variance for $Y$

$S_{ky}$: Sample covariance for $X_k$ and $Y$

$p^*$: Number of coefficients in the model. $p^* = p$ if the intercept is not included; otherwise $p^* = p + 1$

$R$: The sample correlation matrix for $X_1, \ldots, X_p$ and $Y$
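The weighted quantities in the notation ($w_i = c_i g_i$, $W$, $C$, and the weighted means) can be illustrated with a short Python sketch. The function name `weighted_summary` and the list-based inputs are assumptions made for illustration only; they are not part of the original procedure.

```python
def weighted_summary(c, g, x, y):
    """Return (W, C, x_bar, y_bar) for caseweights c and regression weights g.

    Hypothetical helper: w_i = c_i * g_i, W = sum of w_i, C = sum of c_i,
    and the means are weighted by w_i as in the notation list.
    """
    w = [ci * gi for ci, gi in zip(c, g)]
    W = sum(w)
    C = sum(c)
    x_bar = sum(wi * xi for wi, xi in zip(w, x)) / W
    y_bar = sum(wi * yi for wi, yi in zip(w, y)) / W
    return W, C, x_bar, y_bar


# Three distinct cases; the second case counts twice, the third has regression weight 2.
W, C, x_bar, y_bar = weighted_summary(
    c=[1, 2, 1], g=[1, 1, 2], x=[1.0, 2.0, 3.0], y=[2.0, 4.0, 6.0]
)
print(W, C, x_bar, y_bar)
```

Note that $W$ and $C$ differ as soon as any regression weight is not 1, which is why the two sums are kept separate in the formulas that follow.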

Descriptive Statistics

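$$
R = \begin{bmatrix}
r_{11} & \cdots & r_{1p} & r_{1y} \\
r_{21} & \cdots & r_{2p} & r_{2y} \\
\vdots & & \vdots & \vdots \\
r_{y1} & \cdots & r_{yp} & r_{yy}
\end{bmatrix}
$$

where

$$
r_{kj} = \frac{S_{kj}}{\sqrt{S_{kk} S_{jj}}}
$$

and

$$
r_{yk} = r_{ky} = \frac{S_{ky}}{\sqrt{S_{kk} S_{yy}}}
$$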

The sample mean $\bar{X}_i$ and covariance $S_{ij}$ are computed by a provisional means algorithm. Define

$$
W_k = \sum_{i=1}^{k} w_i = \text{cumulative weight up to case } k
$$

For a regression model of the form

$$
Y_i = \beta_0 + \beta_1 X_{1i} + \beta_2 X_{2i} + \cdots + \beta_p X_{pi} + e_i
$$

sweep operations are used to compute the least squares estimates $b$ of $\beta$ and the associated regression statistics. The sweeping starts with the correlation matrix $R$.

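Let $\tilde{R}$ be the new matrix produced by sweeping on the $k$th row and column of $R$. The elements of $\tilde{R}$ are

$$
\tilde{r}_{kk} = \frac{1}{r_{kk}}
$$

$$
\tilde{r}_{ik} = \frac{r_{ik}}{r_{kk}}, \quad i \ne k
$$

$$
\tilde{r}_{kj} = -\frac{r_{kj}}{r_{kk}}, \quad j \ne k
$$

and

$$
\tilde{r}_{ij} = \frac{r_{ij} r_{kk} - r_{ik} r_{kj}}{r_{kk}}, \quad i \ne k,\ j \ne k
$$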

If the above sweep operations are repeatedly applied to each row of $R_{11}$ in

$$
R = \begin{bmatrix} R_{11} & R_{12} \\ R_{21} & R_{22} \end{bmatrix}
$$

where $R_{11}$ contains independent variables in the equation at the current step, the result is

$$
\tilde{R} = \begin{bmatrix}
R_{11}^{-1} & -R_{11}^{-1} R_{12} \\
R_{21} R_{11}^{-1} & R_{22} - R_{21} R_{11}^{-1} R_{12}
\end{bmatrix}
$$

The last row of $R_{21} R_{11}^{-1}$ contains the standardized coefficients (also called BETA), and $R_{22} - R_{21} R_{11}^{-1} R_{12}$ can be used to obtain the partial correlations for the variables not in the equation, controlling for the variables already in the equation. Note that this routine is its own inverse; that is, exactly the same operations are performed to remove a variable as to enter a variable.
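A minimal Python sketch of a single sweep step may help make the self-inverse property concrete. It assumes the sign convention in which column $k$ is divided by the pivot $r_{kk}$ and row $k$ is divided by the pivot and negated, which is consistent with the $R_{21} R_{11}^{-1}$ block holding the standardized coefficients. The function name `sweep` and the list-of-lists matrix layout are illustrative assumptions.

```python
def sweep(r, k):
    """Return a new matrix obtained by sweeping r on row and column k (0-based).

    Assumed convention: pivot -> 1/r_kk, column k -> r_ik/r_kk,
    row k -> -r_kj/r_kk, all other elements -> r_ij - r_ik*r_kj/r_kk.
    """
    n = len(r)
    pivot = r[k][k]
    if pivot == 0:
        raise ZeroDivisionError("pivot element is zero; cannot sweep")
    out = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(n):
            if i == k and j == k:
                out[i][j] = 1.0 / pivot
            elif j == k:  # column k, i != k
                out[i][j] = r[i][k] / pivot
            elif i == k:  # row k, j != k
                out[i][j] = -r[k][j] / pivot
            else:
                out[i][j] = r[i][j] - r[i][k] * r[k][j] / pivot
    return out


# One predictor X1 and Y (last row/column), with correlation 0.5.
R = [[1.0, 0.5],
     [0.5, 1.0]]
entered = sweep(R, 0)        # enter X1
print(entered)               # [[1.0, -0.5], [0.5, 0.75]]
removed = sweep(entered, 0)  # sweeping again removes X1
print(removed)               # [[1.0, 0.5], [0.5, 1.0]]
```

After entering X1, the last row holds the standardized coefficient (0.5) and the lower-right element holds $r_{yy} = 1 - R^2 = 0.75$. Sweeping on the same index a second time restores the original matrix.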

Variable Selection Criteria

Let $r_{ij}$ be the element in the current swept matrix associated with $X_i$ and $X_j$. Variables are entered or removed one at a time. $X_k$ is eligible for entry if it is an independent variable not currently in the model with

$$
r_{kk} \ge t \quad \text{(tolerance with a default of 0.0001)}
$$

and also, for each variable $X_j$ that is currently in the model,

$$
\left( r_{jj} - \frac{r_{jk} r_{kj}}{r_{kk}} \right) t \le 1
$$

The above condition is imposed so that entry of the variable does not reduce the tolerance of variables already in the model to unacceptable levels. The F-to-enter value for $X_k$ is computed as

$$
F\text{-to-enter}_k = \frac{(C - p^* - 1) V_k}{r_{yy} - V_k}
$$

with 1 and $C - p^* - 1$ degrees of freedom, where $p^*$ is the number of coefficients currently in the model and

$$
V_k = \frac{r_{yk} r_{ky}}{r_{kk}}
$$
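The eligibility check and the F-to-enter statistic can be sketched in Python as follows. This assumes the tolerance condition reads $(r_{jj} - r_{jk} r_{kj} / r_{kk})\, t \le 1$ for every variable $X_j$ already in the model, and that $V_k = r_{yk} r_{ky} / r_{kk}$. The function names and the tuple layout of `in_model_terms` are hypothetical.

```python
def is_eligible(r_kk, in_model_terms, tol=0.0001):
    """Check whether candidate X_k may enter.

    in_model_terms: iterable of (r_jj, r_jk, r_kj) tuples, one per variable
    X_j already in the model (hypothetical layout for this sketch).
    """
    if r_kk < tol:
        return False
    return all((r_jj - r_jk * r_kj / r_kk) * tol <= 1.0
               for r_jj, r_jk, r_kj in in_model_terms)


def f_to_enter(r_yk, r_ky, r_kk, r_yy, C, p_star):
    """F-to-enter for X_k, with 1 and C - p_star - 1 degrees of freedom."""
    v_k = r_yk * r_ky / r_kk
    return (C - p_star - 1) * v_k / (r_yy - v_k)


# Intercept-only model (p* = 1), C = 12 cases, candidate correlated 0.5 with Y.
print(is_eligible(1.0, []))                        # True
print(f_to_enter(0.5, 0.5, 1.0, 1.0, C=12, p_star=1))
```

In this example $V_k = 0.25$, so the statistic is $10 \times 0.25 / 0.75$.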

Enter (Forced Entry)

Choose $X_k$ such that $r_{kk}$ is maximum and enter $X_k$. Repeat for all variables to be entered.

Remove (Forced Removal)

Choose $X_k$ such that $r_{kk}$ is minimum and remove $X_k$. Repeat for all variables to be removed.

Statistics

Summary

For the summary statistics, assume p independent variables are currently entered in the equation, of which a block of q variables have been entered or removed in the current step.

Multiple R

$$
R = \sqrt{1 - r_{yy}}
$$

R Square

$$
R^2 = 1 - r_{yy}
$$

Adjusted R Square

$$
R^2_{adj} = R^2 - \frac{(1 - R^2)\, p}{C - p^*}
$$

R Square Change (when a block of q independent variables was added or removed)

$$
\Delta R^2 = R^2_{current} - R^2_{previous}
$$

F Change and Significance of F Change

$$
\Delta F = \begin{cases}
\dfrac{\Delta R^2 \,(C - p^*)}{q \,(1 - R^2_{current})} & \text{for the addition of independent variables} \\[2ex]
\dfrac{\Delta R^2 \,(C - p^* - q)}{q \,(1 - R^2_{previous})} & \text{for the removal of independent variables}
\end{cases}
$$

The degrees of freedom for the addition are $q$ and $C - p^*$, while the degrees of freedom for the removal are $q$ and $C - p^* - q$.

Residual Sum of Squares

$$
SS_e = r_{yy} (C - 1) S_{yy}
$$

with degrees of freedom $C - p^*$.

Sum of Squares Due to Regression

$$
SS_R = R^2 (C - 1) S_{yy}
$$

with degrees of freedom $p$.

Amemiya's Prediction Criterion (PC)

$$
PC = \frac{(1 - R^2)(C + p^*)}{C - p^*}
$$

Mallows' Cp (CP)

$$
CP = \frac{SS_e}{\hat{\sigma}^2} + 2p^* - C
$$

where $\hat{\sigma}^2$ is the mean square error from fitting the model that includes all the variables in the variable list.

Schwarz Bayesian Criterion (SBC)

$$
SBC = C \ln\left( \frac{SS_e}{C} \right) + p^* \ln(C)
$$
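As a rough illustration, the summary statistics and selection criteria can be computed from the swept element $r_{yy}$ and a few counts. The sketch below assumes $R^2 = 1 - r_{yy}$, $SS_e = r_{yy}(C-1)S_{yy}$, $PC = (1-R^2)(C+p^*)/(C-p^*)$, $CP = SS_e/\hat{\sigma}^2 + 2p^* - C$, and $SBC = C\ln(SS_e/C) + p^*\ln C$. The function name, argument names, and dictionary keys are hypothetical.

```python
import math


def summary_criteria(r_yy, S_yy, C, p, p_star, sigma2_full):
    """Summary statistics from the swept element r_yy (illustrative sketch).

    sigma2_full is the mean square error of the model containing all
    variables in the variable list (needed only for Mallows' Cp).
    """
    r2 = 1.0 - r_yy
    ss_e = r_yy * (C - 1) * S_yy
    return {
        "multiple_r": math.sqrt(r2),
        "r_square": r2,
        "adj_r_square": r2 - (1.0 - r2) * p / (C - p_star),
        "ss_residual": ss_e,
        "ss_regression": r2 * (C - 1) * S_yy,
        "pc": (1.0 - r2) * (C + p_star) / (C - p_star),
        "cp": ss_e / sigma2_full + 2 * p_star - C,
        "sbc": C * math.log(ss_e / C) + p_star * math.log(C),
    }


# Two predictors plus intercept (p = 2, p* = 3), C = 21, treated as the full model.
stats = summary_criteria(r_yy=0.25, S_yy=4.0, C=21, p=2, p_star=3,
                         sigma2_full=20.0 / 18.0)
for name, value in stats.items():
    print(name, round(value, 4))
```

When the current model is itself the full model, $\hat{\sigma}^2 = SS_e / (C - p^*)$ and Mallows' Cp reduces to $p^*$, which gives a quick sanity check on the formula.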

Collinearity

Variance Inflation Factors

$$
VIF_i = \frac{1}{r_{ii}}
$$

Tolerance

$$
Tolerance_i = r_{ii}
$$

Eigenvalues, $\lambda_k$

The eigenvalues of the scaled and uncentered cross-product matrix for the independent variables in the equation are computed by the QL method (Wilkinson and Reinsch, 1971).

Condition Indices

$$
\eta_k = \sqrt{\frac{\max_j \lambda_j}{\lambda_k}}
$$

Variance-Decomposition Proportions

Let

$$
v_i = (v_{i1}, \ldots, v_{ip})
$$

be the eigenvector associated with eigenvalue $\lambda_i$. Also, let

$$
A_{ij} = \frac{v_{ij}^2}{\lambda_i} \quad \text{and} \quad A_j = \sum_{i=1}^{p} A_{ij}
$$

The variance-decomposition proportion for the $j$th regression coefficient associated with the $i$th component is defined as

$$
\pi_{ij} = \frac{A_{ij}}{A_j}
$$
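Given eigenvalues and eigenvectors that have already been computed (for example by a QL routine), the condition indices and variance-decomposition proportions follow directly. This sketch assumes each condition index is $\sqrt{\lambda_{max} / \lambda_k}$ and that $\pi_{ij} = A_{ij} / A_j$ with $A_{ij} = v_{ij}^2 / \lambda_i$. The function names and the row-per-eigenvector layout are assumptions made for illustration.

```python
import math


def condition_indices(eigenvalues):
    """Square root of (largest eigenvalue / each eigenvalue)."""
    lam_max = max(eigenvalues)
    return [math.sqrt(lam_max / lam) for lam in eigenvalues]


def variance_proportions(eigenvalues, eigenvectors):
    """Return pi[i][j]: share of coefficient j's variance tied to component i.

    eigenvectors[i][j] is the j-th element of the eigenvector for eigenvalue i
    (hypothetical layout chosen for this sketch).
    """
    p = len(eigenvalues)
    a = [[eigenvectors[i][j] ** 2 / eigenvalues[i] for j in range(p)]
         for i in range(p)]
    a_col = [sum(a[i][j] for i in range(p)) for j in range(p)]
    return [[a[i][j] / a_col[j] for j in range(p)] for i in range(p)]


# Two-variable illustration with eigenvalues 1.5 and 0.5.
s = 1.0 / math.sqrt(2.0)
lams = [1.5, 0.5]
vecs = [[s, s], [s, -s]]
print(condition_indices(lams))
print(variance_proportions(lams, vecs))
```

Each column of the returned proportions sums to 1, so a column dominated by a component with a large condition index points to a coefficient affected by near-collinearity.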

Statistics for Variables in the Equation

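Regression Coefficient $b_k$

$$
b_k = r_{yk} \sqrt{\frac{S_{yy}}{S_{kk}}} \quad \text{for } k = 1, \ldots, p
$$

F-test for $Beta_k$

$$
F = \left( \frac{Beta_k}{\hat{\sigma}_{Beta_k}} \right)^2
$$

with 1 and $C - p^*$ degrees of freedom.

Part Correlation of $X_k$ with $Y$

$$
\text{Part-Corr}(X_k) = \frac{r_{yk}}{\sqrt{r_{kk}}}
$$

Partial Correlation of $X_k$ with $Y$

$$
\text{Partial-Corr}(X_k) = \frac{r_{yk}}{\sqrt{r_{kk} r_{yy} - r_{yk} r_{ky}}}
$$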

Statistics for Variables Not in the Equation

Standardized regression coefficient $Beta_k^*$ if $X_k$ enters the equation at the next step

$$
Beta_k^* = \frac{r_{yk}}{r_{kk}}
$$

The F-test for $Beta_k^*$

$$
F = \frac{(C - p^* - 1)\, r_{yk}^2}{r_{kk} r_{yy} - r_{yk}^2}
$$

with 1 and $C - p^*$ degrees of freedom

Partial Correlation of $X_k$ with $Y$

$$
\text{Partial}(X_k) = \frac{r_{yk}}{\sqrt{r_{yy} r_{kk}}}
$$

Tolerance of $X_k$

$$
Tolerance_k = r_{kk}
$$

Minimum tolerance among variables already in the equation if $X_k$ enters at the next step is

$$
\min_{1 \le j \le p} \left( \frac{1}{r_{jj} - r_{kj} r_{jk} / r_{kk}},\ r_{kk} \right)
$$

Residuals and Associated Statistics

There are 19 temporary variables that can be added to the active system file. These variables can be requested with the RESIDUAL subcommand.

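Centered Leverage Values

For all cases, compute

$$
h_i = \begin{cases}
\dfrac{g_i}{C - 1} \displaystyle\sum_{j=1}^{p} \sum_{k=1}^{p} \dfrac{(X_{ji} - \bar{X}_j)(X_{ki} - \bar{X}_k)\, r_{jk}}{\sqrt{S_{jj} S_{kk}}} & \text{if intercept is included} \\[2ex]
\dfrac{g_i}{C - 1} \displaystyle\sum_{j=1}^{p} \sum_{k=1}^{p} \dfrac{X_{ji} X_{ki}\, r_{jk}}{\sqrt{S_{jj} S_{kk}}} & \text{otherwise}
\end{cases}
$$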

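Standardized Predicted Values

$$
ZPRED_i = \begin{cases}
\dfrac{\hat{Y}_i - \bar{Y}}{sd} & \text{if no regression weight is specified} \\[2ex]
\text{SYSMIS} & \text{otherwise}
\end{cases}
$$

where $sd$ is computed as

$$
sd = \sqrt{\frac{\sum_{i=1}^{l} c_i (\hat{Y}_i - \bar{Y})^2}{C - 1}}
$$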

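Studentized Residuals

$$
SRES_i = \begin{cases}
\dfrac{e_i}{s \sqrt{(1 - \tilde{h}_i) / g_i}} & \text{for selected cases with } c_i > 0 \\[2ex]
\dfrac{e_i}{s \sqrt{(1 + \tilde{h}_i) / g_i}} & \text{otherwise}
\end{cases}
$$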

Deleted Residuals

$$
DRESID_i = \begin{cases}
\dfrac{e_i}{1 - \tilde{h}_i} & \text{for selected cases with } c_i > 0 \\[2ex]
e_i & \text{otherwise}
\end{cases}
$$

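Studentized Deleted Residuals

$$
SDRESID_i = \begin{cases}
\dfrac{DRESID_i}{s_{(i)}} & \text{for selected cases with } c_i > 0 \\[2ex]
\dfrac{e_i}{s \sqrt{(1 + \tilde{h}_i) / g_i}} & \text{otherwise}
\end{cases}
$$

where $s_{(i)}$ is computed as

$$
s_{(i)} = \frac{1}{\sqrt{C - p^* - 1}} \sqrt{\frac{(C - p^*)\, s^2}{1 - \tilde{h}_i} - DRESID_i^2}
$$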

Adjusted Predicted Values

$$
ADJPRED_i = Y_i - DRESID_i
$$
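The deleted-residual quantities for a selected case can be sketched in Python. The sketch assumes unit regression weights ($g_i = 1$), $DRESID_i = e_i / (1 - \tilde{h}_i)$, $s_{(i)} = \sqrt{[(C - p^*) s^2 / (1 - \tilde{h}_i) - DRESID_i^2] / (C - p^* - 1)}$, and $ADJPRED_i = Y_i - DRESID_i$. All function and argument names are hypothetical.

```python
import math


def deleted_residual(e_i, h_tilde_i):
    """DRESID for a selected case with positive caseweight."""
    return e_i / (1.0 - h_tilde_i)


def deleted_scale(s2, h_tilde_i, dresid_i, C, p_star):
    """s_(i): the scale that divides DRESID_i to give SDRESID_i."""
    inner = (C - p_star) * s2 / (1.0 - h_tilde_i) - dresid_i ** 2
    return math.sqrt(inner / (C - p_star - 1))


def adjusted_predicted(y_i, dresid_i):
    """Predicted value for case i from the fit that excludes case i."""
    return y_i - dresid_i


# Points (0,0), (1,0), (2,1): the least-squares fit gives the case at x=0
# a residual of 1/6 and a total leverage of 5/6.
d = deleted_residual(1.0 / 6.0, 5.0 / 6.0)
print(d)                           # approximately 1.0
print(adjusted_predicted(0.0, d))  # approximately -1.0
```

The adjusted prediction of about -1 matches the line through the two remaining points (1,0) and (2,1) evaluated at x = 0, which is what deleting the case should give.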

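DfBeta

$$
DFBETA_i = b - b(i) = \frac{g_i e_i \,(X^t W X)^{-1} X_i^t}{1 - \tilde{h}_i}
$$

where

$$
X_i^t = \begin{cases}
(1, X_{1i}, \ldots, X_{pi}) & \text{if intercept is included} \\
(X_{1i}, \ldots, X_{pi}) & \text{otherwise}
\end{cases}
$$

and $W = \mathrm{diag}(w_1, \ldots, w_l)$.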

For unselected cases with $c_i > 0$

$$
MAHAL_i = \begin{cases}
C\, h_i' & \text{if intercept is included} \\
(C + 1)\, h_i' & \text{otherwise}
\end{cases}
$$

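Cook's Distance (Cook, 1977)

For selected cases with $c_i > 0$

$$
COOK_i = \begin{cases}
\dfrac{DRESID_i^2 \,\tilde{h}_i\, g_i}{s^2 (p + 1)} & \text{if intercept is included} \\[2ex]
\dfrac{DRESID_i^2 \,h_i\, g_i}{s^2 p} & \text{otherwise}
\end{cases}
$$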

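For unselected cases with $c_i > 0$

$$
COOK_i = \begin{cases}
\dfrac{DRESID_i^2 \left( h_i' + \frac{1}{W} \right)}{\tilde{s}^2 (p + 1)} & \text{if intercept is included} \\[2ex]
\dfrac{DRESID_i^2 \, h_i'}{\tilde{s}^2 p} & \text{otherwise}
\end{cases}
$$

where $h_i'$ is the leverage for unselected case $i$, and $\tilde{s}^2$ is computed as

$$
\tilde{s}^2 = \begin{cases}
\dfrac{1}{C - p^* + 1} \left[ SS_e + e_i^2 \left( 1 - h_i' - \dfrac{1}{1 + W} \right) \right] & \text{if intercept is included} \\[2ex]
\dfrac{1}{C - p^* + 1} \left[ SS_e + e_i^2 \left( 1 - h_i' \right) \right] & \text{otherwise}
\end{cases}
$$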

Standard Errors of the Mean Predicted Values

For all the cases with positive caseweight,

$$
SEPRED_i = \begin{cases}
s \sqrt{\tilde{h}_i / g_i} & \text{if intercept is included} \\
s \sqrt{h_i / g_i} & \text{otherwise}
\end{cases}
$$

95% Confidence Interval for Mean Predicted Response

$$
LMCIN_i = \hat{Y}_i - t_{0.025,\, C - p^*} \, SEPRED_i
$$

$$
UMCIN_i = \hat{Y}_i + t_{0.025,\, C - p^*} \, SEPRED_i
$$

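95% Confidence Interval for a Single Observation

$$
LICIN_i = \begin{cases}
\hat{Y}_i - t_{0.025,\, C - p^*} \, s \sqrt{(\tilde{h}_i + 1) / g_i} & \text{if intercept is included} \\
\hat{Y}_i - t_{0.025,\, C - p^*} \, s \sqrt{(h_i + 1) / g_i} & \text{otherwise}
\end{cases}
$$

$$
UICIN_i = \begin{cases}
\hat{Y}_i + t_{0.025,\, C - p^*} \, s \sqrt{(\tilde{h}_i + 1) / g_i} & \text{if intercept is included} \\
\hat{Y}_i + t_{0.025,\, C - p^*} \, s \sqrt{(h_i + 1) / g_i} & \text{otherwise}
\end{cases}
$$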

Durbin-Watson Statistic

$$
DW = \frac{\sum_{i=2}^{l} (\tilde{e}_i - \tilde{e}_{i-1})^2}{\sum_{i=1}^{l} c_i \tilde{e}_i^2}
$$

where $\tilde{e}_i = e_i \sqrt{g_i}$.
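The Durbin-Watson statistic is simple enough to compute directly from the residuals in case order. The sketch below assumes the weighted residual is $\tilde{e}_i = e_i \sqrt{g_i}$ and that the denominator weights each squared residual by the caseweight $c_i$. The function name and the defaulting of missing weights to 1 are assumptions for illustration.

```python
import math


def durbin_watson(e, g=None, c=None):
    """Durbin-Watson statistic from residuals e in case order.

    g: regression weights, c: caseweights; both default to 1 for every case.
    """
    n = len(e)
    g = [1.0] * n if g is None else g
    c = [1.0] * n if c is None else c
    e_tilde = [ei * math.sqrt(gi) for ei, gi in zip(e, g)]
    numerator = sum((e_tilde[i] - e_tilde[i - 1]) ** 2 for i in range(1, n))
    denominator = sum(ci * et ** 2 for ci, et in zip(c, e_tilde))
    return numerator / denominator


# Residuals that alternate in sign push the statistic above 2.
print(durbin_watson([1.0, -1.0, 1.0, -1.0]))  # 3.0
```

Values near 2 suggest little first-order serial correlation in the residuals, values toward 0 suggest positive correlation, and values toward 4 suggest negative correlation, as in the alternating example.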
