R Tips for Linear Modeling: Handling Collinearity and Backing up Files in Unix - Prof. Cha, Study notes of Computer Science

These notes give R tips for linear modeling, with sections on handling collinearity and backing up files in Unix. They cover standardizing inputs, calculating means and standard deviations, and writing a function that standardizes data. They also discuss what makeLLS should return and how to collect and combine multiple results.

Uploaded on 11/08/2009

CS545: Linear Modeling
Chuck Anderson
Department of Computer Science
Colorado State University
Fall, 2009
Outline

R Tips for Linear Modeling

Backing up Files in Unix

Collinearity


Standardizing Inputs

Standardize attribute values so that each attribute has zero mean and unit variance:

Calculate the mean of each attribute for the training data.

    means <- colMeans(Xtrain)

Calculate the standard deviation of each attribute for the training data. In current versions of R, sd() applied to a matrix no longer works column by column, so use apply().

    stdevs <- apply(Xtrain, 2, sd)

Subtract the means and divide by the stdevs, column by column.

    Xstrain <- (Xtrain - matrix(means, nrow(Xtrain), ncol(Xtrain), byrow=TRUE)) /
               matrix(stdevs, nrow(Xtrain), ncol(Xtrain), byrow=TRUE)

To standardize testing data, use the means and stdevs calculated from the training data.

    Xstest <- (Xtest - matrix(means, nrow(Xtest), ncol(Xtest), byrow=TRUE)) /
              matrix(stdevs, nrow(Xtest), ncol(Xtest), byrow=TRUE)
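The same column-by-column arithmetic can also be written with sweep(), which applies one statistic per column without building full-size matrices of means and stdevs. A minimal sketch, assuming small made-up Xtrain and Xtest matrices (the values are illustrative only and are not from the course):

```r
# Made-up data for illustration; each column is one attribute.
Xtrain <- matrix(c(1, 2, 3, 4,
                   10, 20, 30, 50), nrow = 4)
Xtest  <- matrix(c(2, 5,
                   15, 40), nrow = 2)

means  <- colMeans(Xtrain)
stdevs <- apply(Xtrain, 2, sd)

# Subtract each column's mean, then divide by each column's standard deviation.
Xstrain <- sweep(sweep(Xtrain, 2, means, "-"), 2, stdevs, "/")
# The test data reuses the training means and stdevs.
Xstest  <- sweep(sweep(Xtest, 2, means, "-"), 2, stdevs, "/")

print(colMeans(Xstrain))      # approximately zero in every column
print(apply(Xstrain, 2, sd))  # one in every column
```

The standardized test columns will generally not have exactly zero mean and unit variance, because the parameters come from the training data.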


Must keep track of the means and stdevs from the training data. One way is to have the standardize function return them along with the data:

    standardize <- function(X, means=apply(X,2,mean), stdevs=apply(X,2,sd),
                            returnParms=FALSE) {
      ## X is nSamples by nInputComponents
      stdevs[stdevs==0] <- 1    # avoid dividing by zero for constant columns
      N <- nrow(X)
      p <- ncol(X)
      X <- (X - matrix(rep(means,N),N,p,byrow=TRUE)) /
           matrix(rep(stdevs,N),N,p,byrow=TRUE)
      if (returnParms)
        list(data=X, means=means, stdevs=stdevs)
      else
        X
    }

used like

    tp <- standardize(Xtrain, returnParms=TRUE)
    Xstrain <- tp$data
    Xstest <- standardize(Xtest, tp$means, tp$stdevs)
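For comparison, base R's scale() follows the same pattern: it standardizes the columns and keeps the parameters it used as attributes on the result. The sketch below assumes made-up Xtrain and Xtest matrices. Unlike the standardize function in these notes, scale() does not replace zero standard deviations with 1, so a constant column would come back as NaN.

```r
# Made-up data for illustration; each column is one attribute.
Xtrain <- matrix(c(1, 2, 3, 4,
                   10, 20, 30, 50), nrow = 4)
Xtest  <- matrix(c(2, 5,
                   15, 40), nrow = 2)

Xstrain <- scale(Xtrain)                    # center and scale each column
means   <- attr(Xstrain, "scaled:center")   # training column means
stdevs  <- attr(Xstrain, "scaled:scale")    # training column standard deviations

# Apply the training parameters to the test data.
Xstest <- scale(Xtest, center = means, scale = stdevs)
```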


What should makeLLS return?

Certainly want the weights returned. After all, that is the model. What else?

The means and stdevs should also be associated with this model. Different models will have different weights and different standardization parameters.

How would makeLLS return all of this?

    return(list(weights=w, standardize=standardize))

Use like

    model <- makeLLS(Xtrain, Ttrain, lambda)
    predictions <- useLLS(model, Xtest)

Inside useLLS, how would you use model? Say useLLS has arguments named model and X:

    Xs <- model$standardize(X)
    predictions <- Xs %*% model$weights
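For model$standardize(X) to work on new data, the stored standardize must be a function that remembers the training means and stdevs. One way to get that is a closure. The sketch below is only an illustration under several assumptions that the notes do not state: makeStandardizer is a hypothetical helper, the weights are fit by ridge-penalized least squares, and a constant column of ones is added for a bias weight that is left unpenalized. Because of that bias column, useLLS here multiplies cbind(1, Xs) by the weights rather than Xs alone.

```r
# Hypothetical helper: returns a function that standardizes any X
# with the means and stdevs of the training data.
makeStandardizer <- function(Xtrain) {
  means  <- colMeans(Xtrain)
  stdevs <- apply(Xtrain, 2, sd)
  stdevs[stdevs == 0] <- 1
  function(X) sweep(sweep(X, 2, means, "-"), 2, stdevs, "/")
}

# Assumed form of makeLLS: ridge-penalized least squares with a bias weight.
makeLLS <- function(X, Ttrain, lambda) {
  standardize <- makeStandardizer(X)
  Xs1 <- cbind(1, standardize(X))         # assumption: constant column for the bias
  penalty <- lambda * diag(ncol(Xs1))
  penalty[1, 1] <- 0                      # assumption: do not penalize the bias
  w <- solve(t(Xs1) %*% Xs1 + penalty, t(Xs1) %*% Ttrain)
  list(weights = w, standardize = standardize)
}

useLLS <- function(model, X) {
  Xs <- model$standardize(X)
  cbind(1, Xs) %*% model$weights
}

# Made-up data with an exact linear relationship, for illustration only.
Xtrain <- matrix(c(1, 2, 3, 4, 5,
                   2, 1, 4, 3, 6), nrow = 5)
Ttrain <- 3 + 2 * Xtrain[, 1] - Xtrain[, 2]
model <- makeLLS(Xtrain, Ttrain, lambda = 0)
predictions <- useLLS(model, matrix(c(10, 7), nrow = 1))
print(predictions)
```

Keeping the standardization inside the model means a caller never has to pass means and stdevs around separately.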


Collecting and Combining Multiple Results

We often want to repeat a calculation a number of times using different parameter values, like values of λ and of training set fraction. So, you might use a for loop like

    for (trainf in c(0.2, 0.4, 0.6, 0.8, 0.9)) {
      for (repi in 1:200) {
        for (lambda in seq(0, 10, by=0.5)) {
          ## do calculation here using trainf and lambda to obtain
          ## trainRMSE and testRMSE
        }
      }
    }

You could keep running sums of the RMSEs so you can calculate averages later. But let's use that cheap memory, and just collect each result in a new row of a matrix.

          ## do calculation here using trainf and lambda to obtain
          ## trainRMSE and testRMSE
          results <- rbind(results, c(trainf, lambda, trainRMSE, testRMSE))

Don't forget to initialize results

    results <- c()

before you start the for loops.
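Calling rbind inside the loop copies the whole matrix on every iteration. That is fine for a few thousand rows, but an alternative is to append each row to a list and bind once at the end. The sketch below uses shortened loops, random numbers as placeholders for the real RMSE calculation, and column names that are assumptions added for readability:

```r
set.seed(1)
results <- list()
for (trainf in c(0.2, 0.8)) {
  for (repi in 1:3) {
    for (lambda in c(0, 0.5)) {
      # Placeholder calculation: random numbers stand in for real RMSE values.
      trainRMSE <- runif(1)
      testRMSE  <- runif(1)
      results[[length(results) + 1]] <- c(trainf, lambda, trainRMSE, testRMSE)
    }
  }
}
# Bind all collected rows into one matrix, one row per calculation.
results <- do.call(rbind, results)
colnames(results) <- c("trainf", "lambda", "trainRMSE", "testRMSE")
print(dim(results))
```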


Now the matrix has many rows (200) for each pair of (trainf, lambda) values. How can we calculate the means of those 200 values? Check out ?unique.

    results
         [,1] [,2] [,3] [,4]
    [1,]  0.2  0.1  3.2   3.
    [2,]  0.2  0.5  5.3   3.
    [3,]  0.2  0.1  5.5   3.

    unique(results[,1:2])
         [,1] [,2]
    [1,]  0.2   0.
    [2,]  0.2   0.
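One way to go from unique() to per-pair means is to loop over the unique (trainf, lambda) rows and average the matching rows of results. A minimal sketch, assuming a made-up three-row results matrix whose columns are trainf, lambda, trainRMSE and testRMSE (the numbers are illustrative, not the ones from the slide):

```r
# Made-up results matrix: columns are trainf, lambda, trainRMSE, testRMSE.
results <- rbind(c(0.2, 0.1, 3.0, 4.0),
                 c(0.2, 0.5, 5.0, 6.0),
                 c(0.2, 0.1, 5.0, 8.0))

pairs <- unique(results[, 1:2, drop = FALSE])   # one row per (trainf, lambda) pair
means <- c()
for (i in seq_len(nrow(pairs))) {
  # Rows of results whose first two columns match this pair.
  rows <- results[, 1] == pairs[i, 1] & results[, 2] == pairs[i, 2]
  means <- rbind(means,
                 c(pairs[i, ], colMeans(results[rows, 3:4, drop = FALSE])))
}
print(means)
```

The exact == comparison is safe here because the values in pairs are copied directly from results. The drop = FALSE arguments keep single-row selections as matrices so that colMeans still works.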