Regression Analysis: Regularized Regression and Case Studies, Exercises of Statistics

Georgia State University (GSU)Statistics

Churn Rate Regression Analysis

Typology: Exercises

2022/2023

Uploaded on 12/11/2023

karan-nahar 🇺🇸

Regression Analysis

Module 5: Other Regression Models and Case Studies

Outline of Lessons

1. Regularized Regression

2. Case Study: ER Volume

3. Case Study: Customer Churn

Nicoleta Serban

Gamze Tokol-Goldsman

Dave Goldsman

H. Milton Stewart School of Industrial and Systems Engineering

5.1 Regularized Regression

Bias-Variance Tradeoff

Biased Regression: Penalties

Not all biased models are better.

We need a way to find “good” biased models!

Penalize large values of the $\beta_j$'s jointly
- Should lead to “multivariate” shrinkage of the vector $\boldsymbol{\beta}$
The goal is really to penalize “complex” models
- Heuristically, “large” is interpreted as “complex model”
  - If the truth really is complex, this may not work!
    - It will then be hard to build a good model anyway

Regularized Regression (cont’d)

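The penalized sum of squared errors:

$$Q(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \right)^2 + \lambda \, \mathrm{Penalty}(\beta_1, \ldots, \beta_p)$$

We consider three choices for the penalty:

$L_0$ penalty: $\|\boldsymbol{\beta}\|_0 = \#\{ j : \beta_j \neq 0 \}$ ⇒ Minimizing Q means searching through all submodels

$L_1$ penalty (LASSO Regression): $\|\boldsymbol{\beta}\|_1 = \sum_{j=1}^{p} |\beta_j|$ ⇒ Minimizing Q forces many $\beta_j$'s to be zero

$L_2$ penalty (Ridge Regression): $\|\boldsymbol{\beta}\|_2^2 = \sum_{j=1}^{p} \beta_j^2$ ⇒ Minimizing Q accounts for multicollinearity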

Comparing Penalties

$L_0$ penalty
- Provides best model given a selection criterion
- Requires fitting all submodels

$L_1$ penalty
- Measures sparsity

$L_2$ penalty
- Easy to implement
- Does not do variable selection

Example: Consider vectors $\boldsymbol{u} = (1, 0, \cdots, 0)$ and $\boldsymbol{v} = (1/\sqrt{p}, \cdots, 1/\sqrt{p})$, both of length $p$.

Vector $\boldsymbol{u}$ is sparse, because it contains mostly zeros.

Using the $L_1$ norm, we have $\|\boldsymbol{u}\|_1 = \sum_{i=1}^{p} |u_i| = 1$ and $\|\boldsymbol{v}\|_1 = \sum_{i=1}^{p} |v_i| = \sqrt{p}$.

Using the $L_2$ norm, we have $\|\boldsymbol{u}\|_2 = \sqrt{\sum_{i=1}^{p} u_i^2} = 1$ and $\|\boldsymbol{v}\|_2 = \sqrt{\sum_{i=1}^{p} v_i^2} = 1$.

The $L_1$ penalty rewards the sparsity of $\boldsymbol{u}$; the $L_2$ penalty makes no distinction.
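The example with the sparse vector u and the dense vector v can be checked numerically. The sketch below is only an illustration: the length p = 16 and the helper names `l1_norm` and `l2_norm` are choices made here, not part of the slides.

```r
# Compare the L1 and L2 norms of a sparse and a dense vector of length p.
p <- 16                       # illustrative length (assumption, not from the slides)
u <- c(1, rep(0, p - 1))      # sparse: a single nonzero entry
v <- rep(1 / sqrt(p), p)      # dense: every entry equals 1/sqrt(p)

# Hypothetical helper functions for the two norms
l1_norm <- function(x) sum(abs(x))
l2_norm <- function(x) sqrt(sum(x^2))

norms <- rbind(
  u = c(L1 = l1_norm(u), L2 = l2_norm(u)),
  v = c(L1 = l1_norm(v), L2 = l2_norm(v))
)
print(norms)
```

Both vectors have the same L2 norm, while the L1 norm of v grows like sqrt(p), so only the L1 norm separates the sparse vector from the dense one.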

Ridge Regression

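Minimizes the SSE plus the $L_2$ penalty term:

$$SSE_\lambda(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \right)^2 + \lambda \sum_{j=1}^{p} \beta_j^2$$

Provides closed-form estimate of regression coefficients ($\hat{\boldsymbol{\beta}}$):

$$\hat{\boldsymbol{\beta}} = (\boldsymbol{X}^T \boldsymbol{X} + \lambda \mathbf{I})^{-1} \boldsymbol{X}^T \boldsymbol{Y}$$

where $\mathbf{I}$ is the identity matrix

λ = 0 gives least squares estimate (low bias, high variance)
λ → ∞ gives $\hat{\boldsymbol{\beta}} \to 0$ (high bias, low variance)

Commonly used under multicollinearity
Not used for model selection
- Shrinks but does not “force” any $\hat{\beta}_j$ to equal 0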
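The closed-form ridge estimate and the effect of λ can be illustrated on simulated data. This is a minimal sketch under stated assumptions: the data, the true coefficients, and the helper `ridge_coef` are all made up for illustration, and the predictors and response are centered so that no intercept is needed.

```r
set.seed(1)                               # simulated data; every value is illustrative
n <- 50; p <- 3
X <- scale(matrix(rnorm(n * p), n, p))    # centered and scaled predictors
y <- drop(X %*% c(2, -1, 0.5) + rnorm(n))
y <- y - mean(y)                          # centered response, so no intercept term

# Hypothetical helper: closed-form ridge estimate (X'X + lambda * I)^{-1} X'y
ridge_coef <- function(X, y, lambda) {
  drop(solve(crossprod(X) + lambda * diag(ncol(X)), crossprod(X, y)))
}

b_ols   <- ridge_coef(X, y, lambda = 0)    # lambda = 0: least squares estimate
b_mid   <- ridge_coef(X, y, lambda = 10)   # moderate shrinkage
b_large <- ridge_coef(X, y, lambda = 1e6)  # very large lambda: close to zero

print(rbind(b_ols, b_mid, b_large))
```

The coefficients shrink toward zero as λ grows, but none of them is set exactly to zero, which is why ridge regression is not a variable selection method.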

LASSO Regression

Least Absolute Shrinkage and Selection Operator

Normal Linear Regression minimizes

$$SSE_\lambda(\boldsymbol{\beta}) = \sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \right)^2 + \lambda \sum_{j=1}^{p} |\beta_j|$$

Generalized Linear Model minimizes

$$SSE_\lambda(\boldsymbol{\beta}) = -\ell(\beta_0, \cdots, \beta_p) + \lambda \sum_{j=1}^{p} |\beta_j|$$

where $\ell(\boldsymbol{\beta})$ is the log-likelihood function

Estimated regression coefficients
- Must use numerical algorithms
- No closed-form expression
Used for model selection
- Does “force” some $\hat{\beta}_j$ to equal 0
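Because the LASSO estimate has no closed form, it has to be computed numerically. The slides do not name an algorithm; the sketch below uses cyclic coordinate descent with soft-thresholding as one common choice, applied to the normal-regression objective above with centered data (so the intercept is dropped). The function names `soft_threshold` and `lasso_cd`, the simulated data, and the λ values are assumptions made for illustration.

```r
# Soft-thresholding: shrink z toward 0 by t, and return 0 when |z| <= t
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Cyclic coordinate descent for: sum((y - X b)^2) + lambda * sum(abs(b))
# Assumes y and the columns of X are centered (no intercept).
lasso_cd <- function(X, y, lambda, n_sweeps = 200) {
  b <- rep(0, ncol(X))
  for (sweep in seq_len(n_sweeps)) {
    for (j in seq_len(ncol(X))) {
      r_j  <- y - X[, -j, drop = FALSE] %*% b[-j]   # partial residual excluding x_j
      # The threshold is lambda / 2 because the SSE term is not halved
      b[j] <- soft_threshold(sum(X[, j] * r_j), lambda / 2) / sum(X[, j]^2)
    }
  }
  b
}

set.seed(2)                                # simulated data; every value is illustrative
n <- 100
X <- scale(matrix(rnorm(n * 4), n, 4))
y <- drop(X %*% c(3, 0, 0, -2) + rnorm(n))
y <- y - mean(y)

b_ols   <- lasso_cd(X, y, lambda = 0)      # no penalty: least squares fit
b_lasso <- lasso_cd(X, y, lambda = 40)     # moderate penalty
b_null  <- lasso_cd(X, y, lambda = 2 * max(abs(crossprod(X, y))))  # all zeros

print(rbind(b_ols, b_lasso, b_null))
```

The last call uses the smallest penalty at which every coefficient is thresholded to exactly zero; ridge regression, by contrast, would only shrink them.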

Choosing λ: Cross-Validation

Split the data $(x_{11}, \ldots, x_{1p}, y_1), \cdots, (x_{n1}, \ldots, x_{np}, y_n)$ into two sets.

Training set
- Use to fit the penalized model
  - Given λ, estimate $\hat{\beta}_0, \hat{\beta}_1, \cdots, \hat{\beta}_p$

Testing/Validation set
- Use to evaluate performance of model obtained with training set
  - Estimate mean squared error (MSE) for normal regression
  - Estimate classification error rate for logistic regression
  - Estimate sum of squared deviances for Poisson regression
  - Generally, estimate a scoring rule depending on the regression model

The process can be repeated for multiple values of λ.

Cross Validation: How to Split Data?

K-fold cross-validation (KCV)

Divide data into K chunks of approximately equal size
For a range of λ penalty values, e.g., $\lambda_1, \cdots, \lambda_B$, and for k = 1 to K
- The training set consists of data without the k-th fold of data, and the testing set consists of the k-th fold
- Given λ, fit a model on the training data and predict responses
- Given λ, compute mean squared error or classification error rate for the k-th fold testing data
- Given λ, after K folds have been processed, compute overall error (e.g., MSE or classification error) for that λ for all folds

➔ Select λ penalty providing minimum overall error
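The K-fold procedure can be sketched in base R. The example below is an illustration only: it uses ridge regression (fitted by its closed form) as the penalized model, simulated data, K = 5, and an arbitrary grid of λ values, none of which come from the slides.

```r
set.seed(3)                                  # simulated data; every value is illustrative
n <- 60; p <- 5; K <- 5
X <- matrix(rnorm(n * p), n, p)
y <- drop(X %*% c(2, -1, 0, 0, 1) + rnorm(n))
lambdas <- c(0, 0.5, 1, 2, 5, 10, 50)        # candidate penalties lambda_1, ..., lambda_B

fold <- sample(rep(1:K, length.out = n))     # random split into K folds of equal size

cv_mse <- sapply(lambdas, function(lambda) {
  sq_err <- numeric(n)
  for (k in 1:K) {
    test <- fold == k
    # Center with training-fold means only, then fit ridge by its closed form
    x_mean <- colMeans(X[!test, ])
    y_mean <- mean(y[!test])
    X_tr   <- sweep(X[!test, ], 2, x_mean)
    b <- solve(crossprod(X_tr) + lambda * diag(p),
               crossprod(X_tr, y[!test] - y_mean))
    # Predict the held-out k-th fold and store its squared errors
    pred <- y_mean + sweep(X[test, , drop = FALSE], 2, x_mean) %*% b
    sq_err[test] <- (y[test] - pred)^2
  }
  mean(sq_err)                               # overall MSE across all K folds
})

best_lambda <- lambdas[which.min(cv_mse)]    # penalty with minimum overall error
print(data.frame(lambda = lambdas, cv_mse = cv_mse))
print(best_lambda)
```

Centering inside each fold keeps information from the held-out observations out of the fitted model, so the error estimate reflects genuine out-of-sample prediction.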

LASSO: Limitations

LASSO selects only up to n variables
- n is the number of observations
- If the number of potential predictors is greater than the number of observations, LASSO will select at most n of them
- Since, normally, n > p, this is not a significant limitation
If there are high correlations among predictors
- The prediction performance of LASSO is dominated by that of ridge regression
If there is a group of variables with high correlation
- LASSO tends to select only one variable from the group
- LASSO doesn't care which one

Elastic Net

Elastic Net minimizes

$$\sum_{i=1}^{n} \left( y_i - (\beta_0 + \beta_1 x_{i1} + \cdots + \beta_p x_{ip}) \right)^2 + \lambda_1 \sum_{j=1}^{p} |\beta_j| + \lambda_2 \sum_{j=1}^{p} \beta_j^2$$

$L_1$ penalty generates a sparse model

$L_2$ penalty
- Removes the limitation on the number of selected variables
- Encourages group effect
- Stabilizes the $L_1$ regularization path

Reference: Hui Zou and Trevor Hastie. "Regularization and variable selection via the elastic net." Journal of the Royal Statistical Society: Series B 67.2 (2005): 301-320.
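The group effect of the elastic net can be illustrated with two perfectly correlated predictors. The sketch below is an assumption-laden illustration, not the method used in the slides or the reference: it minimizes the elastic net objective by cyclic coordinate descent on centered data (no intercept), and the function names `soft_threshold` and `enet_cd`, the simulated data, and the penalty values are all made up here.

```r
# Soft-thresholding: shrink z toward 0 by t, and return 0 when |z| <= t
soft_threshold <- function(z, t) sign(z) * pmax(abs(z) - t, 0)

# Cyclic coordinate descent for:
#   sum((y - X b)^2) + lambda1 * sum(abs(b)) + lambda2 * sum(b^2)
# Assumes y and the columns of X are centered (no intercept).
enet_cd <- function(X, y, lambda1, lambda2, n_sweeps = 500) {
  b <- rep(0, ncol(X))
  for (sweep in seq_len(n_sweeps)) {
    for (j in seq_len(ncol(X))) {
      r_j  <- y - X[, -j, drop = FALSE] %*% b[-j]   # partial residual excluding x_j
      b[j] <- soft_threshold(sum(X[, j] * r_j), lambda1 / 2) /
        (sum(X[, j]^2) + lambda2)
    }
  }
  b
}

set.seed(4)                            # simulated data; every value is illustrative
n  <- 100
x1 <- as.numeric(scale(rnorm(n)))
x3 <- as.numeric(scale(rnorm(n)))
X  <- cbind(x1, x2 = x1, x3)           # x1 and x2 are identical (perfectly correlated)
y  <- 2 * x1 + x3 + rnorm(n)
y  <- y - mean(y)

b_lasso <- enet_cd(X, y, lambda1 = 20, lambda2 = 0)    # L1 penalty only
b_enet  <- enet_cd(X, y, lambda1 = 20, lambda2 = 20)   # L1 and L2 penalties
print(rbind(b_lasso, b_enet))
```

With the L1 penalty alone, this algorithm should put essentially all the weight on whichever duplicated column it visits first, whereas adding the L2 penalty should split the weight equally between x1 and x2.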

library(MASS)  # provides lm.ridge()

## Scale the predicting variables and the response variable
ltakers = log(takers)
predictors = cbind(ltakers, rank, income, years, public, expend)
predictors = scale(predictors)
sat.scaled = scale(sat)

## Apply ridge regression for a range of penalty constants
lambda = seq(0, 10, by=0.25)
out = lm.ridge(sat.scaled~predictors, lambda=lambda)

## GCV score for each lambda, and the index of the lambda that minimizes it
round(out$GCV, 5)
which(out$GCV == min(out$GCV))

## Coefficients at the 10th lambda value, i.e., lambda = 2.25
round(out$coef[,10], 4)

predictorsltakers predictorsrank predictorsincome predictorsyears predictorspublic

0.4771 0.4195 0.0223 0.1796 - 0.

predictorsexpend

Ridge Regression

The ridge regression outputs estimates for each lambda in the considered range (not shown)

The lambda is selected to minimize the (generalized) CV score

## Plot the path of each coefficient against lambda
plot(lambda, out$coef[1,], type = "l", col=1, lwd=3,
     xlab = "Lambda", ylab = "Coefficients",
     main = "Plot of Regression Coefficients vs. Lambda Penalty Ridge Regression",
     ylim = c(min(out$coef), max(out$coef)))
for(i in 2:6)
  points(lambda, out$coef[i,], type = "l", col=i, lwd=3)

## Reference lines: zero coefficient and the selected lambda
abline(h = 0, lty = 2, lwd = 3)
abline(v = 2.25, lty = 2, lwd=3)
