EECS 553: Machine Learning - Lecture 15: Modern Regularization and Optimization Methods, Slides of Biostatistics

This lecture explores modern regularization and optimization methods in machine learning, focusing on deep learning techniques. It examines the challenges of training deep convolutional neural networks (CNNs) and introduces ResNet, an architecture designed to overcome these optimization difficulties. The lecture then covers several regularization techniques, including max-norm, dropout, early stopping, and data augmentation, and explains how each helps prevent overfitting and improve generalization. It concludes with optimization algorithms beyond stochastic gradient descent (SGD), including momentum and adaptive methods such as AdaGrad, RMSProp, and Adam, highlighting how they accelerate learning.

Typology: Slides

2023/2024

Uploaded on 02/25/2025

liu-zejia

EECS 553: Machine Learning

Lecture 15: Modern Regularization and Optimization Methods

First: Wrap up ConvNets

What happens when we continue stacking deeper layers on a “plain” convolutional neural network?

Case Study: ResNet

Q: What’s weird about these training and test curves?

A: The 56-layer model performs worse on both training and test error.

  • The deeper model performs worse, but this is not caused by overfitting: an overfit model would have lower training error, not higher.
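The lecture presents ResNet as the answer to this optimization difficulty. A minimal sketch of the idea, under stated assumptions: a residual block outputs `x + F(x)` instead of `F(x)`, so extra layers can fall back to the identity mapping. The names `plain_block` and `residual_block` are hypothetical, a dense layer stands in for the convolutional layers, and the ReLU that a real ResNet block applies after the addition is omitted for brevity.

```python
# Illustrative sketch in pure Python; not the reference ResNet implementation.

def relu(v):
    return [max(0.0, a) for a in v]

def linear(W, x):
    # Matrix-vector product; W is a list of rows.
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) for row in W]

def plain_block(W1, W2, x):
    # F(x) = W2 * relu(W1 * x)
    return linear(W2, relu(linear(W1, x)))

def residual_block(W1, W2, x):
    # x + F(x): the skip connection adds the input back to the block output.
    return [x_i + f_i for x_i, f_i in zip(x, plain_block(W1, W2, x))]

def zeros(n):
    return [[0.0] * n for _ in range(n)]

x = [1.0, -2.0, 3.0]
W1, W2 = zeros(3), zeros(3)

# With all-zero weights the plain block discards its input entirely ...
plain_out = plain_block(W1, W2, x)
# ... while the residual block reduces to the identity mapping.
residual_out = residual_block(W1, W2, x)
```

With residual blocks, a deeper network can represent a shallower one by driving the extra blocks' weights toward zero, which is the usual argument for why the deeper model should not be harder to fit.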

Training ResNets in practice

  • SGD + Momentum (0.9)
  • Learning rate: 0.1, divided by 10 when validation error plateaus
  • Mini-batch size 256
  • Weight decay of 1e-5
  • No dropout used
  • Batch Normalization after every CONV layer
  • Xavier/2 initialization from He et al.
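Part of this recipe can be sketched in a few lines. The code below is an illustration only: it uses one common formulation of SGD with momentum (velocity accumulates the gradient, weight decay is added to the gradient as an L2 term), and the function names and the `patience` rule for detecting a plateau are assumptions, since the slide does not say how the plateau is detected.

```python
# Sketch of SGD + momentum, weight decay, and a divide-by-10 learning rate rule.
# Pure Python; function names and `patience` are hypothetical.

def sgd_momentum_step(w, grad, velocity, lr, momentum=0.9, weight_decay=1e-5):
    """One SGD + momentum update with L2 weight decay folded into the gradient."""
    new_w, new_v = [], []
    for w_i, g_i, v_i in zip(w, grad, velocity):
        g_i = g_i + weight_decay * w_i   # weight decay term
        v_i = momentum * v_i + g_i       # accumulate velocity
        new_v.append(v_i)
        new_w.append(w_i - lr * v_i)     # parameter update
    return new_w, new_v

def step_lr_on_plateau(lr, val_errors, patience=3, factor=0.1):
    """Divide lr by 10 if the last `patience` epochs did not beat the earlier best."""
    if len(val_errors) <= patience:
        return lr
    best_before = min(val_errors[:-patience])
    recent_best = min(val_errors[-patience:])
    return lr * factor if recent_best >= best_before else lr

# Toy usage: minimize 0.5 * w**2, whose gradient is w.
w, v, lr = [1.0], [0.0], 0.1
for _ in range(200):
    grad = [w[0]]
    w, v = sgd_momentum_step(w, grad, v, lr)
```

In a real training loop the validation history would typically be reset after each learning rate drop, so that one plateau triggers only one reduction.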

Time to Revisit:

Optimization and Regularization

Regularization

  • Regularized empirical risk minimization:

    $$\min_{w} \; L(w) + \lambda\,\Omega(w)$$

    where $L(w)$ is the loss function and $\Omega(w)$ is the penalty function.

  • Ridge: $\Omega(w) = \|w\|_2^2$. Implication: model weights are small.
  • Lasso: $\Omega(w) = \|w\|_1$. Implication: model weights are small and mostly equal to zero.
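The different implications of the two penalties can be seen on a one-dimensional problem. As a sketch, assume each weight is fit independently by minimizing `0.5 * (w - z)**2 + lam * penalty(w)` for an unregularized value `z`; this per-coordinate setup and all function names are assumptions made for illustration. The ridge solution rescales every weight, while the lasso solution (soft-thresholding) sets small weights exactly to zero.

```python
# Illustrative comparison of ridge and lasso shrinkage in pure Python.

def ridge_penalty(w):
    # Squared L2 norm.
    return sum(w_i ** 2 for w_i in w)

def lasso_penalty(w):
    # L1 norm.
    return sum(abs(w_i) for w_i in w)

def ridge_shrink(z, lam):
    # argmin_w 0.5*(w - z)^2 + lam*w^2  gives  w = z / (1 + 2*lam)
    return [z_i / (1.0 + 2.0 * lam) for z_i in z]

def lasso_shrink(z, lam):
    # argmin_w 0.5*(w - z)^2 + lam*|w|  gives soft-thresholding at lam
    out = []
    for z_i in z:
        if z_i > lam:
            out.append(z_i - lam)
        elif z_i < -lam:
            out.append(z_i + lam)
        else:
            out.append(0.0)
    return out

z = [3.0, -0.2, 0.05, -1.5]
ridge_w = ridge_shrink(z, lam=0.5)  # every weight is scaled down, none is exactly zero
lasso_w = lasso_shrink(z, lam=0.5)  # weights with |z_i| <= lam become exactly zero
```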

Regularization

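Regularized empirical risk minimization: $\min_{w} \; L(w) + \lambda\,\Omega(w)$

  • Gradient updates: $w_{t+1} = w_t - \eta \left( \nabla L(w_t) + \lambda \nabla \Omega(w_t) \right)$, with step size $\eta$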
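A gradient update on a regularized objective can be sketched as follows, assuming the standard form in which each step follows the gradient of the loss plus `lam` times the gradient of the penalty. The toy quadratic loss, the `target` vector, and the function name `regularized_gd` are hypothetical choices made only so the example runs on its own.

```python
# Sketch of gradient descent on loss + lam * penalty, in pure Python.

def regularized_gd(grad_loss, grad_penalty, w0, lr, lam, steps):
    """Repeat w <- w - lr * (grad_loss(w) + lam * grad_penalty(w))."""
    w = list(w0)
    for _ in range(steps):
        g_loss = grad_loss(w)
        g_pen = grad_penalty(w)
        w = [w_i - lr * (gl + lam * gp) for w_i, gl, gp in zip(w, g_loss, g_pen)]
    return w

# Toy loss: 0.5 * sum((w_i - t_i)^2) for a hypothetical target t.
target = [2.0, -1.0]

def grad_loss(w):
    return [w_i - t_i for w_i, t_i in zip(w, target)]

def grad_ridge(w):
    # Gradient of the squared L2 norm.
    return [2.0 * w_i for w_i in w]

w_plain = regularized_gd(grad_loss, grad_ridge, [0.0, 0.0], lr=0.1, lam=0.0, steps=500)
w_ridge = regularized_gd(grad_loss, grad_ridge, [0.0, 0.0], lr=0.1, lam=0.5, steps=500)
```

For this toy loss the ridge term pulls the solution from `target` toward zero, to `target / (1 + 2 * lam)`, which is why the ridge gradient step is often described as weight decay.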