Deep Residual Learning for Image Recognition
Authors:
Kaiming He, Xiangyu Zhang, Shaoqing Ren and Jian Sun
Presenter: Masoud Hoveidar
Facilitators: Amber Ma and Ramya Balasubramaniam
12th August 2019

How DEEP should we make our Neural Networks?

● It depends on:
    ○ The complexity of the task at hand
    ○ The computational capacity available at training time
    ○ The computational capacity available at inference time (e.g. on edge devices)

● If the task needs a lot of parameters:
    ○ Can we train very deep networks efficiently using current optimization solvers?
    ○ Is training a better model as simple as adding more and more layers?

NO

Why is it not OK to just add more layers?

● Because it introduces problems during training, such as:
    ○ Vanishing/exploding gradients
        ■ Can be addressed by normalized initialization and intermediate normalization layers
    ○ Degradation problem
        ■ What should we do about it?
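
A toy illustration of the first problem, not taken from the slides: if each layer is assumed to scale the backpropagated gradient by roughly a constant factor (a strong simplification), the gradient magnitude after `depth` layers behaves like that factor raised to `depth`. The function name `gradient_scale` and the chosen factors are hypothetical.

```python
def gradient_scale(per_layer_factor, depth):
    """Gradient magnitude after `depth` layers, assuming every layer
    multiplies it by the same constant factor (illustrative assumption)."""
    return per_layer_factor ** depth


vanishing = gradient_scale(0.5, 50)   # factor below 1: shrinks toward zero
exploding = gradient_scale(2.0, 50)   # factor above 1: grows without bound
stable = gradient_scale(1.0, 50)      # factor of exactly 1: preserved

print(vanishing, exploding, stable)
```

Normalized initialization and normalization layers can be read as attempts to keep that per-layer factor close to 1.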

Degradation problem … (continued)

[Figure: a shallow network and a deeper counterpart built from it]

● Shallow network: conv → conv → conv → fc → softmax, Acc. = X%
● Deeper network: conv → conv → conv → identity → identity → fc → softmax, Acc. = X%

Degradation problem … (continued)

● Current optimization solvers have difficulty making a stack of added non-linear layers approximate identity mappings
● Otherwise, a deeper network would be at least as accurate as its shallower counterpart
● NOTE: This should not be confused with overfitting: the deeper network also shows higher training error
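
The "at least as accurate" argument can be sketched in code: take a shallow network, insert extra layers whose weights are set to the identity, and the output does not change. This is a minimal sketch in plain Python lists; the helper names (`relu`, `matvec`, `shallow_net`, `deeper_net`) and the weights are hypothetical, chosen only for illustration.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def shallow_net(x, w1, w2):
    """Hypothetical 2-layer network: linear -> ReLU -> linear."""
    return matvec(w2, relu(matvec(w1, x)))


def deeper_net(x, w1, w2, extra_identity_layers=2):
    """Same network with extra layers inserted, each set to the identity."""
    h = relu(matvec(w1, x))
    n = len(h)
    identity = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(extra_identity_layers):
        # h is already non-negative, so ReLU(identity @ h) equals h.
        h = relu(matvec(identity, h))
    return matvec(w2, h)


w1 = [[0.5, -1.0], [2.0, 0.25]]
w2 = [[1.0, -0.5]]
x = [1.0, 2.0]

shallow_out = shallow_net(x, w1, w2)
deeper_out = deeper_net(x, w1, w2)
print(shallow_out, deeper_out)
```

Here the identity weights are set by hand. The degradation problem is that a solver starting from random weights does not reliably find such a solution on its own.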

Residual block

● The residual architecture adds explicit identity connections throughout the network to help it learn the required identity mappings

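[Figure: residual block. The input x passes through weight layer → ReLU → weight layer; the identity shortcut adds x to the result, and a final ReLU gives the output y.]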
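
The block in the figure can be sketched as follows. This is a minimal illustration with fully connected weight layers and plain Python lists, not the convolutional implementation; the function names and the weights are hypothetical.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def residual_block(x, w1, w2):
    """y = ReLU(F(x) + x), with F(x) = w2 @ ReLU(w1 @ x)."""
    f = matvec(w2, relu(matvec(w1, x)))             # residual branch F(x)
    return relu([fi + xi for fi, xi in zip(f, x)])  # add identity shortcut, then ReLU


x = [1.0, 3.0]

# With all weights at zero, F(x) = 0 and the block passes x through unchanged
# (x is non-negative here, so the final ReLU does not alter it).
zeros = [[0.0, 0.0], [0.0, 0.0]]
y_identity = residual_block(x, zeros, zeros)

# With non-zero weights the block outputs x plus a learned correction.
w1 = [[1.0, 0.0], [0.0, -1.0]]
w2 = [[0.5, 0.0], [0.0, 0.5]]
y_general = residual_block(x, w1, w2)

print(y_identity, y_general)
```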

Residual block (continued)

● With this approach, the network can effectively decide how deep it needs to be

● These identity connections introduce no new parameters to the network architecture, so they add no computational burden beyond a negligible element-wise addition

● This method allows us to design deeper networks in order to deal with much more complicated problems and tasks
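
One way to read the first bullet is through the residual formulation. The symbol H(x), the mapping a block is meant to fit, does not appear on the slides and is introduced here as an assumption; F(x) is the output of the stacked weight layers.

```latex
\begin{aligned}
H(x) &= F(x) + x \\
F(x) &= H(x) - x \\
H(x) = x \;&\Longleftrightarrow\; F(x) = 0
\end{aligned}
```

If an extra block is not needed, the solver only has to push F toward zero, for example by driving its weights toward zero, rather than fit an identity mapping through non-linear layers.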

ResNet architecture

y = F(x, {W_i}) + W_s x

● W_s: linear projection on the shortcut, used for dimension matching
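
A minimal sketch of the projection shortcut, assuming fully connected layers and plain Python lists. It follows the formula as written, so no activation is applied after the addition; the function names, shapes, and weights are hypothetical.

```python
def relu(v):
    """Element-wise ReLU on a list of floats."""
    return [max(0.0, a) for a in v]


def matvec(w, v):
    """Multiply matrix w (a list of rows) by vector v."""
    return [sum(wij * vj for wij, vj in zip(row, v)) for row in w]


def projection_block(x, w1, w2, ws):
    """y = F(x, {W_i}) + W_s x, where F changes the dimension of x."""
    f = matvec(w2, relu(matvec(w1, x)))  # F(x, {W_i}): 2 inputs -> 3 outputs
    shortcut = matvec(ws, x)             # W_s x: project x to the same size as F
    return [fi + si for fi, si in zip(f, shortcut)]


x = [1.0, 2.0]
w1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]                 # 3 x 2
w2 = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, -1.0]]  # 3 x 3
ws = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]                 # 3 x 2 projection

y = projection_block(x, w1, w2, ws)

# Unlike an identity shortcut (0 extra parameters), the projection adds
# one weight per entry of W_s.
extra_params = len(ws) * len(ws[0])

print(y, extra_params)
```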

ResNet architectures for the ImageNet dataset

“18 layers vs 34 layers” on ImageNet dataset