• Image classification
• Object detection
• Semantic segmentation
• and more…
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for ...
Deep Residual Networks (ResNets)
[Figure: ResNet, 152 layers: 7×7 conv, 64, /2, pool /2; bottleneck blocks (1×1 conv, 3×3 conv, 1×1 conv) in four stages of 3 × (64, 64, 256), 8 × (128, 128, 512), 36 × (256, 256, 1024) and 3 × (512, 512, 2048), where the first block of each later stage downsamples with /2; average pool; fc 1000.]
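The depth of the network in the diagram can be checked by counting weighted layers: reading the stage layout off the figure gives 3, 8, 36 and 3 bottleneck blocks, each with three convs, plus the 7×7 stem conv and the final fc layer. The sketch below does that arithmetic and also counts the conv weights in one bottleneck block. The helper names and the choice to ignore biases, BatchNorm and projection shortcuts are assumptions made for illustration.

```python
# Stage layout read off the ResNet diagram:
# (number of bottleneck blocks, bottleneck width, output width) per stage.
STAGES_152 = [(3, 64, 256), (8, 128, 512), (36, 256, 1024), (3, 512, 2048)]


def count_weight_layers(stages):
    """Count weighted layers: the 7x7 stem conv, three convs per
    bottleneck block, and the final fc layer (pooling has no weights)."""
    return 1 + 3 * sum(blocks for blocks, _, _ in stages) + 1


def bottleneck_conv_weights(in_ch, width, out_ch):
    """Weights in the three convs of one bottleneck block:
    1x1 (in_ch -> width), 3x3 (width -> width), 1x1 (width -> out_ch).
    Biases, BatchNorm parameters and projection shortcuts are ignored."""
    return in_ch * width + 9 * width * width + width * out_ch


if __name__ == "__main__":
    print(count_weight_layers(STAGES_152))        # 152
    print(bottleneck_conv_weights(256, 64, 256))  # 69632
```

For comparison, two plain 3×3 convs at 256 channels would hold 2 × 9 × 256 × 256 = 1,179,648 weights, which is one way to see why the deep stages are built from 1×1 / 3×3 / 1×1 bottlenecks.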
[Figure: layer-by-layer diagrams of three networks.
AlexNet, 8 layers (ILSVRC 2012): 11×11 conv, 96, /4, pool /2; 5×5 conv, 256, pool /2; 3×3 conv, 384; 3×3 conv, 384; 3×3 conv, 256, pool /2; fc 4096; fc 4096; fc 1000.
VGG, 19 layers (ILSVRC 2014): 3×3 conv, 64 (×2), pool /2; 3×3 conv, 128 (×2), pool /2; 3×3 conv, 256 (×4), pool /2; 3×3 conv, 512 (×4), pool /2; 3×3 conv, 512 (×4), pool /2; fc 4096; fc 4096; fc 1000.
GoogleNet, 22 layers (ILSVRC 2014): 7×7 conv /2, 3×3 max pool /2, LocalRespNorm, 1×1 conv, 3×3 conv, LocalRespNorm, 3×3 max pool /2; nine Inception modules (parallel 1×1, 3×3 and 5×5 conv and 3×3 max-pool branches merged by DepthConcat), with a 3×3 max pool /2 after the 2nd and 7th modules; 7×7 average pool, FC, softmax; two auxiliary softmax classifiers (5×5 average pool, 1×1 conv, FC, FC, softmax) branching off intermediate modules.]
Kaiming He, Xiangyu Zhang, Shaoqing Ren, & Jian Sun. “Deep Residual Learning for Image Recognition”. CVPR 2016.
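The layer counts in the AlexNet and VGG captions follow the usual convention of counting only conv and fc layers, not pooling. A small sketch that transcribes those two layer lists and recounts them; the string encoding of each layer is a hypothetical format chosen for readability, not anything from the slides.

```python
# Layer lists transcribed from the AlexNet and VGG-19 diagrams.
# "pool" entries are kept for readability but carry no weights.
ALEXNET = ["conv11x11-96", "pool", "conv5x5-256", "pool",
           "conv3x3-384", "conv3x3-384", "conv3x3-256", "pool",
           "fc4096", "fc4096", "fc1000"]

VGG19 = (["conv3x3-64"] * 2 + ["pool"]
         + ["conv3x3-128"] * 2 + ["pool"]
         + ["conv3x3-256"] * 4 + ["pool"]
         + ["conv3x3-512"] * 4 + ["pool"]
         + ["conv3x3-512"] * 4 + ["pool"]
         + ["fc4096", "fc4096", "fc1000"])


def depth(spec):
    """Depth as counted on the slides: conv and fc layers only."""
    return sum(1 for layer in spec if layer.startswith(("conv", "fc")))


if __name__ == "__main__":
    print(depth(ALEXNET))  # 8
    print(depth(VGG19))    # 19
```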
Engines of visual recognition
PASCAL VOC 2007 object detection, mAP (%):
HOG, DPM (shallow): 34
AlexNet (RCNN), 8 layers: 58
VGG (RCNN), 16 layers: 66
ResNet (Faster RCNN)*, 101 layers: 86
*with other improvements and more data
[Figure: ResNet’s object detection result on COCO. *The original image is from the COCO dataset.]
[Figure: recognition pipelines, from shallower to deeper:
pixels → classifier → “bus”?
edges → classifier → “bus”?
edges → histogram → classifier → “bus”? (SIFT/HOG)
edges → histogram → K-means/sparse code → classifier → “bus”?]
But what’s next?
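The “edges → histogram → classifier” stages of the pipeline figure can be made concrete with a toy NumPy sketch. It is a simplified stand-in, not SIFT or HOG themselves: it builds one global, magnitude-weighted orientation histogram instead of per-cell histograms with block normalization, and the linear classifier weights `w` and bias `b` are hypothetical, hand-set values.

```python
import numpy as np


def edges(img):
    """Edge stage: per-pixel gradient magnitude and unsigned orientation in [0, pi)."""
    gy, gx = np.gradient(img.astype(float))  # derivatives along rows, then columns
    mag = np.hypot(gx, gy)
    ang = np.mod(np.arctan2(gy, gx), np.pi)
    return mag, ang


def histogram(mag, ang, bins=9):
    """Histogram stage: magnitude-weighted orientation histogram, L2-normalized."""
    hist, _ = np.histogram(ang, bins=bins, range=(0.0, np.pi), weights=mag)
    norm = np.linalg.norm(hist)
    return hist / norm if norm > 0 else hist


def classifier(feature, w, b):
    """Classifier stage: a linear score; in this toy, positive means "bus"."""
    return float(feature @ w + b)


if __name__ == "__main__":
    vertical = np.zeros((8, 8))
    vertical[:, 4:] = 1.0    # a vertical step edge: gradients point along x
    horizontal = np.zeros((8, 8))
    horizontal[4:, :] = 1.0  # a horizontal step edge: gradients point along y
    w, b = np.eye(9)[0], -0.5  # hypothetical weights that favour bin 0
    for img in (vertical, horizontal):
        feat = histogram(*edges(img))
        print(int(np.argmax(feat)), classifier(feat, w, b))
```

Every stage here is fixed by hand; the question the slide raises is what comes after stacking more such stages.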
shallower → deeper
5 layers: easy
10 layers: initialization, Batch Normalization
30 layers: skip connections
100 layers: identity skip connections
1000 layers: ?
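An identity skip connection computes y = x + F(x), so the input reaches the output unchanged whatever F does. The NumPy sketch below contrasts a plain stack of 100 two-layer blocks with the same stack plus identity shortcuts. Assumptions: F(x) = w2 · relu(w1 · x) on vectors rather than convolutions; the shortcut path carries no nonlinearity (the CVPR 2016 block applies a ReLU after the addition); and the weights are deliberately tiny (std 0.01) so the plain stack visibly vanishes. With a proper initialization a plain stack behaves better, so this only illustrates the identity path, not a training result.

```python
import numpy as np


def plain_block(x, w1, w2):
    """Two linear layers with a ReLU in between: F(x) = w2 @ relu(w1 @ x)."""
    return w2 @ np.maximum(w1 @ x, 0.0)


def residual_block(x, w1, w2):
    """Identity skip connection: y = x + F(x). Nothing is applied on the
    shortcut path, so x reaches the output unchanged."""
    return x + plain_block(x, w1, w2)


def run_stack(block, x, depth=100, scale=0.01, seed=0):
    """Apply `depth` blocks, each with fresh random weights of std `scale`."""
    rng = np.random.default_rng(seed)
    h = x
    for _ in range(depth):
        w1 = rng.normal(0.0, scale, size=(x.size, x.size))
        w2 = rng.normal(0.0, scale, size=(x.size, x.size))
        h = block(h, w1, w2)
    return h


if __name__ == "__main__":
    x = np.random.default_rng(1).standard_normal(16)
    zeros = np.zeros((16, 16))
    print(np.array_equal(residual_block(x, zeros, zeros), x))  # True: F = 0 gives the identity
    plain = run_stack(plain_block, x)
    res = run_stack(residual_block, x)
    print(np.linalg.norm(plain) / np.linalg.norm(x))    # near 0: the signal has vanished
    print(np.linalg.norm(res - x) / np.linalg.norm(x))  # small: x survives 100 blocks
```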
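LeCun et al. 1998, “Efficient Backprop”
Glorot & Bengio 2010, “Understanding the difficulty of training deep feedforward neural networks”
Input $X$, weight $W$, output $Y = WX$.
1-layer: $\mathrm{Var}[y] = \left(n^{in}\,\mathrm{Var}[w]\right)\mathrm{Var}[x]$
Multi-layer: $\mathrm{Var}[y] = \left(\prod_d n_d^{in}\,\mathrm{Var}[w_d]\right)\mathrm{Var}[x]$
Healthy signals require $\prod_d n_d^{in}\,\mathrm{Var}[w_d] = \mathrm{const}_{fw}$ (healthy forward) and $\prod_d n_d^{out}\,\mathrm{Var}[w_d] = \mathrm{const}_{bw}$ (healthy backward).
Per layer: $n_d^{in}\,\mathrm{Var}[w_d] = 1$ or* $n_d^{out}\,\mathrm{Var}[w_d] = 1$
*: $n_d^{out} = n_{d+1}^{in}$, so $\frac{\mathrm{const}_{fw}}{\mathrm{const}_{bw}} = \frac{n_1^{in}}{n_L^{out}} < \infty$. It is sufficient to use either form.
“Xavier” init in Caffe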
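These variance rules can be checked numerically under the same linear assumption (no activation). In the sketch below, drawing independent zero-mean Gaussian weights is an assumption, since the slides do not fix a distribution, and `output_variance_ratio`, `forward_form` and `backward_form` are hypothetical names. It pushes unit-variance inputs through a stack of linear layers and reports Var[output] / Var[input].

```python
import numpy as np


def output_variance_ratio(widths, var_w, n_samples=2000, seed=0):
    """Propagate unit-variance inputs through linear layers h <- W h and
    return Var[output] / Var[input].

    widths = [n_1^in, n_2^in, ..., n_L^out]; layer d maps widths[d] -> widths[d+1].
    var_w(n_in, n_out) gives the variance used to draw that layer's weights."""
    rng = np.random.default_rng(seed)
    x = rng.standard_normal((widths[0], n_samples))
    h = x
    for n_in, n_out in zip(widths[:-1], widths[1:]):
        w = rng.normal(0.0, np.sqrt(var_w(n_in, n_out)), size=(n_out, n_in))
        h = w @ h
    return h.var() / x.var()


def forward_form(n_in, n_out):   # n_in * Var[w] = 1
    return 1.0 / n_in


def backward_form(n_in, n_out):  # n_out * Var[w] = 1
    return 1.0 / n_out


if __name__ == "__main__":
    deep = [256] * 11  # 10 equal-width linear layers
    print(output_variance_ratio(deep, forward_form))                    # close to 1
    print(output_variance_ratio(deep, lambda n_in, n_out: 2.0 / n_in))  # around 2**10: explodes
    print(output_variance_ratio(deep, lambda n_in, n_out: 0.5 / n_in))  # around 2**-10: vanishes

    widths = [64, 128, 256, 512]
    print(output_variance_ratio(widths, forward_form))   # close to 1
    print(output_variance_ratio(widths, backward_form))  # close to 64 / 512 = 0.125
```

With equal widths the two forms coincide. With widths growing from 64 to 512, the backward form leaves the forward product at $n_1^{in}/n_L^{out} = 1/8$: not 1, but a bounded constant that does not depend on depth, which is why either form is sufficient.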