






















































































Lecture slides for the Advanced Machine Learning course taught by John Cunningham at Columbia University. The course covers topics such as computer vision, reinforcement learning, natural language processing, and neural networks. The slides also cover administrative reminders and the catalysts of modern AI (data, computational power, neural networks, and software), and give an overview of the course content and the challenges associated with deep learning.
John Cunningham
Department of Statistics Columbia University
ADMINISTRATIVE REMINDERS
Welcome! Let’s discuss the syllabus...
BUT WHAT ABOUT ALL THE AI HYPE?
Modern AI/ML is the same recipe
The four catalysts of the AI explosion
We will study some neural network families F. While neural networks are powerful, there is nothing magical about them, and nothing fundamentally different from what you already know.
CATALYST 1: DATA
Computer Vision
SVHN, CIFAR10, ImageNet, ...
Reinforcement Learning
OpenAI Breakout, OpenAI Cartpole, UCB Pacman, ...
Natural Language Processing
Wikipedia (English), Twitter, Jeopardy, ...
And so much more...
CATALYST 3: NEURAL NETWORKS
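Neural unit: inputs $x^0_1, x^0_2, x^0_3, \ldots$ with weights $w^0_{1j}, w^0_{2j}, w^0_{3j}, \ldots$ produce the output
$x^1_j = \sigma\left(\sum_i w^0_{ij} x^0_i + b^1_j\right)$
[Figure: network diagram with an Input layer ($x^0_1, x^0_2, x^0_3$), Hidden layer 1 ($x^1_1, x^1_2, x^1_3, \ldots$), Hidden layer 2 ($x^2_1, x^2_2, x^2_3, x^2_4$), and an Output layer ($x^3_1$)]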
With enough layers and enough units per layer, the network is a universal function approximator: any function can be fit (given enough data...).
Learning/Training is then minimizing an empirical risk over the parameter set
$\theta = \{ w^\ell_{ij}, b^\ell_j \}_{i,j,\ell} = \{ W^\ell, b^\ell \}_\ell$
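For reference, the empirical risk mentioned above is commonly written as an average loss over the training set. The notation here is an assumption rather than something taken from the slides: $L$ is a hypothetical loss function, $(x_i, y_i)$ for $i = 1, \ldots, n$ are the training pairs, and $f_\theta$ is the function computed by the network with parameters $\theta$.

```latex
\hat{R}_n(\theta) = \frac{1}{n} \sum_{i=1}^{n} L\big(f_\theta(x_i),\, y_i\big),
\qquad
\hat{\theta} \in \arg\min_{\theta} \hat{R}_n(\theta)
```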
EXAMPLE: LOGISTIC REGRESSION → NEURAL NETWORKS
Logistic Regression
$x \to (W, b) \to f_\theta(x) = \sigma(Wx + b)$
Neural Network
$x \to (W_1, b_1) \to f^{(1)}_\theta(x) = \sigma(W_1 x + b_1)$
$\to (W_2, b_2) \to f^{(2)}_\theta(x) = \sigma(W_2 f^{(1)}(x) + b_2)$
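A minimal sketch of this step from logistic regression to a two-layer network, in plain Python. The helper names (`sigmoid`, `affine`, `layer`) and all numeric values are illustrative assumptions, not part of the slides; the point is only that the network reuses the same building block twice.

```python
import math

def sigmoid(z):
    """Logistic function sigma(z) = 1 / (1 + exp(-z))."""
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, x):
    """Compute W x + b, with W given as a list of rows."""
    return [sum(w * xj for w, xj in zip(row, x)) + bi for row, bi in zip(W, b)]

def layer(W, b, x):
    """One layer: sigma applied elementwise to W x + b."""
    return [sigmoid(z) for z in affine(W, b, x)]

x = [2.0, 1.0]  # illustrative input

# Logistic regression: a single layer with one output unit.
W, b = [[0.5, -1.0]], [0.0]
logistic_out = layer(W, b, x)  # sigma(0.5*2 - 1.0*1 + 0) = sigma(0)

# Neural network: the output of layer 1 becomes the input of layer 2.
W1, b1 = [[1.0, 0.0], [0.0, 1.0], [1.0, -1.0]], [0.0, 0.0, 0.0]
W2, b2 = [[1.0, -1.0, 0.5]], [0.0]
hidden = layer(W1, b1, x)             # f^(1)(x)
network_out = layer(W2, b2, hidden)   # f^(2)(x) = sigma(W2 f^(1)(x) + b2)
```

With these values the logistic regression output is exactly $\sigma(0) = 0.5$, and the network output is again a single number in $(0, 1)$.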
...DEEP LEARNING IS HARD
$\frac{e^x - e^{-x}}{e^x + e^{-x}}$, $\max(0, x)$, $\log(1 + e^x)$, $\frac{e^{x_i}}{\sum_k e^{x_k}}$, ...
Deep learning requires engineering skill, statistical thinking, and thoughtful empiricism.
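The nonlinearities listed on this slide can be sketched in a few lines of plain Python. The function names (`tanh`, `relu`, `softplus`, `softmax`) are the usual names for these formulas, which the slide itself does not give; subtracting the maximum inside `softmax` is a standard numerical-stability choice and an addition of this sketch.

```python
import math

def tanh(x):
    """Hyperbolic tangent: (e^x - e^-x) / (e^x + e^-x)."""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def relu(x):
    """Rectified linear unit: max(0, x)."""
    return max(0.0, x)

def softplus(x):
    """Smooth version of the ReLU: log(1 + e^x)."""
    return math.log(1.0 + math.exp(x))

def softmax(xs):
    """Map a list of scores to probabilities e^{x_i} / sum_k e^{x_k}."""
    m = max(xs)  # shifting by the max leaves the result unchanged
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]
```

The naive `tanh` and `softplus` above overflow for large `|x|`; that is one small instance of the engineering care the slide refers to.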
CATALYST 4: SOFTWARE
Machine Learning libraries have abstracted {math, stats, optimization, ...} → engineering
...
Under the hood are several essential elements to understand:
(sounds obvious, but we’ll spend some time here...)
(much more to come here also...)
To understand modern ML, we need to understand why these work... and when they don’t.
ADMINISTRATIVE REMINDERS
A MOST IMPORTANT REMINDER
A neural network represents a function $f_\theta : \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}$.
READING NEURAL NETWORKS
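$f : \mathbb{R}^3 \to \mathbb{R}^3$ with input $x = (x_1, x_2, x_3)^\top$
[Figure: inputs $x_1, x_2, x_3$ connected by weights $w_{11}, w_{12}, w_{13}, w_{21}, w_{22}, w_{23}, w_{31}, w_{32}, w_{33}$ to units $\phi_1, \phi_2, \phi_3$]
$f_1(x) = \phi_1(\langle w_1, x \rangle)$, $f_2(x) = \phi_2(\langle w_2, x \rangle)$, $f_3(x) = \phi_3(\langle w_3, x \rangle)$
$f(x) = (f_1(x), f_2(x), f_3(x))^\top$ with $f_i(x) = \phi_i\left(\sum_{j=1}^{3} w_{ji} x_j\right)$
(recall inner product $\langle w_i, x \rangle = w_i^\top x = \sum_j w_{ji} x_j$)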
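The reading rule above can be sketched in plain Python. The weight values, the input, and the particular choice of $\phi_1, \phi_2, \phi_3$ are illustrative assumptions; the indexing `w[j][i]` is chosen to match $w_{ji}$, the weight on the arrow from input $x_j$ to unit $i$.

```python
import math

def inner(w, x):
    """Inner product <w, x> = sum_j w_j x_j."""
    return sum(wj * xj for wj, xj in zip(w, x))

# w[j][i] is the weight from input j to unit i (0-based indices here).
w = [[1.0, 0.0, 2.0],
     [0.0, 1.0, -1.0],
     [1.0, 1.0, 0.0]]

def weights_into(i):
    """Weight vector w_i: all weights on arrows entering unit i."""
    return [row[i] for row in w]

# Hypothetical activations phi_1, phi_2, phi_3 (tanh, ReLU, identity).
phis = [math.tanh, lambda z: max(0.0, z), lambda z: z]

def f(x):
    """f(x) = (phi_1(<w_1, x>), phi_2(<w_2, x>), phi_3(<w_3, x>))."""
    return [phi(inner(weights_into(i), x)) for i, phi in enumerate(phis)]

out = f([1.0, 2.0, 3.0])
```

For this input the three inner products are 4, 5 and 0, so `out` is `[tanh(4), 5.0, 0.0]`.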
FEED-FORWARD NETWORKS
A feed-forward network is a neural network whose units can be arranged into groups $L_1, \ldots, L_K$ so that connections (arrows) only pass from units in group $L_k$ to units in group $L_{k+1}$. The groups are called layers. In a feed-forward network:
[Figure: three example networks: feed-forward (with layers $L_1$, $L_2$, $L_3$); not feed-forward; not feed-forward (but still useful...)]
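The definition of a feed-forward network can be turned into a simple check. The sketch below assumes a hypothetical representation in which each unit is already assigned to a layer index and connections are given as (source, destination) pairs; it verifies one proposed grouping and does not search for a grouping.

```python
def is_feed_forward(layer_of, edges):
    """Return True if every connection goes from layer k to layer k + 1.

    layer_of: dict mapping unit name -> layer index k
    edges:    iterable of (source_unit, destination_unit) pairs
    """
    return all(layer_of[dst] == layer_of[src] + 1 for src, dst in edges)

# Illustrative network: two units in L1, two in L2, one in L3.
layer_of = {"a": 1, "b": 1, "c": 2, "d": 2, "e": 3}
ff_edges = [("a", "c"), ("a", "d"), ("b", "c"), ("b", "d"),
            ("c", "e"), ("d", "e")]

skip_edges = ff_edges + [("a", "e")]   # jumps from L1 straight to L3
back_edges = ff_edges + [("e", "c")]   # feeds L3 back into L2

ok = is_feed_forward(layer_of, ff_edges)
skip_ok = is_feed_forward(layer_of, skip_edges)
back_ok = is_feed_forward(layer_of, back_edges)
```

Under this strict definition both the skip connection and the backward connection violate the layering.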
LAYERS
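[Figure: a single layer with weights $w^1_{11}, w^1_{12}, w^1_{21}, w^1_{22}$ and units $\phi^1_1, \phi^1_2$, labelled $f^{(2)}$]
$f^{(2)}(y) = \left( \phi^1_1(\langle w^1_1, y \rangle),\ \phi^1_2(\langle w^1_2, y \rangle) \right)^\top$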
COMPOSITION OF FUNCTIONS
Basic composition. Suppose $f$ and $g$ are two functions $\mathbb{R} \to \mathbb{R}$. Their composition $g \circ f$ is the function
$(g \circ f)(x) := g(f(x))$.
For example: $f(x) = x + 1$, $g(y) = y^2$, $(g \circ f)(x) = (x + 1)^2$. We could combine the same functions the other way around:
$(f \circ g)(x) = x^2 + 1$
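The example above shows that the order of composition matters. A minimal Python sketch makes this concrete; the helper name `compose` is an assumption introduced for illustration.

```python
def compose(g, f):
    """Return the composition g o f, i.e. the function x -> g(f(x))."""
    return lambda x: g(f(x))

def f(x):
    return x + 1

def g(y):
    return y ** 2

g_after_f = compose(g, f)  # x -> (x + 1)^2
f_after_g = compose(f, g)  # x -> x^2 + 1
```

At `x = 2` the two compositions give `(2 + 1)^2 = 9` and `2^2 + 1 = 5` respectively.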
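In multiple dimensions. Suppose $f : \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}$ and $g : \mathbb{R}^{d_2} \to \mathbb{R}^{d_3}$. Then
$(g \circ f)(x) = g(f(x))$ is a function $\mathbb{R}^{d_1} \to \mathbb{R}^{d_3}$.
For example: $f(x) = \langle x, v \rangle - c$, $g(y) = \operatorname{sgn}(y)$, $(g \circ f)(x) = \operatorname{sgn}(\langle x, v \rangle - c)$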
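This last composition is a linear classifier: an affine function followed by a sign. A short sketch in plain Python follows; the values of `v` and `c` are illustrative, and the convention `sgn(0) = 0` is an assumption, since the slide does not say how ties are handled.

```python
def sgn(y):
    """Sign function: 1 for y > 0, -1 for y < 0, and 0 at y = 0 (assumed)."""
    return (y > 0) - (y < 0)

def make_affine(v, c):
    """Return f(x) = <x, v> - c, a function R^d -> R."""
    return lambda x: sum(xi * vi for xi, vi in zip(x, v)) - c

v, c = [1.0, -1.0], 0.5      # illustrative normal vector and offset
f = make_affine(v, c)

def classify(x):
    """(g o f)(x) = sgn(<x, v> - c), a function R^2 -> {-1, 0, 1}."""
    return sgn(f(x))
```

Points with $\langle x, v \rangle > c$ are mapped to $+1$ and points with $\langle x, v \rangle < c$ to $-1$; for example, `classify([2.0, 0.0])` gives `1` and `classify([0.0, 2.0])` gives `-1`.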