Advanced Machine Learning Lecture Slides Weeks 1-3, Lecture notes of Machine Learning

Columbia College Machine Learning

Lecture slides for the Advanced Machine Learning course taught by John Cunningham at Columbia University. The course covers topics such as computer vision, reinforcement learning, natural language processing, and neural networks. The slides also discuss the administrative reminders, catalysts for data, computational power, and software. an overview of the course content and the challenges associated with deep learning.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023

eknath 🇺🇸


STAT GR5242: Advanced Machine Learning

Lecture slides: Weeks 1-3

John Cunningham

Department of Statistics

Columbia University


ADMINISTRATIVE REMINDERS

Welcome! Let’s discuss the syllabus...

BUT WHAT ABOUT ALL THE AI HYPE?

Modern AI/ML is the same recipe

- Gather data, choose F = {fθ : θ ∈ Θ}, specify loss, minimize empirical risk
- All the same potential issues exist (wrong F, under/overfitting, optimization issues, ...)
- The same statistical and computational thinking is necessary
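The recipe above can be sketched end to end on a toy problem. This is a minimal illustration, not course code: the synthetic data (noisy samples of y = 2x + 1), the choice of lines as the family F, the squared loss, and the names `make_data`, `empirical_risk`, and `minimize` are all assumptions made for the example.

```python
import random

def make_data(n, seed=0):
    """Gather data: noisy samples of y = 2x + 1 (an assumed, synthetic ground truth)."""
    rng = random.Random(seed)
    xs = [rng.uniform(-1.0, 1.0) for _ in range(n)]
    ys = [2.0 * x + 1.0 + rng.gauss(0.0, 0.1) for x in xs]
    return xs, ys

def f(theta, x):
    """Function family F = {f_theta}: lines, with theta = (slope, intercept)."""
    w, b = theta
    return w * x + b

def empirical_risk(theta, xs, ys):
    """Specify loss: squared error, averaged over the sample."""
    return sum((f(theta, x) - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

def minimize(xs, ys, lr=0.1, steps=500):
    """Minimize the empirical risk with plain (full-batch) gradient descent."""
    w, b = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        residuals = [f((w, b), x) - y for x, y in zip(xs, ys)]
        grad_w = sum(2.0 * r * x for r, x in zip(residuals, xs)) / n
        grad_b = sum(2.0 * r for r in residuals) / n
        w -= lr * grad_w
        b -= lr * grad_b
    return w, b

xs, ys = make_data(200)
theta_hat = minimize(xs, ys)
print(theta_hat, empirical_risk(theta_hat, xs, ys))
```

Swapping the family `f` for a neural network changes the gradient computation and the optimizer details, but not the four steps.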

The four catalysts of the AI explosion

1. Large and readily available datasets
2. Massive and cheap computational power
3. Flexible and general function families F
4. Open-source ML software libraries with powerful abstractions

We will study some neural network families F. While neural networks are powerful, there is nothing magical about them, and nothing fundamentally different from what you already know.

CATALYST 1: DATA

Computer Vision: SVHN, CIFAR10, ImageNet, ...

Reinforcement Learning: OpenAI Breakout, OpenAI Cartpole, UCB Pacman, ...

Natural Language Processing: Wikipedia (English), Twitter, Jeopardy, ...

And so much more...

- https://www.data.gov/
- https://opendata.cityofnewyork.us/
- https://github.com/caesar0301/awesome-public-datasets
- ...

CATALYST 3: NEURAL NETWORKS

[Figure: a single neural unit computing $x^1_j = \sigma\big(\sum_i w^0_{ij} x^0_i + b^1_j\big)$ from inputs $x^0_1, x^0_2, x^0_3$ with weights $w^0_{1j}, w^0_{2j}, w^0_{3j}$, shown next to a network with an input layer, hidden layer 1, hidden layer 2, and an output layer.]

With enough layers and enough units per layer, the network is a universal function approximator: any function can be fit (given enough data...).

- Inputs $x^0_i$ enter into unit $j$, weighted by edges $w^0_{ij}$, and are summed with bias $b^1_j$
- $\sigma(\cdot)$ provides elementwise nonlinearity
- The result $x^1_j$ is transmitted to layer 2, the next layer
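The three steps above can be written out directly. This is a minimal sketch in plain Python. The function names `neural_unit` and `layer`, the choice of a sigmoid for σ, and the numeric values are illustrative assumptions, not taken from the slides.

```python
import math

def sigmoid(z):
    """One possible choice of the nonlinearity sigma."""
    return 1.0 / (1.0 + math.exp(-z))

def neural_unit(x_prev, w_j, b_j, sigma=sigmoid):
    """Compute x^1_j = sigma(sum_i w^0_ij * x^0_i + b^1_j) for a single unit j."""
    z = sum(w * x for w, x in zip(w_j, x_prev)) + b_j
    return sigma(z)

def layer(x_prev, W, b, sigma=sigmoid):
    """Apply every unit of a layer; W[j] holds the incoming weights of unit j."""
    return [neural_unit(x_prev, W[j], b[j], sigma) for j in range(len(W))]

# Three inputs feeding a layer with two units (values chosen for illustration).
x0 = [1.0, -2.0, 0.5]
W0 = [[0.1, 0.2, 0.3],
      [0.0, -1.0, 2.0]]
b1 = [0.0, -3.0]
x1 = layer(x0, W0, b1)
print(x1)
```

The second unit receives a pre-activation of exactly 0 here, so its output is sigmoid(0) = 0.5.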

Learning/Training is then minimizing an empirical risk over the parameter set

$$\theta = \left\{ w^\ell_{ij},\, b^\ell_j \right\}_{i,j,\ell} = \left\{ W^\ell, b^\ell \right\}_\ell$$

EXAMPLE: LOGISTIC REGRESSION → NEURAL NETWORKS

Logistic Regression

$$f_\theta(x) = \sigma(Wx + b)$$

Neural Network

$$f^{(1)}_\theta(x) = \sigma(W_1 x + b_1)$$

$$f^{(2)}_\theta(x) = \sigma\big(W_2 f^{(1)}(x) + b_2\big)$$
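To make the step from logistic regression to a network concrete, here is a small sketch in which the two-layer network is literally logistic regression applied twice. The helper names (`affine`, `logistic_regression`, `two_layer_network`) and the weight values are assumptions for illustration only.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def affine(W, b, x):
    """Return Wx + b, with W stored as a list of rows."""
    return [sum(w_ij * x_j for w_ij, x_j in zip(row, x)) + b_i
            for row, b_i in zip(W, b)]

def logistic_regression(x, W, b):
    """f_theta(x) = sigma(Wx + b), with sigma applied elementwise."""
    return [sigmoid(z) for z in affine(W, b, x)]

def two_layer_network(x, W1, b1, W2, b2):
    """f^(2)(x) = sigma(W2 f^(1)(x) + b2), where f^(1)(x) = sigma(W1 x + b1)."""
    hidden = logistic_regression(x, W1, b1)       # f^(1)(x)
    return logistic_regression(hidden, W2, b2)    # f^(2)(x)

# Two inputs, three hidden units, one output (illustrative values).
x = [0.5, -1.0]
W1, b1 = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]], [0.0, 0.0, 0.0]
W2, b2 = [[1.0, -1.0, 0.5]], [0.0]
p = two_layer_network(x, W1, b1, W2, b2)
print(p)
```

The only new ingredient relative to logistic regression is that the input of the last layer is itself a learned function of x.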

...DEEP LEARNING IS HARD

- How do I choose the output dimension of $f^{(1)}$, i.e. the number of units in the hidden layers?
- How do I choose $L$, the number of layers?
- How do I choose the activation function $\sigma(\cdot)$?
  - sigmoid: $\frac{1}{1 + e^{-x}}$
  - tanh: $\frac{e^{x} - e^{-x}}{e^{x} + e^{-x}}$
  - relu: $\max(0, x)$
  - softplus: $\log(1 + e^{x})$
  - softmax: $\frac{e^{x_i}}{\sum_k e^{x_k}}$
  - ...
- Are there other choices to make?
- What about overfitting?
- Will my optimizer converge?
- Is my problem solvable with a particular architecture F?
- Can my data be fit by a particular architecture F? (e.g. MNIST vs. SVHN)
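For reference, the activation functions named in the list above can be written in a few lines each. This is a plain-Python sketch following the standard definitions. The max-subtraction inside `softmax` is a common numerical-stability step added here, not something stated on the slide, and it leaves the result mathematically unchanged.

```python
import math

def sigmoid(x):
    """1 / (1 + e^{-x})"""
    return 1.0 / (1.0 + math.exp(-x))

def tanh(x):
    """(e^x - e^{-x}) / (e^x + e^{-x})"""
    return (math.exp(x) - math.exp(-x)) / (math.exp(x) + math.exp(-x))

def relu(x):
    """max(0, x)"""
    return max(0.0, x)

def softplus(x):
    """log(1 + e^x), a smooth approximation of relu"""
    return math.log(1.0 + math.exp(x))

def softmax(xs):
    """e^{x_i} / sum_k e^{x_k}, computed on the whole vector at once."""
    m = max(xs)                              # shift by the max for stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

print(sigmoid(0.0), tanh(1.0), relu(-2.0), softplus(0.0), softmax([1.0, 2.0, 3.0]))
```

Unlike the other four, softmax is not elementwise: each output depends on every input, which is why it takes a list.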

Deep learning requires engineering skill, statistical thinking, and thoughtful empiricism.

CATALYST 4: SOFTWARE

Machine Learning libraries have abstracted {math, stats, optimization, ...} → engineering

...

Under the hood are several essential elements to understand:

- Neural networks in detail (sounds obvious, but we’ll spend some time here...)
- Automatic differentiation
- Stochastic optimization (much more to come here also...)

To understand modern ML, we need to understand why these work... and when they don’t.

ADMINISTRATIVE REMINDERS

- Slides and syllabus on courseworks (and Assignment 1 soon)
- A few comments about textbooks:
  - There is no textbook for this course... for a good reason.
  - When there is a relevant background reading or survey/review, I will note it in class.
  - Mathematics for Machine Learning, by A. Aldo Faisal, Cheng Soon Ong, and Marc Peter Deisenroth
  - Probabilistic Machine Learning, by Kevin P. Murphy
  - Deep Learning, by Aaron Courville, Yoshua Bengio, and Ian Goodfellow
  - Pattern Recognition and Machine Learning, by Christopher Bishop
- Ask questions in class. Don’t wait until after class and then divide the impact of that question by 100x.
- Also, so you don’t think I’m just making stuff up, a DALL-E sample:

A MOST IMPORTANT REMINDER

A neural network represents a function $f_\theta : \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}$.

READING NEURAL NETWORKS

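$f : \mathbb{R}^3 \to \mathbb{R}^3$ with input $x = (x_1, x_2, x_3)^\top$

[Figure: inputs $x_1, x_2, x_3$ connected by weights $w_{11}, \dots, w_{33}$ to three units with activations $\phi_1, \phi_2, \phi_3$, which output $f_1(x) = \phi_1(\langle w_1, x\rangle)$, $f_2(x) = \phi_2(\langle w_2, x\rangle)$, and $f_3(x) = \phi_3(\langle w_3, x\rangle)$.]

$$f(x) = \begin{pmatrix} f_1(x) \\ f_2(x) \\ f_3(x) \end{pmatrix} \quad \text{with} \quad f_i(x) = \phi_i\Big(\sum_{j=1}^{3} w_{ji} x_j\Big)$$

(recall the inner product $\langle w_i, x\rangle = w_i^\top x = \sum_j w_{ji} x_j$)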

FEED-FORWARD NETWORKS

A feed-forward network is a neural network whose units can be arranged into groups $L_1, \dots, L_K$ so that connections (arrows) only pass from units in group $L_k$ to units in group $L_{k+1}$. The groups are called layers. In a feed-forward network:

- There are no connections within a layer.
- There are no backwards connections.
- There are no connections that skip layers, e.g. from $L_k$ to units in group $L_{k+2}$. (but see Huang...Weinberger 2017 CVPR)

[Figure: three example networks. The first, with layers $L_1$, $L_2$, $L_3$, is feed-forward. The second is not feed-forward. The third is not feed-forward (but still useful...).]
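The three conditions can be checked mechanically once each unit has been assigned to a layer. The sketch below assumes a hypothetical representation (a dict `layer_of` mapping unit names to layer indices, and a list of directed edges). It verifies one given arrangement into layers and does not search for an arrangement that would work.

```python
def is_feed_forward(layer_of, edges):
    """Return True if every edge goes from a unit in L_k to a unit in L_{k+1}.

    layer_of: dict mapping unit name -> layer index k
    edges:    iterable of (source_unit, target_unit) pairs
    """
    return all(layer_of[dst] == layer_of[src] + 1 for src, dst in edges)

# Hypothetical network with layers L_1 = {a1, a2}, L_2 = {b1, b2}, L_3 = {c1}.
layer_of = {"a1": 1, "a2": 1, "b1": 2, "b2": 2, "c1": 3}
forward_edges = [("a1", "b1"), ("a1", "b2"), ("a2", "b1"), ("a2", "b2"),
                 ("b1", "c1"), ("b2", "c1")]

print(is_feed_forward(layer_of, forward_edges))                   # all edges L_k -> L_{k+1}
print(is_feed_forward(layer_of, forward_edges + [("b1", "b2")]))  # connection within a layer
print(is_feed_forward(layer_of, forward_edges + [("c1", "b1")]))  # backwards connection
print(is_feed_forward(layer_of, forward_edges + [("a1", "c1")]))  # connection skipping a layer
```

The single test `layer_of[dst] == layer_of[src] + 1` rules out all three forbidden cases at once: a within-layer edge has difference 0, a backwards edge has a negative difference, and a skip connection has a difference of 2 or more.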

LAYERS

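[Figure: two nodes at the top feed, through weights $w^1_{11}, w^1_{12}, w^1_{21}, w^1_{22}$, into two units with activations $\phi^1_1$ and $\phi^1_2$, which together form the layer $f^{(2)}$.]

- Each layer represents a function, which takes the output values of the previous layer as its arguments.
- Suppose the output values of the two nodes at the top are $y_1, y_2$.
- Then the second layer defines the (two-dimensional) function

$$f^{(2)}(y) = \begin{pmatrix} \phi^1_1\big(\langle w^1_1, y\rangle\big) \\ \phi^1_2\big(\langle w^1_2, y\rangle\big) \end{pmatrix}$$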

COMPOSITION OF FUNCTIONS

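Basic composition: Suppose $f$ and $g$ are two functions $\mathbb{R} \to \mathbb{R}$. Their composition $g \circ f$ is the function

$$g \circ f(x) := g(f(x)).$$

For example:

$$f(x) = x + 1, \qquad g(y) = y^2, \qquad g \circ f(x) = (x + 1)^2$$

We could combine the same functions the other way around:

$$f \circ g(x) = x^2 + 1$$

In multiple dimensions: Suppose $f : \mathbb{R}^{d_1} \to \mathbb{R}^{d_2}$ and $g : \mathbb{R}^{d_2} \to \mathbb{R}^{d_3}$. Then

$$g \circ f(x) = g(f(x))$$

is a function $\mathbb{R}^{d_1} \to \mathbb{R}^{d_3}$.

For example:

$$f(x) = \langle x, v\rangle - c, \qquad g(y) = \operatorname{sgn}(y), \qquad g \circ f(x) = \operatorname{sgn}(\langle x, v\rangle - c)$$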
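Both examples translate directly into code. In this sketch the helper names `compose` and `linear_classifier` are illustrative, and `sgn` uses the convention sgn(0) = 0, which the slides do not specify.

```python
def compose(g, f):
    """Return the composition g ∘ f, i.e. the function x -> g(f(x))."""
    return lambda x: g(f(x))

# Scalar example: f(x) = x + 1, g(y) = y^2.
f = lambda x: x + 1
g = lambda y: y ** 2
g_after_f = compose(g, f)   # x -> (x + 1)^2
f_after_g = compose(f, g)   # x -> x^2 + 1

def linear_classifier(v, c):
    """Build sgn(<x, v> - c) as a composition of an affine map and sgn."""
    affine = lambda x: sum(x_i * v_i for x_i, v_i in zip(x, v)) - c
    sgn = lambda y: (y > 0) - (y < 0)   # assumed convention: sgn(0) = 0
    return compose(sgn, affine)

classify = linear_classifier(v=[1.0, 1.0], c=1.0)
print(g_after_f(2), f_after_g(2))                   # 9 5
print(classify([2.0, 0.0]), classify([0.0, 0.0]))   # 1 -1
```

The order of composition matters: with x = 2, g ∘ f gives 9 while f ∘ g gives 5.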
