Lecture 8: Optimization Theory - Composite Functions and Arbitrary Norms


Part of the lecture notes for the course 'Optimization Theory' (MS&E 213 / CS 2690) taught by Aaron Sidford at Stanford University. The eighth lecture covers composite functions and arbitrary norms. It discusses how optimization methods extend to various settings and oracles, including different norms, constraints, and composite functions.


Introduction to Optimization Theory

Lecture #8 - 10/8/20

MS&E 213 / CS 2690

Aaron Sidford

sidford@stanford.edu


Plan for Today
Recap
• Iterative methods
Extension #1
• General norms
• Composite functions
More extensions Tuesday


Extensions: Iterative Method Landscape

So far: first-order methods (gradient / value oracle) and $\|\cdot\|_2$.
Our machinery extends to many different settings and oracles.
Goal: see the broader theory and understand the extensions.
Cases:
• Different norms (e.g. $\|\cdot\|_\infty$)
• Constraints, e.g. $\min_{x \in S} f(x)$ for a set $S$

• Composite functions, e.g. $\min_x f(x) + \|x\|_1$, with $f$ smooth and $\|x\|_1$ simple
• Coordinate descent

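Example Proof

Lemma: $\min_y \langle x, y \rangle + \frac{1}{2}\|y\|^2 = -\frac{1}{2}\|x\|_*^2$

Proof:
$$
\begin{aligned}
\text{LHS} &= -\max_y \left[\langle -x, y \rangle - \tfrac{1}{2}\|y\|^2\right] \\
&= -\max_{\beta \ge 0,\ \|z\| = 1} \left[\beta \langle -x, z \rangle - \tfrac{1}{2}\|\beta \cdot z\|^2\right] \\
&= -\max_{\beta \ge 0} \left[\beta \cdot \|-x\|_* - \tfrac{1}{2}\beta^2\right] \\
&= -\tfrac{1}{2}\|x\|_*^2
\end{aligned}
$$
where $\|x\|_* = \max_{\|z\| \le 1} \langle x, z \rangle$. Restricting to $\|z\| = 1$ gives the same value, since scaling $z$ up never decreases the (nonnegative) objective. The maximum over $\beta$ is attained at $\beta = \|x\|_*$, using $\|-x\|_* = \|x\|_*$.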
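As a sanity check, here is a small numerical sketch of the lemma for one concrete norm, assuming $\|\cdot\| = \|\cdot\|_\infty$ (whose dual is $\|\cdot\|_1$). The helper names are hypothetical. It evaluates the objective at the minimizer suggested by the proof ($\beta = \|x\|_1$, $z = -\mathrm{sign}(x)$) and checks that random points never do better.

```python
import random

def linf_norm(v):
    return max(abs(t) for t in v)

def l1_norm(v):
    # ||.||_1 is the dual norm of ||.||_inf
    return sum(abs(t) for t in v)

def sign(t):
    return (t > 0) - (t < 0)

def objective(x, y):
    """<x, y> + (1/2)||y||_inf^2, the quantity minimized over y in the lemma."""
    return sum(a * b for a, b in zip(x, y)) + 0.5 * linf_norm(y) ** 2

x = [1.5, -2.0, 0.5]
closed_form = -0.5 * l1_norm(x) ** 2  # right-hand side: -(1/2)||x||_*^2

# Minimizer from the proof: y = beta * z with beta = ||x||_1 and z = -sign(x), so ||z||_inf = 1
y_star = [-l1_norm(x) * sign(t) for t in x]

# Random candidates should never beat the closed form
random.seed(0)
best_random = min(
    objective(x, [random.uniform(-10.0, 10.0) for _ in x]) for _ in range(10_000)
)
print(objective(x, y_star), closed_form, best_random)
```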

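Arbitrary Norms

Definition: $f: \mathbb{R}^n \to \mathbb{R}$ is $L$-smooth with respect to $\|\cdot\|$ if and only if $\|\nabla f(x) - \nabla f(y)\|_* \le L\|x - y\|$ for all $x, y \in \mathbb{R}^n$.

Definition: $f: \mathbb{R}^n \to \mathbb{R}$ is $\mu$-strongly convex with respect to $\|\cdot\|$ if and only if $f(t \cdot y + (1-t) \cdot x) \le t \cdot f(y) + (1-t) \cdot f(x) - \frac{\mu}{2} t(1-t)\|x - y\|^2$ for all $x, y \in \mathbb{R}^n$ and $t \in [0, 1]$.

Why? Bounds that scale with $L\|x - x_*\|_2^2$ can become bounds that scale with $L\|x - x_*\|_\infty^2$ (with $L$ measured in the respective norm). This can mean an $O(n)$ improvement, since $\|x\|_\infty^2 \le \|x\|_2^2 \le n\|x\|_\infty^2$.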
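To see where the factor $n$ comes from, this Python sketch (hypothetical helper names) checks $\|x\|_\infty^2 \le \|x\|_2^2 \le n\|x\|_\infty^2$ on random vectors and shows that the all-ones vector attains the upper end. A bound in $\|x_0 - x_*\|_\infty^2$ can therefore be $n$ times smaller than one in $\|x_0 - x_*\|_2^2$, provided the smoothness constants in the two norms are comparable.

```python
import random

def sq_l2(v):
    return sum(t * t for t in v)

def sq_linf(v):
    return max(t * t for t in v)

def sandwich_holds(v):
    """Check ||v||_inf^2 <= ||v||_2^2 <= n * ||v||_inf^2 for one vector."""
    n = len(v)
    return sq_linf(v) <= sq_l2(v) <= n * sq_linf(v)

random.seed(0)
n = 1000
all_hold = all(sandwich_holds([random.gauss(0.0, 1.0) for _ in range(n)]) for _ in range(100))

# The all-ones vector attains the upper end: the squared norms differ by a factor of n
ones = [1.0] * n
ratio = sq_l2(ones) / sq_linf(ones)
print(all_hold, ratio)
```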

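Equivalence?

Lemma: If $f: \mathbb{R}^n \to \mathbb{R}$ is convex and differentiable with $f(x) \le f(y) + \langle \nabla f(y), x - y \rangle + \frac{L}{2}\|x - y\|^2$ for all $x, y \in \mathbb{R}^n$, then $f$ is $L$-smooth, i.e. $\|\nabla f(x) - \nabla f(y)\|_* \le L\|x - y\|$.

Proof:
• Let $g(z) = f(z) - [f(x) + \langle \nabla f(x), z - x \rangle]$.
• $g$ is convex and $\nabla g(x) = 0$, so $x$ minimizes $g$. Since $g$ differs from $f$ by an affine function, $g$ satisfies the same quadratic upper bound.
• By the example lemma (rescaled by $L$),
$$0 = g(x) = \min_z g(z) \le \min_z \left[g(y) + \langle \nabla g(y), z - y \rangle + \frac{L}{2}\|z - y\|^2\right] = g(y) - \frac{1}{2L}\|\nabla g(y)\|_*^2.$$
• Since $\nabla g(y) = \nabla f(y) - \nabla f(x)$, this reads $f(y) \ge f(x) + \langle \nabla f(x), y - x \rangle + \frac{1}{2L}\|\nabla f(y) - \nabla f(x)\|_*^2$.
• Adding this to the same inequality with $x$ and $y$ swapped gives $\frac{1}{L}\|\nabla f(x) - \nabla f(y)\|_*^2 \le \langle \nabla f(x) - \nabla f(y), x - y \rangle \le \|\nabla f(x) - \nabla f(y)\|_* \|x - y\|$. Dividing by $\|\nabla f(x) - \nabla f(y)\|_*$ yields the claim.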
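A minimal numerical check of the key inequality in the proof and of the lemma's conclusion. It assumes $\|\cdot\|$ is the Euclidean norm (its own dual) and $f(x) = \frac{1}{2}x^\top A x$ for a fixed $2 \times 2$ positive semidefinite $A$ (a hypothetical choice), whose largest eigenvalue $L = 3$ makes the lemma's upper-bound hypothesis hold. This illustrates the lemma on samples rather than proving it.

```python
import math
import random

A = [[2.0, 1.0], [1.0, 2.0]]  # symmetric PSD, eigenvalues 1 and 3
L = 3.0  # largest eigenvalue: smoothness constant w.r.t. ||.||_2, which is its own dual

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def grad(x):
    return [dot(row, x) for row in A]  # gradient of f(x) = (1/2) x^T A x is A x

def f(x):
    return 0.5 * dot(x, grad(x))

random.seed(1)
worst_gap = math.inf  # min of Bregman gap minus ||grad diff||^2 / (2L); should be >= 0
worst_ratio = 0.0     # max of ||grad f(y) - grad f(x)|| / ||y - x||; should be <= L
for _ in range(1000):
    x = [random.uniform(-5.0, 5.0) for _ in range(2)]
    y = [random.uniform(-5.0, 5.0) for _ in range(2)]
    gx, gy = grad(x), grad(y)
    g_diff = [b - a for a, b in zip(gx, gy)]  # grad f(y) - grad f(x)
    step = [b - a for a, b in zip(x, y)]      # y - x
    bregman = f(y) - f(x) - dot(gx, step)     # f(y) - [f(x) + <grad f(x), y - x>]
    worst_gap = min(worst_gap, bregman - dot(g_diff, g_diff) / (2 * L))
    worst_ratio = max(worst_ratio, math.sqrt(dot(g_diff, g_diff) / dot(step, step)))
print(worst_gap, worst_ratio)
```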

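More Equivalences

Lemma: $f: \mathbb{R}^n \to \mathbb{R}$ is $L$-smooth and $\mu$-strongly convex with respect to $\|\cdot\|$ if and only if for all $x, y \in \mathbb{R}^n$
$$\frac{\mu}{2}\|x - y\|^2 \le f(y) - [f(x) + \langle \nabla f(x), y - x \rangle] \le \frac{L}{2}\|x - y\|^2.$$

Lemma: twice-differentiable $f: \mathbb{R}^n \to \mathbb{R}$ is $L$-smooth and $\mu$-strongly convex with respect to $\|\cdot\|$ if and only if for all $x, z \in \mathbb{R}^n$
$$\mu\|z\|^2 \le z^\top \nabla^2 f(x) z \le L\|z\|^2.$$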
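The Hessian characterization makes the norm dependence easy to compute for a quadratic with diagonal Hessian $\mathrm{diag}(d)$, $d_i \ge 0$ (an illustrative assumption; the values of `d` are hypothetical). The smallest valid $L$ with respect to $\|\cdot\|_\infty$ is $\sum_i d_i$, while with respect to $\|\cdot\|_2$ it is $\max_i d_i$:

```python
import itertools

d = [1.0, 2.0, 5.0]  # hypothetical Hessian diag(d), d_i >= 0 (e.g. f(x) = 1/2 sum_i d_i x_i^2)

def quad_form(z):
    """z^T diag(d) z."""
    return sum(di * zi * zi for di, zi in zip(d, z))

# Smallest valid L w.r.t. ||.||_inf: max of z^T D z over the cube [-1, 1]^n.
# The quadratic form is convex, so its max over the cube is attained at a vertex (a sign vector).
L_inf = max(quad_form(z) for z in itertools.product([-1.0, 1.0], repeat=len(d)))

# Smallest valid L w.r.t. ||.||_2: max of z^T D z over the unit sphere = largest eigenvalue.
L_2 = max(d)

print(L_inf, L_2)  # 8.0 5.0
```

For this Hessian $\mu = \min_i d_i$ in both norms, so switching from $\|\cdot\|_2$ to $\|\cdot\|_\infty$ changes $L/\mu$ from 5 to 8. Choosing the norm is a trade-off, not an automatic win.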

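Analysis: Upper Bound Oracle!

$$x_{k+1} = \operatorname{argmin}_x \; f(x_k) + \langle \nabla f(x_k), x - x_k \rangle + \frac{L}{2}\|x - x_k\|^2 \;\Rightarrow\; f(x_{k+1}) \le f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|_*^2$$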
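For specific norms the upper-bound-oracle step has a closed form. Here is a sketch for $\|\cdot\| = \|\cdot\|_\infty$, assuming the illustrative quadratic $f(x) = \frac{1}{2}\sum_i d_i (x_i - c_i)^2$ with hypothetical `d`, `c`, whose smoothness constant in $\|\cdot\|_\infty$ is $L = \sum_i d_i$. The step moves each coordinate by $\|\nabla f(x_k)\|_1 / L$ against the sign of its partial derivative. The decrease then meets the guarantee $\|\nabla f(x_k)\|_1^2/(2L)$.

```python
def sign(t):
    return (t > 0) - (t < 0)

def linf_step(x, g, L):
    """Return argmin_y <g, y - x> + (L/2) ||y - x||_inf^2.

    Rescaling the lemma min_y <g, y> + (1/2)||y||^2 = -(1/2)||g||_*^2 by L gives the minimum
    value -||g||_1^2 / (2L). A minimizer moves every coordinate by ||g||_1 / L against the sign
    of g_i (coordinates with g_i = 0 stay put, which is also optimal).
    """
    step = sum(abs(gi) for gi in g) / L
    return [xi - step * sign(gi) for xi, gi in zip(x, g)]

# Hypothetical test problem f(x) = 1/2 sum_i d_i (x_i - c_i)^2; Hessian diag(d) gives L = sum(d) w.r.t. ||.||_inf
d, c = [1.0, 2.0], [1.0, -0.5]
L = sum(d)

def f(x):
    return 0.5 * sum(di * (xi - ci) ** 2 for di, xi, ci in zip(d, x, c))

def grad(x):
    return [di * (xi - ci) for di, xi, ci in zip(d, x, c)]

x0 = [0.0, 0.0]
g0 = grad(x0)
x1 = linf_step(x0, g0, L)
guaranteed = f(x0) - sum(abs(t) for t in g0) ** 2 / (2 * L)  # f(x_k) - ||grad f(x_k)||_*^2 / (2L)
print(f(x0), f(x1), guaranteed)
```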

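Lemma:
$$\frac{1}{2L}\|\nabla f(x)\|_*^2 \le f(x) - f_* \le \frac{L}{2}\|x - x_*\|^2$$
$$\frac{1}{2\mu}\|\nabla f(x)\|_*^2 \ge f(x) - f_* \ge \frac{\mu}{2}\|x - x_*\|^2$$
(both follow from $\frac{\mu}{2}\|x - y\|^2 \le f(y) - [f(x) + \langle \nabla f(x), y - x \rangle] \le \frac{L}{2}\|x - y\|^2$)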

Theorem: Gradient descent computes an $\epsilon$-optimal point with $O\left(\frac{L}{\mu}\log\frac{f(x_0) - f_*}{\epsilon}\right)$ gradient queries.
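A hedged end-to-end sketch of the theorem for $\|\cdot\| = \|\cdot\|_\infty$ on an illustrative quadratic $f(x) = \frac{1}{2}\sum_i d_i (x_i - c_i)^2$ with hypothetical `d`, `c` and $d_i > 0$. With respect to $\|\cdot\|_\infty$ its constants are $L = \sum_i d_i$ and $\mu = \min_i d_i$. Each step's decrease combined with $\|\nabla f(x)\|_*^2 \ge 2\mu(f(x) - f_*)$ gives $f(x_{k+1}) - f_* \le (1 - \mu/L)(f(x_k) - f_*)$. The loop counts gradient queries until $f(x_k) - f_* \le \epsilon$ and compares the count against $\lceil (L/\mu)\log((f(x_0) - f_*)/\epsilon) \rceil$.

```python
import math

def sign(t):
    return (t > 0) - (t < 0)

def linf_gradient_descent(f, grad, x0, L, f_star, eps, max_queries=10_000):
    """Iterate x_{k+1} = argmin_x <g_k, x - x_k> + (L/2)||x - x_k||_inf^2 until f(x_k) - f_star <= eps.

    f_star is assumed known (true for the toy problem below). Returns the final iterate and
    the number of gradient queries used.
    """
    x, queries = list(x0), 0
    while f(x) - f_star > eps and queries < max_queries:
        g = grad(x)
        queries += 1
        step = sum(abs(gi) for gi in g) / L  # ||g||_1 / L; ||.||_1 is dual to ||.||_inf
        x = [xi - step * sign(gi) for xi, gi in zip(x, g)]
    return x, queries

# Hypothetical problem f(x) = 1/2 sum_i d_i (x_i - c_i)^2 with f* = 0 at x* = c.
# W.r.t. ||.||_inf, its Hessian diag(d) gives L = sum(d) and mu = min(d).
d, c = [1.0, 2.0], [1.0, -0.5]

def f(x):
    return 0.5 * sum(di * (xi - ci) ** 2 for di, xi, ci in zip(d, x, c))

def grad(x):
    return [di * (xi - ci) for di, xi, ci in zip(d, x, c)]

L, mu, eps = sum(d), min(d), 1e-6
x0 = [0.0, 0.0]
x, queries = linf_gradient_descent(f, grad, x0, L, 0.0, eps)
bound = math.ceil((L / mu) * math.log((f(x0) - 0.0) / eps))  # (L/mu) log((f(x_0) - f*)/eps)
print(queries, bound)
```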

Acceleration? Depends on norm!

Next extension!


Composite Function Minimization

Problem: $\min_{x \in \mathbb{R}^n} f(x)$ where $f(x) = g(x) + \psi(x)$
• $g: \mathbb{R}^n \to \mathbb{R}$ is $L$-smooth with respect to $\|\cdot\|$ and convex
• $\psi: \mathbb{R}^n \to \mathbb{R}$ is "given / simple" (TBD)
• $f: \mathbb{R}^n \to \mathbb{R}$ is $\mu$-strongly convex with respect to $\|\cdot\|$

Question: How to optimize?
Note: $f$ may not be smooth! It may not even be differentiable!
• e.g. $f(x) = g(x) + \lambda\|x\|_1$
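To see concretely why $f$ can fail to be differentiable, take $n = 1$, $\psi(x) = \lambda|x|$ with $\lambda = 1$, and the hypothetical smooth part $g(x) = \frac{1}{2}(x - 3)^2$. The one-sided difference quotients at 0 approach $g'(0) \pm \lambda = -3 \pm 1$. Because these differ, $f$ has a kink at 0 even though $g$ is smooth.

```python
def g(x):
    return 0.5 * (x - 3.0) ** 2  # hypothetical smooth part; g'(0) = -3

def f(x, lam=1.0):
    return g(x) + lam * abs(x)  # composite objective with psi(x) = lam * |x|

h = 1e-6
right = (f(h) - f(0.0)) / h  # tends to g'(0) + lam = -2
left = (f(0.0) - f(-h)) / h  # tends to g'(0) - lam = -4
print(right, left)
```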
