Lecture 8: Optimization Theory - Composite Functions and Arbitrary Norms, Exams of Optimization Techniques in Engineering

Part of the lecture notes for the course 'Optimization Theory' (MS&E 213 / CS 2690) taught by Aaron Sidford at Stanford University. The eighth lecture covers composite functions and arbitrary norms. It discusses how optimization methods extend to other settings and oracles, including different norms, constraints, and composite functions.

Typology: Exams

2021/2022

Uploaded on 07/05/2022

carol_78
Introduction to
Optimization Theory
Lecture #8 - 10/8/20
MS&E 213 / CS 2690
Aaron Sidford
sidford@stanford.edu


Plan for Today

Recap
  • Iterative methods

Extension #1
  • General norms
  • Composite functions
  • More Extensions Tuesday

Extensions: Iterative Method Landscape

  • So far: first-order methods (gradient / value oracle) and $\|\cdot\|_2$
  • Our machinery extends to many different settings and oracles
  • Goal: see the broader theory and understand extensions

Cases
  • Different norms (e.g. $\|\cdot\|_\infty$)
  • Constraints, e.g. minimizing $f(x)$ over a constraint set
  • Composite functions, e.g. $\min_x f(x) + \|x\|_1$, with $f$ smooth and $\|x\|_1$ simple
  • Coordinate descent

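Example Proof

Lemma: $\min_y \; x^\top y + \frac{1}{2}\|y\|^2 = -\frac{1}{2}\|x\|_*^2$

Proof:

  • LHS $= -\max_y \; (-x)^\top y - \frac{1}{2}\|y\|^2$
  • $= -\max_{\beta \ge 0,\ \|z\| = 1} \; (-x)^\top (\beta \cdot z) - \frac{1}{2}\|\beta \cdot z\|^2$
  • $= -\max_{\beta \ge 0,\ \|z\| = 1} \; \beta \cdot (-x)^\top z - \frac{\beta^2}{2}$
  • $= -\max_{\beta \ge 0} \; \beta \cdot \|-x\|_* - \frac{\beta^2}{2} = -\frac{1}{2}\|x\|_*^2$

Here $\|x\|_* = \max_{\|z\| \le 1} x^\top z$ is the dual norm. Maximizing over $\|z\| \le 1$ is the same as maximizing over $\|z\| = 1$, since we can always increase the argument without decreasing the objective. The maximizing $\beta$ is $\beta = \|x\|_*$.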
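As a sanity check, the lemma can be tested numerically for one concrete norm. The sketch below assumes $\|\cdot\| = \|\cdot\|_\infty$, whose dual norm is $\|\cdot\|_1$. The helper names (`objective`, `minimizer`) and the random test vector are illustrative choices, not part of the lecture.

```python
import random


def linf(v):
    """l-infinity norm: largest absolute entry."""
    return max(abs(t) for t in v)


def l1(v):
    """l-1 norm: sum of absolute entries (the dual of l-infinity)."""
    return sum(abs(t) for t in v)


def objective(x, y):
    """x^T y + (1/2) * ||y||_inf^2, the left-hand side of the lemma."""
    return sum(a * b for a, b in zip(x, y)) + 0.5 * linf(y) ** 2


def minimizer(x):
    """Candidate minimizer y = -beta * z with z = sign(x), beta = ||x||_1."""
    beta = l1(x)
    return [-beta * (1.0 if t >= 0 else -1.0) for t in x]


random.seed(0)
x = [random.uniform(-1.0, 1.0) for _ in range(5)]

closed_form = -0.5 * l1(x) ** 2           # right-hand side of the lemma
best_value = objective(x, minimizer(x))   # objective at the candidate minimizer

# No randomly sampled y should beat the closed-form value.
sampled_min = min(
    objective(x, [random.uniform(-5.0, 5.0) for _ in x]) for _ in range(2000)
)

print(closed_form, best_value, sampled_min)
```

The candidate minimizer mirrors the proof: the direction $z$ attains the dual norm and the scale is $\beta = \|x\|_*$.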

Arbitrary Norms

  • Definition: $f : \mathbb{R}^n \to \mathbb{R}$ is $L$-smooth with respect to $\|\cdot\|$ if and only if $\|\nabla f(x) - \nabla f(y)\|_* \le L \|x - y\|$ for all $x, y \in \mathbb{R}^n$
  • Definition: $f : \mathbb{R}^n \to \mathbb{R}$ is $\mu$-strongly convex with respect to $\|\cdot\|$ if and only if $f(t \cdot y + (1 - t) \cdot x) \le t \cdot f(y) + (1 - t) f(x) - \frac{\mu}{2} t (1 - t) \|x - y\|^2$ for all $x, y \in \mathbb{R}^n$ and $t \in [0, 1]$

Why? Compare bounds of order $O(L \|x - x_*\|_2^2)$ with bounds of order $O(L \|x - x_*\|_\infty^2)$, where $L$ is the smoothness constant in the respective norm. Changing the norm can mean an $O(n)$ step improvement, as $\|x\|_\infty^2 \le \|x\|_2^2 \le n \|x\|_\infty^2$.
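The factor-of-$n$ gap between the squared $\ell_2$ and $\ell_\infty$ norms is easy to see on two extreme vectors. The following sketch is only an illustration; the dimension `n = 100` and the two test vectors are arbitrary choices.

```python
def squared_norms(x):
    """Return (||x||_inf^2, ||x||_2^2) for a list of floats."""
    linf_sq = max(abs(t) for t in x) ** 2
    l2_sq = sum(t * t for t in x)
    return linf_sq, l2_sq


n = 100
basis = [1.0] + [0.0] * (n - 1)   # the two norms coincide here
ones = [1.0] * n                  # here ||x||_2^2 = n * ||x||_inf^2

for name, x in [("basis vector", basis), ("all-ones vector", ones)]:
    linf_sq, l2_sq = squared_norms(x)
    # Sandwich: ||x||_inf^2 <= ||x||_2^2 <= n * ||x||_inf^2
    print(name, linf_sq, l2_sq, n * linf_sq)
```

Whether measuring distance in $\|\cdot\|_\infty$ actually helps depends on how the smoothness constant changes with the norm, which this snippet does not address.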

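Equivalence?

Lemma: If $f : \mathbb{R}^n \to \mathbb{R}$ is convex and differentiable with $f(x) \le f(y) + \langle \nabla f(y), x - y \rangle + \frac{L}{2}\|x - y\|^2$ for all $x, y \in \mathbb{R}^n$, then $f$ is $L$-smooth, i.e. $\|\nabla f(x) - \nabla f(y)\|_* \le L \|x - y\|$.

Proof:

  • Let $g(z) = f(z) - [f(x) + \langle \nabla f(x), z - x \rangle]$
  • $g$ is convex and $\nabla g(x) = 0$
  • $0 = g(x) = \min_z g(z) \le \min_z \; [f(y) + \langle \nabla f(y), z - y \rangle + \frac{L}{2}\|z - y\|^2] - [f(x) + \langle \nabla f(x), z - x \rangle]$
  • $f(y) \le f(x) + \langle \nabla f(x), y - x \rangle + \frac{L}{2}\|y - x\|^2$
  • $0 \le \min_z \; \langle \nabla f(y) - \nabla f(x), z - y \rangle + \frac{L}{2}\|z - y\|^2 + \frac{L}{2}\|y - x\|^2$
  • $= -\frac{1}{2L}\|\nabla f(y) - \nabla f(x)\|_*^2 + \frac{L}{2}\|x - y\|^2$

The last step uses $\min_\Delta \; \langle a, \Delta \rangle + \frac{L}{2}\|\Delta\|^2 = -\frac{1}{2L}\|a\|_*^2$, which is the lemma $\min_y x^\top y + \frac{1}{2}\|y\|^2 = -\frac{1}{2}\|x\|_*^2$ after rescaling. Rearranging gives $\|\nabla f(y) - \nabla f(x)\|_*^2 \le L^2 \|x - y\|^2$, which is the claim.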

More Equivalences

Lemma: $f : \mathbb{R}^n \to \mathbb{R}$ is $L$-smooth and $\mu$-strongly convex with respect to $\|\cdot\|$ if and only if for all $x, y \in \mathbb{R}^n$

$\frac{\mu}{2}\|x - y\|^2 \le f(y) - [f(x) + \nabla f(x)^\top (y - x)] \le \frac{L}{2}\|x - y\|^2$

Lemma: twice differentiable $f : \mathbb{R}^n \to \mathbb{R}$ is $L$-smooth and $\mu$-strongly convex with respect to $\|\cdot\|$ if and only if for all $x, z \in \mathbb{R}^n$

$\mu \|z\|^2 \le z^\top \nabla^2 f(x) z \le L \|z\|^2$
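To make the second-order condition concrete, the sketch below tests it on sampled directions for a quadratic $f(x) = \frac{1}{2} x^\top A x$, whose Hessian is the constant matrix $A$. The matrix `A`, the candidate constants, and the helper `hessian_bounds_hold` are assumptions chosen for illustration. Sampling can only refute a pair $(\mu, L)$, not prove it.

```python
import random


def quad_form(A, z):
    """z^T A z for a square matrix A given as a list of rows."""
    n = len(z)
    return sum(z[i] * A[i][j] * z[j] for i in range(n) for j in range(n))


def l2(z):
    return sum(t * t for t in z) ** 0.5


def linf(z):
    return max(abs(t) for t in z)


def hessian_bounds_hold(A, mu, L, norm, directions, tol=1e-9):
    """Check mu*||z||^2 <= z^T A z <= L*||z||^2 on every given direction."""
    for z in directions:
        q = quad_form(A, z)
        s = norm(z) ** 2
        if q < mu * s - tol or q > L * s + tol:
            return False
    return True


random.seed(0)
A = [[1.0, 0.0], [0.0, 4.0]]  # Hessian of f(x) = 0.5 * (x1^2 + 4 * x2^2)
directions = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [1.0, -1.0]]
directions += [[random.gauss(0.0, 1.0) for _ in range(2)] for _ in range(1000)]

ok_l2 = hessian_bounds_hold(A, 1.0, 4.0, l2, directions)       # eigenvalue range
too_small = hessian_bounds_hold(A, 1.0, 3.0, l2, directions)   # fails at z = e2
ok_linf = hessian_bounds_hold(A, 1.0, 5.0, linf, directions)   # L grows to 5
print(ok_l2, too_small, ok_linf)
```

For this matrix the constants depend on the norm: in $\|\cdot\|_2$ they are the extreme eigenvalues, while in $\|\cdot\|_\infty$ the direction $z = (1, 1)$ forces $L \ge 5$.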

Analysis

Upper Bound Oracle!

  • $x_{k+1} = \mathrm{argmin}_x \; f(x_k) + \nabla f(x_k)^\top (x - x_k) + \frac{L}{2}\|x - x_k\|^2$
  • $\Rightarrow f(x_{k+1}) \le f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|_*^2$

Lemma

  • $\frac{1}{2L}\|\nabla f(x)\|_*^2 \le f(x) - f_* \le \frac{L}{2}\|x - x_*\|^2$
  • $\frac{1}{2\mu}\|\nabla f(x)\|_*^2 \ge f(x) - f_* \ge \frac{\mu}{2}\|x - x_*\|^2$

Both follow from $\frac{\mu}{2}\|x - y\|^2 \le f(y) - [f(x) + \langle \nabla f(x), y - x \rangle] \le \frac{L}{2}\|x - y\|^2$.

Theorem: Gradient descent computes an $\epsilon$-optimal point with $O\left(\frac{L}{\mu} \log \frac{f(x_0) - f_*}{\epsilon}\right)$ gradient queries.
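The upper bound oracle step has a closed form for some norms. The sketch below assumes $\|\cdot\| = \|\cdot\|_\infty$, where minimizing the upper bound gives $x_{k+1} = x_k - \frac{\|g\|_1}{L} \mathrm{sign}(g)$ with $g = \nabla f(x_k)$ (by the lemma on $\min_y x^\top y + \frac{1}{2}\|y\|^2$ with dual norm $\|\cdot\|_1$). The test function, a diagonal quadratic with $L = \sum_i a_i$ in the $\ell_\infty$ norm, and all names are illustrative assumptions.

```python
def f(a, x):
    """Diagonal quadratic f(x) = 0.5 * sum_i a_i * x_i^2."""
    return 0.5 * sum(ai * xi * xi for ai, xi in zip(a, x))


def grad(a, x):
    return [ai * xi for ai, xi in zip(a, x)]


def sign(t):
    return (t > 0) - (t < 0)


def linf_step(x, g, L):
    """Minimize <g, x' - x> + (L/2) * ||x' - x||_inf^2 over x'."""
    beta = sum(abs(t) for t in g) / L  # step length ||g||_1 / L
    return [xi - beta * sign(gi) for xi, gi in zip(x, g)]


a = [1.0, 2.0, 3.0]
L = sum(a)  # z^T diag(a) z <= sum(a) * ||z||_inf^2
x = [1.0, -2.0, 0.5]

history = []  # (f before, f after, guaranteed decrease) per step
for _ in range(50):
    g = grad(a, x)
    x_next = linf_step(x, g, L)
    guaranteed = sum(abs(t) for t in g) ** 2 / (2.0 * L)  # ||g||_1^2 / (2L)
    history.append((f(a, x), f(a, x_next), guaranteed))
    x = x_next

print(history[0], history[-1])
```

Each recorded step can be compared against the guarantee $f(x_{k+1}) \le f(x_k) - \frac{1}{2L}\|\nabla f(x_k)\|_*^2$ from the analysis.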

Acceleration? Depends on norm!

Next extension!

Composite Function Minimization

Problem

$\min_{x \in \mathbb{R}^n} f(x)$ where $f(x) = g(x) + \psi(x)$

  • $g : \mathbb{R}^n \to \mathbb{R}$ is $L$-smooth with respect to $\|\cdot\|$ and convex
  • $\psi : \mathbb{R}^n \to \mathbb{R}$ is "given / simple" (TBD)
  • $f : \mathbb{R}^n \to \mathbb{R}$ is $\mu$-strongly convex with respect to $\|\cdot\|$

Question

  • How to optimize?
  • Note: $f$ may not be smooth! May not be differentiable!
    • e.g. $f(x) = g(x) + \lambda \|x\|_1$
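A one-dimensional instance shows the non-differentiability directly. The sketch below assumes $g(x) = \frac{1}{2}(x - 1)^2$ and $\lambda = 2$, both picked only for illustration, and compares one-sided difference quotients of $f(x) = g(x) + \lambda |x|$ at $x = 0$.

```python
def g(x):
    """Smooth part (assumed for this example): g(x) = 0.5 * (x - 1)^2."""
    return 0.5 * (x - 1.0) ** 2


lam = 2.0  # weight on the non-smooth term


def f(x):
    """Composite objective f(x) = g(x) + lam * |x|."""
    return g(x) + lam * abs(x)


h = 1e-6
right_slope = (f(h) - f(0.0)) / h    # approaches g'(0) + lam
left_slope = (f(0.0) - f(-h)) / h    # approaches g'(0) - lam

print(right_slope, left_slope)
```

The two slopes differ by about $2\lambda$, so $f$ has no gradient at $0$ even though $g$ is smooth everywhere.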