
























Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
An in-depth exploration of subgradients, a fundamental concept in convex optimization. It covers the definition and properties of subgradients, including strong and weak subgradient calculus, optimality conditions via subgradients, and the relationship between subgradients and directional derivatives. The document delves into various examples and applications, such as piecewise linear minimization, constrained optimization, and the connection between subgradients and descent directions. It serves as a comprehensive resource for understanding the role of subgradients in convex analysis and optimization, particularly in the context of nondifferentiable functions. The content is drawn from the ee364b course at stanford university, providing a rigorous and insightful treatment of this important topic.
Typology: Summaries
1 / 32
This page cannot be seen from the preview
Don't miss anything!

























subgradients
-^
strong and weak subgradient calculus
-^
optimality conditions via subgradients
-^
directional derivatives
EE364b, Stanford University
recall basic inequality for convex differentiable
f
f^ (
y)
f
(x
f^
(x
(y
x
first-order approximation of
f
at
x
is global underestimator
f^ (
x)
supports
epi
f
at
x, f
(x
what if
f
is not differentiable?
EE364b, Stanford University
g^
is a subgradient of
f
at
x
iff
g,
supports
epi
f
at
x, f
(x
g^
is a subgradient iff
f
(x
g
T^
(y
x
is a global (affine)
underestimator of
f
if^
f^
is convex and differentiable,
f^
(x
is a subgradient of
f
at
x
subgradients come up in several contexts:^ •
algorithms for nondifferentiable convex optimization • convex analysis,
e.g.
, optimality conditions, duality for nondifferentiable
problems (if
f
(y
f
(x
g
T^
(y
x
for all
y
, then
g
is a
supergradient
EE364b, Stanford University
f^
= max
{f
, f 1
, with
f
f
2
convex and differentiable
x^0
f^1
(x
)
f^2
(x
)
f^ (
x)
f^1
(x
f
x^0
): unique subgradient
g
f^1
(x
f^2
(x
f
x^0
): unique subgradient
g
f^2
(x
f^1
(x
f
x^0
): subgradients form a line segment
f^1
(x
f^2
(x
EE364b, Stanford University
f^ (
x) =
|x
f^ (
x) =
|x
|^
∂f
(x
)
x
x
1
−
1
righthand plot shows
(x, g
|^ x
, g
∂f
(x
EE364b, Stanford University
weak subgradient calculus
: formulas for finding
one
subgradient
g^
∂f
(x
strong subgradient calculus
: formulas for finding the whole
subdifferential
∂f
(x
i.e.
all
subgradients of
f
at
x
many algorithms for nondifferentiable convex optimization require only one
subgradient at each step, so weak calculus suffices
-^
some algorithms, optimality conditions, etc., need whole subdifferential
-^
roughly speaking: if you can compute
f
(x
), you can usually compute a
g^
∂f
(x
we’ll assume that
f
is convex, and
x
relint dom
f
EE364b, Stanford University
f^ (
x) = max
{f
x)
,... , f
m
(x
, with
f
,... , f 1
m
differentiable
∂f
(x
Co
fi
(x
|^ f
(i x) =
f
(x
example:
f
(x
x‖
1
= max
{s
T^ x
si
1 1
−
1
−
1
∂f
(x
)^
at
x
= (
,^ 0)
1
1 −
1 at
x
= (
,^ 0)
(1,1)
at
x
= (
,^ 1)
EE364b, Stanford University
if^
f^
= sup
α∈A
fα
cl Co
{∂f
(β x)
fβ
(x
f
(x
∂f
(x
(usually get equality, but requires some technical conditions to hold,
e.g.
compact,
f
α^
cts in
x
and
α
roughly speaking,
∂f
(x
is closure of convex hull of union of
subdifferentials of active functions EE364b, Stanford University
example
f^ (
x) =
λ
max
(x
sup ‖y‖
=1 2
T y
(x
)y
where
(x
0
x
1
x
An
,n
∈i
k
f^
is pointwise supremum of
g
(y x) =
y
T^
(x
)y
over
y‖
2
gy
is affine in
x
, with
gy
(x
Ty
y,... , y 1
T^
yn
hence,
∂f
(x
Co
gy
(x
)y
λ
max
(x
y,
y‖
2
(in fact equality holds here) to find
one
subgradient at
x
, can choose
any
unit eigenvector
y
associated
with
λ
max
(x
; then
(y
y,... , y 1
T^
yn
∂f
(x
EE364b, Stanford University
f^ (
x) =
f
(x, ω
), with
f
convex in
x
for each
ω
,^ ω
a random variable
for each
ω
, choose
any
g
ω^
(f x, ω
(so
ω
g
ω^
is a function)
then,
g
g
ω^
∂f
(x
Monte Carlo method for (approximately) computing
f
(x
and
a
g
∂f
(x
generate independent samples
ω
,... , ω 1
K
from distribution of
ω
f^ (
x)
Ki=
f (x, ω
)i
for each
i
choose
g
∈i
fx (x, ω
)i
g^
Ki=
g i^
is an (approximate) subgradient
(more on this later) EE364b, Stanford University
f^ (
x) =
h
(f
x)
,... , f
(k x))
, with
h
convex nondecreasing,
f
i^
convex
find
q
∂h
(f
x)
,... , f
(k x))
,^ g
∈i
∂f
(i x)
then,
g
q
g 1 1
q
gk k^
∂f
(x
reduces to standard formula for differentiable
h
,^ f
i
proof:
f^
(y
h(
f^1
(y
),... , f
(k y))
h(
f^1
(x
g
T 1
(y
x
),... , f
(k x) +
g
T(k^
y^
x
h(
f^1
(x
),... , f
(k x)) +
q
Tg 1
y^
x
),... , g
Tk^
(y
x
f^ (
x) +
g
y^
x
EE364b, Stanford University
g^
is a subgradient at
x
means
f
(y
f
(x
g
T^
(y
x
hence
f
(y
f
(x
g
T^
(y
x
f^ (
x)
≤
f
(x
x ) 0 0
g^
∈
∂f
(x
) 0
x^1 ∇f
(x
) 1
EE364b, Stanford University
g^
is a
quasigradient
of
f
at
x
if
T g (y
x
f
(y
f
(x
holds for all
y
g
x
f^ (
y)
≤
f
(x
)
quasigradients at
x
form a cone
EE364b, Stanford University
example:
f^ (
x) =
a T^ x
b
T c x
d
(dom
f
x^
|^ c
T^ x
d >
g^
a
f
(x
c^
is a quasigradient at
x
0
proof: for
c
T^ x
d >
T a (x
x
f
(x
Tc (x
x
f
(x
f
(x
EE364b, Stanford University