Optimization Methods and Algorithms, Study notes of Algorithms and Programming

Study notes covering various optimization methods and algorithms, such as conjugate direction methods, gradient descent, the simplex algorithm, subgradients, quasi-Newton methods, and LP duality. Topics include LPs in standard form, subdifferentials, conic sets, and generalized gradient descent, with the accompanying equations and proof sketches. The notes may be useful for students studying optimization, linear programming, and related topics.

Typology: Study notes

2021/2022

Uploaded on 05/11/2023

amoda



I

Grid Search

LPs
Standard form: $\min c^T x$ s.t. $Ax = b$, $x \ge 0$, $b \ge 0$.
Getting it to standard form:
Getting rid of $\ge$, $\le$: $x_1 \le 4 \to x_1 + x_2 = 4$, $x_2 \ge 0$.
Getting rid of negative vars: $x \in \mathbb{R} \to x = u - v$, $u, v \in \mathbb{R}_+$.
Bounded vars: $x \in [2, 5] \to 2 \le x$, $x \le 5$.

Simplex algorithm:
(1) Take the cost function and turn it into $\min z$ s.t. $c^T x = z$, with the remainder in standard LP form.
(2) Pivoting: do Gaussian elimination to get rid of as many variables as possible, without distributing the $z$ around.
(3) Variables that have been eliminated except in one equation are dependent/basic; the others are independent/non-basic. Can always get a feasible point by setting the non-basic variables to zero and reading out the basic variables:
$$\begin{bmatrix} 1 & 0 & C \\ 0 & I_m & A \end{bmatrix} [-z, x_B, x_N]^T = [-z_0, b]^T$$
(4) Improve the solution: find the smallest reduced cost $C_j$, attained at index $J$. If $C_J \ge 0$, optimality is reached; quit. Else, $J$ is incoming.
(5) Find how far we can go by picking the outgoing variable: $r = \arg\min_{i : A_{i,J} > 0} b_i / A_{i,J}$.
(6) Perform elimination to get rid of $J$, using the equation that makes the outgoing variable a basic one. That is, take the only equation in which the outgoing variable is non-zero, and eliminate the incoming variable with it.
(7) Repeat from (4) until optimality is reached.
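Steps (4) to (7) can be sketched as a small tableau routine. This is a minimal illustration, not a robust solver: the function name `simplex` and its arguments are hypothetical, and it assumes the caller already supplies a feasible starting basis whose columns form an identity in `A`, with `b >= 0` and zero cost on the basic variables (for example, slack variables). There is no anti-cycling rule.

```python
def simplex(c, A, b, basis, tol=1e-9, max_iter=100):
    """Minimize c^T x s.t. A x = b, x >= 0 from a given feasible basis.

    Assumption: columns in `basis` form an identity in A, b >= 0,
    and c is zero on the basic variables.
    """
    m, n = len(A), len(c)
    rows = [[float(v) for v in A[i]] + [float(b[i])] for i in range(m)]
    cost = [float(v) for v in c] + [0.0]  # last entry stores -z
    basis = list(basis)
    for _ in range(max_iter):
        # Step 4: smallest reduced cost; stop if it is non-negative.
        j = min(range(n), key=lambda k: cost[k])
        if cost[j] >= -tol:
            break
        # Step 5: ratio test over rows with a positive entry in column j.
        candidates = [i for i in range(m) if rows[i][j] > tol]
        if not candidates:
            raise ValueError("LP is unbounded")
        r = min(candidates, key=lambda i: rows[i][-1] / rows[i][j])
        # Step 6: pivot on (r, j) to make variable j basic in row r.
        piv = rows[r][j]
        rows[r] = [v / piv for v in rows[r]]
        for i in range(m):
            if i != r and rows[i][j] != 0.0:
                f = rows[i][j]
                rows[i] = [a - f * p for a, p in zip(rows[i], rows[r])]
        f = cost[j]
        cost = [a - f * p for a, p in zip(cost, rows[r])]
        basis[r] = j
    # Read out the solution: non-basic variables are zero.
    x = [0.0] * n
    for i, k in enumerate(basis):
        x[k] = rows[i][-1]
    return x, -cost[-1]


# min -3*x1 - 5*x2 with three slack variables forming the initial basis.
c = [-3, -5, 0, 0, 0]
A = [[1, 0, 1, 0, 0], [0, 2, 0, 1, 0], [3, 2, 0, 0, 1]]
b = [4, 12, 18]
x_opt, z_opt = simplex(c, A, b, basis=[2, 3, 4])
```

On this small example the routine performs two pivots and stops at $x_1 = 2$, $x_2 = 6$ with $z = -36$.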

Convex sets, fcns:
Defns: A set is X if, for any weighted sum $\sum_i \theta_i x_i$ of points in the set whose weights satisfy Y, the weighted sum is in the set.
Convex: $\sum_i \theta_i = 1$, $\theta_i \ge 0$.
Affine: $\sum_i \theta_i = 1$.
Conic: $\theta_i \ge 0$.
Examples: Lines, line segments, hyperplanes, halfspaces, $L_p$ balls for $p \ge 1$, polyhedra, polytopes.
Preserving operations: Translation, scaling, intersection, affine functions (e.g., projection, coordinate dropping), set sum $\{c_1 + c_2 \mid c_1 \in C_1, c_2 \in C_2\}$, direct sum $\{(c_1, c_2) \mid c_1 \in C_1, c_2 \in C_2\}$, perspective projection.
Conv. fcn. defn: $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y)$ for $\theta \in [0, 1]$.
First-order condition (differentiable $f$): $f(y) \ge f(x) + \nabla f(x)^T (y - x)$.
Preserving operations, functions: Non-negative weighted sum, pointwise max, affine map $f(Ax + b)$, composition, perspective map.

Strict, Strong Convexity
Defns:
Strict convexity: $f(\theta x + (1 - \theta) y) < \theta f(x) + (1 - \theta) f(y)$ for $x \ne y$, $\theta \in (0, 1)$ (basically, not linear).
$m$-strong convexity: $f(\theta x + (1 - \theta) y) \le \theta f(x) + (1 - \theta) f(y) - \frac{m}{2}\theta(1 - \theta)\|x - y\|_2^2$.
Better strong convexity defns:
$(\nabla f(x) - \nabla f(y))^T (x - y) \ge m\|x - y\|_2^2$
$f(y) \ge f(x) + \nabla f(x)^T (y - x) + \frac{m}{2}\|y - x\|_2^2$
$\nabla^2 f(x) \succeq mI$.

Gradient Descent
Given $x^0$, repeat $x^k = x^{k-1} - t_k \nabla f(x^{k-1})$.
Picking $t$: can diverge if $t$ is too big, too slow if $t$ is too small.
Backtracking line search: start with $t = 1$; while $f(x - t\nabla f(x)) > f(x) - \alpha t\|\nabla f(x)\|_2^2$, update $t = \beta t$, with $0 < \alpha < 1/2$, $0 < \beta < 1$.
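The backtracking rule can be sketched in a few lines of plain Python. The function name `backtracking_gd`, the stopping tolerance, and the quadratic test objective are illustrative assumptions, not part of the notes.

```python
def backtracking_gd(f, grad, x0, alpha=0.3, beta=0.5, tol=1e-8, max_iter=5000):
    """Gradient descent with backtracking line search on a list-valued x."""
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        g2 = sum(gi * gi for gi in g)  # squared gradient norm
        if g2 ** 0.5 < tol:
            break
        t = 1.0
        # Shrink t until the sufficient-decrease condition holds.
        while f([xi - t * gi for xi, gi in zip(x, g)]) > f(x) - alpha * t * g2:
            t *= beta
        x = [xi - t * gi for xi, gi in zip(x, g)]
    return x


# Hypothetical test objective: a convex quadratic minimized at (1, -2).
def f(x):
    return 0.5 * (x[0] - 1.0) ** 2 + 5.0 * (x[1] + 2.0) ** 2


def grad(x):
    return [x[0] - 1.0, 10.0 * (x[1] + 2.0)]


x_min = backtracking_gd(f, grad, [0.0, 0.0])
```

Restarting from $t = 1$ at every iteration is the simplest choice; reusing the previous step size is a common variation when function evaluations are expensive.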

Subgradients
Defn.: A subgradient of convex $f$ at $x$ is any $g$ s.t. $f(y) \ge f(x) + g^T (y - x)$ for all $y$.
Subdifferential $\partial f(x)$: set of all such $g$.
SG calculus: $\partial(af) = a\,\partial f$ (for $a > 0$); $\partial(f_1 + f_2) = \partial f_1 + \partial f_2$; $\partial_x f(Ax + b) = A^T \partial f(Ax + b)$.
Finite pointwise max: $\partial \max_{f \in F} f(x)$ is the convex hull of the subdifferentials of the active functions (those achieving the max at $x$).
Norms: if $f(x) = \|x\|_p$ and $1/p + 1/q = 1$, then $\|x\|_p = \max_{\|z\|_q \le 1} z^T x$; thus $\partial\|x\|_p = \{y : \|y\|_q \le 1,\ y^T x = \max_{\|z\|_q \le 1} z^T x\}$.
Optimality: $f(x^*) = \min_x f(x) \leftrightarrow 0 \in \partial f(x^*)$.
Remember that sgs may not exist for non-convex functions!

Subgradient Method
Given $x^0$, repeat $x^k = x^{k-1} - t_k g^{k-1}$, with $g^{k-1} \in \partial f(x^{k-1})$.
The SG method is not a descent method; keep track of the best iterate so far.
Picking $t$: square summable but not summable (e.g., $t_k = 1/k$). Polyak steps: $t_k = (f(x^{k-1}) - f(x^*))/\|g^{k-1}\|_2^2$.
Projected sg method: project onto the constraint set after taking a step.
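A minimal sketch of the subgradient method with diminishing steps $t_k = 1/k$ and best-iterate tracking follows. The function names and the $\ell_1$-type test objective are assumptions chosen for illustration.

```python
def subgradient_method(f, subgrad, x0, n_iter=2000):
    """Subgradient method with t_k = 1/k; returns the best point seen."""
    x = list(x0)
    best_x, best_f = list(x), f(x)
    for k in range(1, n_iter + 1):
        g = subgrad(x)
        t = 1.0 / k  # square summable but not summable
        x = [xi - t * gi for xi, gi in zip(x, g)]
        fx = f(x)
        # Not a descent method, so remember the best iterate explicitly.
        if fx < best_f:
            best_x, best_f = list(x), fx
    return best_x, best_f


def sign(v):
    return (v > 0) - (v < 0)


# Hypothetical non-smooth objective |x0 - 1| + |x1 + 2|, minimized at (1, -2).
def f_abs(x):
    return abs(x[0] - 1.0) + abs(x[1] + 2.0)


def subgrad_abs(x):
    # sign(0) = 0 is a valid subgradient of |.| at its kink.
    return [float(sign(x[0] - 1.0)), float(sign(x[1] + 2.0))]


best_x, best_f = subgradient_method(f_abs, subgrad_abs, [0.0, 0.0])
```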

Generalized GD
Suppose $f(x) = g(x) + h(x)$ with $g$ convex and differentiable, $h$ convex but not necessarily differentiable.
Define $\mathrm{prox}_t(x) = \arg\min_z \frac{1}{2t}\|x - z\|_2^2 + h(z)$; GGD is:
$x^k = \mathrm{prox}_{t_k}(x^{k-1} - t_k \nabla g(x^{k-1}))$
Generalized gradient, since if $G_t(x) = (1/t)(x - \mathrm{prox}_t(x - t\nabla g(x)))$ then the update is $x^k = x^{k-1} - t_k G_{t_k}(x^{k-1})$.
With backtracking: while $g(x - tG_t(x)) > g(x) - t\nabla g(x)^T G_t(x) + \frac{t}{2}\|G_t(x)\|_2^2$ (maybe with $\alpha$ in last term?), update $t = \beta t$.

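Example (Lasso): Prox is $\arg\min_z \frac{1}{2t}\|\beta - z\|_2^2 + \lambda\|z\|_1 = S_{\lambda t}(\beta)$, where $S_\lambda(\beta)$ is the soft-threshold operator,
$$[S_\lambda(\beta)]_i = \begin{cases} \beta_i - \lambda & \beta_i > \lambda \\ 0 & -\lambda \le \beta_i \le \lambda \\ \beta_i + \lambda & \beta_i < -\lambda \end{cases}$$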
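The lasso prox plugs directly into the generalized gradient update. The sketch below assumes the smooth part is $g(\beta) = \frac{1}{2}\|y - X\beta\|_2^2$ and uses a fixed step size `t`; the names `soft_threshold` and `ista` are illustrative, and `X` is a plain list of rows.

```python
def soft_threshold(v, lam):
    """Apply the soft-threshold operator S_lam elementwise."""
    return [vi - lam if vi > lam else vi + lam if vi < -lam else 0.0 for vi in v]


def ista(X, y, lam, t, n_iter=200):
    """Proximal gradient for 0.5*||y - X b||^2 + lam*||b||_1 with fixed step t."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(n_iter):
        # Gradient of the smooth part: X^T (X beta - y).
        resid = [sum(X[i][j] * beta[j] for j in range(p)) - y[i] for i in range(n)]
        grad = [sum(X[i][j] * resid[i] for i in range(n)) for j in range(p)]
        # Gradient step followed by the prox, whose threshold is lam * t.
        beta = soft_threshold([bj - t * gj for bj, gj in zip(beta, grad)], lam * t)
    return beta


# With an identity design the lasso solution is S_lam(y).
X_id = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.0, 0.0, 1.0]]
beta_hat = ista(X_id, [3.0, -0.5, -2.0], lam=1.0, t=1.0)
```

For a general design, a fixed step no larger than the reciprocal of the largest eigenvalue of $X^T X$ is the usual safe choice; otherwise use backtracking.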

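Example (Matrix Completion): Objective: $\frac{1}{2}\sum_{(i,j)\ \text{observed}} (Y_{i,j} - B_{i,j})^2 + \lambda\|B\|_*$ with $\|B\|_* = \sum_{i=1}^r \sigma_i(B)$.
Prox function: $\arg\min_Z \frac{1}{2t}\|B - Z\|_F^2 + \lambda\|Z\|_*$.
Solution: matrix soft-thresholding, $S_{\lambda t}(B)$, where $S_\lambda(B) = U\Sigma_\lambda V^T$ with $B = U\Sigma V^T$ and $(\Sigma_\lambda)_{ii} = \max\{\Sigma_{ii} - \lambda, 0\}$.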

Newton's Method: Originally developed for finding roots; use it to find roots of the gradient. Want $\nabla f(x) + \nabla^2 f(x)\Delta x = 0$; solution is $\Delta x = -[\nabla^2 f(x)]^{-1}\nabla f(x)$.
Damped Newton method: $x^{k+1} = x^k - h_k[\nabla^2 f(x^k)]^{-1}\nabla f(x^k)$.
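In one dimension the Newton step reduces to dividing the first derivative by the second, which gives a compact sketch. The function name `damped_newton`, the fixed damping factor `h`, and the test objective $e^x - 2x$ are assumptions for illustration.

```python
import math


def damped_newton(grad, hess, x0, h=1.0, tol=1e-12, max_iter=50):
    """1-D damped Newton: x <- x - h * f'(x) / f''(x)."""
    x = x0
    for _ in range(max_iter):
        g = grad(x)
        if abs(g) < tol:
            break
        x = x - h * g / hess(x)
    return x


# Hypothetical objective f(x) = exp(x) - 2x, minimized where exp(x) = 2.
x_star = damped_newton(lambda x: math.exp(x) - 2.0, lambda x: math.exp(x), x0=0.0)
```

With `h=1.0` this is the pure Newton iteration; in several dimensions the division becomes a linear solve with the Hessian.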

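Conjugate Direction methods: Want to solve $\min \frac{1}{2}x^T Q x - b^T x$ with $Q \succ 0$.
Define $Q$-orthogonality as $d_i^T Q d_j = 0$ for $i \ne j$.
Expanding subspace thm.: Let $\{d_i\}_{i=0}^{n-1}$ be $Q$-conjugate.
(for method) $g_k = Qx_k - b$
$x_{k+1} = x_k + \alpha_k d_k$
$\alpha_k = -g_k^T d_k / (d_k^T Q d_k)$
Proof sketch ($g_k \perp B_k$, where $B_k = \mathrm{span}\{d_0, \ldots, d_{k-1}\}$) by induction:
$g_{k+1} = Qx_{k+1} - b = Q(x_k + \alpha_k d_k) - b = (Qx_k - b) + \alpha_k Q d_k = g_k + \alpha_k Q d_k$
From here, by defn of $\alpha_k$, $d_k^T g_{k+1} = d_k^T (g_k + \alpha_k Q d_k) = d_k^T g_k + \alpha_k d_k^T Q d_k = d_k^T g_k - g_k^T d_k = 0$.
Algorithm: Arbitrary $x_0$; set $d_0 = -g_0 = b - Qx_0$; repeat:
$\alpha_k = -g_k^T d_k / (d_k^T Q d_k)$; $x_{k+1} = x_k + \alpha_k d_k$
$g_{k+1} = Qx_{k+1} - b$; $d_{k+1} = -g_{k+1} + \beta_k d_k$
$\beta_k = g_{k+1}^T Q d_k / (d_k^T Q d_k)$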
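The algorithm above translates almost line by line into code. This sketch uses plain lists for a small dense $Q$; the function name `conjugate_gradient` and the 2-by-2 example system are illustrative assumptions.

```python
def conjugate_gradient(Q, b, x0, tol=1e-12):
    """Minimize 0.5*x^T Q x - b^T x for symmetric positive definite Q."""
    n = len(b)

    def matvec(v):
        return [sum(Q[i][j] * v[j] for j in range(n)) for i in range(n)]

    def dot(u, v):
        return sum(a * c for a, c in zip(u, v))

    x = list(x0)
    g = [qi - bi for qi, bi in zip(matvec(x), b)]  # g_0 = Q x_0 - b
    d = [-gi for gi in g]                          # d_0 = -g_0
    for _ in range(n):  # at most n steps in exact arithmetic
        if dot(g, g) ** 0.5 < tol:
            break
        Qd = matvec(d)
        dQd = dot(d, Qd)
        alpha = -dot(g, d) / dQd
        x = [xi + alpha * di for xi, di in zip(x, d)]
        g = [gi + alpha * qdi for gi, qdi in zip(g, Qd)]  # g_{k+1} = g_k + alpha Q d_k
        beta = dot(g, Qd) / dQd
        d = [-gi + beta * di for gi, di in zip(g, d)]
    return x


# Small example: the minimizer solves Q x = b, here x = (1/11, 7/11).
x_cg = conjugate_gradient([[4.0, 1.0], [1.0, 3.0]], [1.0, 2.0], [0.0, 0.0])
```

The gradient is updated with the recurrence from the proof sketch rather than recomputed, which saves one matrix-vector product per iteration.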

Quasi-Newton Methods:
Gist: approximate the Hessian/inverse Hessian.
Symmetric rank-one correction:
Update: $x_{k+1} = x_k - \alpha_k H_k g_k$
$\alpha_k = \arg\min_\alpha f(x_k - \alpha H_k g_k)$ (LS)
$g_k = \nabla f_k$
$$H_{k+1} = H_k + \frac{(p_k - H_k q_k)(p_k - H_k q_k)^T}{q_k^T (p_k - H_k q_k)}$$
$p_k = x_{k+1} - x_k$; $q_k = g_{k+1} - g_k$
Might not be PSD!
DFP (Rank 2)
$$H_{k+1} = H_k + \frac{p_k p_k^T}{p_k^T q_k} - \frac{H_k q_k q_k^T H_k}{q_k^T H_k q_k}$$
BFGS
Update the inverse of the Hessian (via Sherman-Morrison).
Let $q_k = g_{k+1} - g_k$.
$$H_{k+1} = H_k + \left(1 + \frac{q_k^T H_k q_k}{p_k^T q_k}\right)\frac{p_k p_k^T}{p_k^T q_k} - \frac{p_k q_k^T H_k + H_k q_k p_k^T}{q_k^T p_k}$$
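One way to sanity-check the BFGS formula is to implement the update and confirm the secant condition $H_{k+1} q_k = p_k$. The sketch below assumes a symmetric `H` stored as a list of rows and a positive curvature product $p_k^T q_k$; the function name and the sample vectors are hypothetical.

```python
def bfgs_inverse_update(H, p, q):
    """Return the BFGS update of the inverse-Hessian estimate H (symmetric)."""
    n = len(p)
    Hq = [sum(H[i][j] * q[j] for j in range(n)) for i in range(n)]
    pq = sum(pi * qi for pi, qi in zip(p, q))      # p^T q, assumed > 0
    qHq = sum(qi * hqi for qi, hqi in zip(q, Hq))  # q^T H q
    scale = (1.0 + qHq / pq) / pq
    # For symmetric H, (p q^T H)[i][j] = p[i]*Hq[j] and (H q p^T)[i][j] = Hq[i]*p[j].
    return [[H[i][j] + scale * p[i] * p[j] - (p[i] * Hq[j] + Hq[i] * p[j]) / pq
             for j in range(n)] for i in range(n)]


# Start from the identity and apply one update with sample vectors.
H0 = [[1.0, 0.0], [0.0, 1.0]]
p0, q0 = [1.0, 0.5], [2.0, 1.5]
H1 = bfgs_inverse_update(H0, p0, q0)
# Secant check: H1 @ q0 should reproduce p0.
H1q = [sum(H1[i][j] * q0[j] for j in range(2)) for i in range(2)]
```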

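LP Duality
Let $c \in \mathbb{R}^n$, $A \in \mathbb{R}^{m \times n}$, $b \in \mathbb{R}^m$, $G \in \mathbb{R}^{r \times n}$, $h \in \mathbb{R}^r$.
(P) $\min c^T x$ s.t. $Ax = b$, $Gx \le h$
(D) $\max -b^T u - h^T v$ s.t. $-A^T u - G^T v = c$, $v \ge 0$.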

Duality:
Consider $\min f(x)$ s.t. $h_i(x) \le 0$, $i = 1, \ldots, m$; $l_j(x) = 0$, $j = 1, \ldots, r$.
Lagrangian: $L(x, u, v) = f(x) + \sum_{i=1}^m u_i h_i(x) + \sum_{j=1}^r v_j l_j(x)$ with $u \in \mathbb{R}^m$, $v \in \mathbb{R}^r$ and $u \ge 0$.
Note: $f(x) \ge L(x, u, v)$ at feasible $x$.
Dual problem: Let $g(u, v) = \min_x L(x, u, v)$; $g$ is the Lagrange dual function. The dual problem is $\max_{u \ge 0, v} g(u, v)$.
Note: the dual function is always concave, so the dual problem is always a convex problem.
Strong duality: Always have $f^* \ge g^*$ (weak duality), where $f^*$, $g^*$ are the optimal primal and dual objective values. When $f^* = g^*$, have strong duality. If the primal is a convex problem ($f$, $h_i$ convex, $l_j$ affine) and there exists a strictly feasible $x$, then strong duality holds.

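Dual example (lasso): Have primal:
$\min_\beta \frac{1}{2}\|y - X\beta\|_2^2 + \lambda\|\beta\|_1$;
Introduce dummy $z$ and solve:
$\min_{\beta, z} \frac{1}{2}\|y - z\|_2^2 + \lambda\|\beta\|_1$ s.t. $z = X\beta$.
Dual function is then:
$$g(u) = \min_{\beta, z} \tfrac{1}{2}\|y - z\|_2^2 + \lambda\|\beta\|_1 + u^T (z - X\beta) = \tfrac{1}{2}\|y\|_2^2 - \tfrac{1}{2}\|y - u\|_2^2 - I_{\{v : \|v\|_\infty \le 1\}}(X^T u/\lambda)$$
Or, as a dual problem: $\max_u \frac{1}{2}\left(\|y\|_2^2 - \|y - u\|_2^2\right)$ s.t. $\|X^T u\|_\infty \le \lambda$.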

KKT Conditions:
Stationarity: $0 \in \partial f(x) + \sum_{i=1}^m u_i \partial h_i(x) + \sum_{j=1}^r v_j \partial l_j(x)$
Complementary slackness: $u_i \cdot h_i(x) = 0$ for all $i$
P feas.: $h_i(x) \le 0$, $l_j(x) = 0$ for all $i, j$
D feas.: $u_i \ge 0$ for all $i$
Necessary: under strong duality, if $x^*, u^*, v^*$ are primal and dual solutions, then they satisfy the KKT conditions.
Sufficient: always, if $x^*, u^*, v^*$ satisfy KKT, then they are primal and dual solutions.
Correspondence: Under strong duality, $x^*$ achieves the minimum in $L(x, u^*, v^*)$; if $L(x, u^*, v^*)$ has a unique minimum, then the corresponding point is the primal solution.

Correspondence, Conjugates:
Defn. convex conjugate: Given $f$, $f^*(y) = \max_x y^T x - f(x)$.
Implies $f(x) + f^*(y) \ge x^T y$.
If $f$ is closed and convex, $f^{**} = f$.
Example, norm: If $f(x) = \|x\|$, $f^*(y) = I_{\{z : \|z\|_* \le 1\}}(y)$.

Ellipsoid method for LP: Solves feasibility problems, but any LP can be turned into a feasibility problem.
Setup: Let $\Omega$ be the set satisfying the constraints. Assume $\Omega$ is contained in a ball of radius $R$ centered at $y_0$, and there is a ball with radius $r$ centered at $y^*$ inside $\Omega$. We know $R$, $r$, $y_0$, but not $y^*$.
Iterations: Can check if the center of ellipsoid $k$ is in $\Omega$; if so, done. Else: find a constraint that is violated, find the side that is not violated, and fit an ellipsoid to that half.
Convergence: $(\tau/R)^m \le \mathrm{Vol}(E_k)/\mathrm{Vol}(E_0) \le (\,\cdots)^{k/m}$, which implies $k \le O(m^2 \log(R/\tau))$ where $\tau = 1/(m + 1)$.

Penalty Methods: Original constrained problem (P), $\min_{x \in S} f(x)$; replace with the unconstrained problem $\min f(x) + c\,p(x)$.
$p$ satisfies: $p$ continuous, $p(x) \ge 0$, $p(x) = 0$ iff $x \in S$.
Idea: find some solution, then increasingly penalize points outside $S$ by increasing $c \to \infty$.
Penalty functions: $p(x) = \frac{1}{2}\sum_{i=1}^p \max(0, g_i(x))^2$
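The effect of increasing $c$ can be seen on a one-dimensional toy problem. The sketch below is an illustration under stated assumptions: the constraint set is $S = \{x : g_i(x) \le 0\}$, the penalized objective is convex so a ternary search finds its minimizer, and the names `penalized_min`, `lo`, and `hi` are hypothetical.

```python
def penalized_min(f, g_list, c, lo=-10.0, hi=10.0, n_iter=200):
    """Minimize f(x) + c * 0.5 * sum(max(0, g_i(x))^2) over [lo, hi].

    Assumes the penalized objective is convex in the scalar x, so a
    ternary search on the bracket converges to its minimizer.
    """
    def obj(x):
        return f(x) + c * 0.5 * sum(max(0.0, g(x)) ** 2 for g in g_list)

    for _ in range(n_iter):
        m1 = lo + (hi - lo) / 3.0
        m2 = hi - (hi - lo) / 3.0
        if obj(m1) < obj(m2):
            hi = m2
        else:
            lo = m1
    return 0.5 * (lo + hi)


# Toy problem: min (x - 2)^2 s.t. x - 1 <= 0; the constrained solution is x = 1.
# For this problem the penalized minimizer is (4 + c) / (2 + c), which tends to 1.
path = [penalized_min(lambda x: (x - 2.0) ** 2, [lambda x: x - 1.0], c)
        for c in (1.0, 10.0, 100.0, 1e4)]
```

Each penalized minimizer lies slightly outside $S$ and approaches the boundary as $c$ grows, which is the characteristic behavior of exterior penalty methods.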

Barrier Methods: Replace the original problem with $\min_x f(x) + \frac{1}{c}B(x)$, where $B$ is continuous; $B(x) \ge 0$ for all $x \in \mathrm{int}(S)$; $B(x) \to \infty$ as $x \to \partial S$.
Idea: start out in the interior, don't let the algorithm leave $S$. Increase $c \to \infty$.
Barrier functions: Suppose $g_i(x) \le 0$:
$B(x) = -\sum_{i=1}^m \frac{1}{g_i(x)}$
$B(x) = -\sum_{i=1}^m \log(-g_i(x))$

SDP:
Inner product (symmetric $A$, $B$): $\mathrm{tr}(A \cdot B) = \sum_i \sum_j A_{i,j} B_{i,j}$

ICA: Step 1: whiten. Step 2: want to minimize Gaussian-likeness. But the problem is non-convex with lots of local minima. Assume an additive linear model.
Whitening: $\Sigma = \mathrm{cov}(X) = UDU^T$, $A^* = D^{-1/2}U^T A$.
Coordinate descent: Do argmin on each dimension, updating one by one. When does coordinate descent work? For objectives of the form $g(x) + \sum_i h_i(x_i)$, with $g$ convex and differentiable and each $h_i$ convex.
Non-convex problems: Specialized approach for each.

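Convex Conjugates:
$f^*(x^*) = \max_x x^T x^* - f(x) = -\min_x \left(f(x) - x^T x^*\right)$
Function: conjugate
$f(ax)$: $f^*(x^*/a)$
$f(x + b)$: $f^*(x^*) - b^T x^*$
$af(x)$: $af^*(x^*/a)$
$e^x$: $x^* \log(x^*) - x^*$
$\|x\|$: $I_{\{\|z\|_* \le 1\}}(x^*)$

Matrix derivatives:
$\partial A = 0$ ($A$ constant)
$\partial(aX) = a\,\partial X$
$\partial(\mathrm{tr}(X)) = \mathrm{tr}(\partial X)$
$\partial(XY) = (\partial X)Y + X(\partial Y)$
$\partial(x^T a)/\partial x = a$
$\partial(a^T X b)/\partial X = ab^T$
Suppose $s$, $r$ are functions of $x$ and $A$ is constant:
$$\frac{\partial (s^T A r)}{\partial x} = \left(\frac{\partial s}{\partial x}\right)^T A r + \left(\frac{\partial r}{\partial x}\right)^T A^T s$$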

Matrix properties:
SVD: $A = U\Sigma V^T$ where: $U$ are the eigenvectors of $AA^T$; $D = \mathrm{diag}(\mathrm{eig}(AA^T))$ holds the squared singular values ($\Sigma = D^{1/2}$); $V$ are the eigenvectors of $A^T A$. Can also write $A$ as the weighted sum of $r$ rank-1 matrices. The rank-1 matrices are $\Sigma_{ii}U_i V_i^T$ for $1 \le i \le r$.
EVD: $X = VDV^{-1}$ with $D$ diagonal. If $X$ is symmetric, $VV^T = I$.
Traces: Linear.
$\mathrm{tr}(A) = \mathrm{tr}(A^T)$
$\mathrm{tr}(X^T Y) = \mathrm{tr}(XY^T)$
$\mathrm{tr}(X^T Y) = \mathrm{vec}(X)^T \mathrm{vec}(Y)$
$\mathrm{tr}(ABC) = \mathrm{tr}(BCA) = \mathrm{tr}(CAB)$
If $P^{-1}$ exists, $\mathrm{tr}(A) = \mathrm{tr}(P^{-1}AP)$.
$\mathrm{tr}(A) = \sum_i \lambda_i$
Sherman-Morrison Mat. Inv.: Suppose $A^{-1}$ exists and $1 + v^T A^{-1} u \ne 0$. Then
$$(A + uv^T)^{-1} = A^{-1} - \frac{A^{-1}uv^T A^{-1}}{1 + v^T A^{-1} u}$$
Matrix norms:
Trace/Nuclear norm: $\|A\|_* = \sum_{i=1}^r \sigma_i(A)$
Spectral/Operator norm: $\|A\|_{op} = \sigma_1(A)$
Frobenius norm: $\|A\|_F = \sqrt{\mathrm{tr}(A^T A)}$.
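The Sherman-Morrison identity from the matrix properties above can be checked numerically with a short routine. The function name `sherman_morrison` and the 2-by-2 example are illustrative assumptions; matrices are plain lists of rows.

```python
def sherman_morrison(A_inv, u, v):
    """Return (A + u v^T)^{-1} given A^{-1}, via the rank-one update formula."""
    n = len(u)
    Au = [sum(A_inv[i][j] * u[j] for j in range(n)) for i in range(n)]  # A^{-1} u
    vA = [sum(v[i] * A_inv[i][j] for i in range(n)) for j in range(n)]  # v^T A^{-1}
    denom = 1.0 + sum(v[i] * Au[i] for i in range(n))
    if abs(denom) < 1e-12:
        raise ValueError("1 + v^T A^{-1} u is zero; update is singular")
    return [[A_inv[i][j] - Au[i] * vA[j] / denom for j in range(n)]
            for i in range(n)]


# A = I, u = (1, 2), v = (3, 4), so A + u v^T = [[4, 4], [6, 9]].
updated_inv = sherman_morrison([[1.0, 0.0], [0.0, 1.0]], [1.0, 2.0], [3.0, 4.0])
```

The update costs $O(n^2)$ operations, compared with $O(n^3)$ for inverting the modified matrix from scratch.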

Derivatives:
Function: derivative
$f(x)g(x)$: $f'(x)g(x) + f(x)g'(x)$
$f(g(x))$: $f'(g(x))g'(x)$
$x^n$: $nx^{n-1}$
$1/f(x)$: $-f(x)^{-2}f'(x)$
$f(x)/g(x)$: $(f'(x)g(x) - g'(x)f(x))/g(x)^2$
$e^x$: $e^x$
$\ln(x)$: $1/x$
$\log_c(x)$: $1/(x\ln(c))$

Miscellaneous math:
Lipschitz: A function $f$ is Lipschitz continuous if $|f(x_1) - f(x_2)| \le L|x_1 - x_2|$; $L$ controls how quickly the function changes.
Gradient Lipschitz: A differentiable function $f$ has Lipschitz continuous gradient if $\|\nabla f(y) - \nabla f(x)\| \le L\|y - x\|$; if it is twice differentiable, $LI \succeq \nabla^2 f(x)$.
Useful inequalities:
Cauchy-Schwarz: $|x^T y| \le \|x\| \cdot \|y\|$.
Hölder: $\|fg\|_1 \le \|f\|_p\|g\|_q$ for $1/p + 1/q = 1$.

Method comparison:
         | Gr.          | SG.           | Prox.            | New.                   | Conj.      | QN                | Bar.                                 | P/D IPM
Crit     | f sm         | any           | sm g + simple h  | 2× sm                  | 2×         | 2×                | 2×                                   | 2×
Const.   | Proj.        | Proj.         | Const. Prox      | Equality               | None       | None              | 2× sm. ineq.                         | 2× sm. ineq.
Param.   | fix t/LS     | t → 0         | fix t/LS         | fix t = 1/LS           | fix/LS     | LS                | in: fixed/LS; out.: bar. → ∞         | in: LS; out.: bar. → ∞
Cost/It. | chp          | chp?          | prox             | Exp. ($\nabla^2$)      | ≈ chp      | ≈ chp + Storage   | V.Exp                                | ≈ Exp
Rate     | $O(1/\epsilon)$ | $O(1/\epsilon^2)$ | $O(1/\epsilon)$ | $O(\log(\log(1/\epsilon)))$ | super-lin. | super-lin. | $O(\log(1/\epsilon))$ | $O(\log(1/\epsilon))$

Gr. and Prox. Gr. are $O(1/\sqrt{\epsilon})$ w/ accel., $O(\log(1/\epsilon))$ w/ strong convexity.