Optimization Techniques: Cost Functions, Bracketing Methods, and Gradient Descent - Study notes of Computer Science

These notes cover various optimization techniques, including cost functions, bracketing methods in one and multiple dimensions, and gradient descent. They discuss how to find the global minimum and a local minimum of a cost function, the importance of derivative information, and different methods for bracketing a minimum in one and multiple dimensions. They also introduce the downhill simplex method and basic calculus concepts, such as the direction of maximum increase and critical points of a function.

Typology: Study notes

Pre 2010

Uploaded on 02/13/2009

Optimization - 2
CMSC828 D

Outline

  • Cost functions (last class)
  • Given a cost function we can calculate
    • The global minimum
    • A local minimum
  • Algorithms can be classified according to
    • Derivative information available/not available or expensive
      • Derivatives via finite-differences
    • Linear or nonlinear
    • Local minimum or global minimum
    • Differential or “statistical”
    • Constrained or Unconstrained
  • Read Chapter 10-0 of Numerical Recipes.
  • Focus will not be on details but on the educated use of these routines as black boxes.

Bracketing a minimum in multiple dimensions

  • Smallest region bounded by a group of points in
    • 1D is bounded by two points (a line segment)
    • 2D is bounded by three points (a triangle)
    • 3D by four points (a tetrahedron)
    • In N D by N+1 points (a simplex)
  • Can find a direction of a decreasing function in
    • 1D by the line from point with higher value to lower
    • 2D by joining the point with the highest value through the average of the points on the opposite side of the triangle
    • And so on for N D
  • However cannot guarantee a bracket of a minimum in N D

Downhill Simplex Method (Nelder-Mead)

  • Reflection: Project along the direction of decrease with size 1.
  • Reflection and expansion: If the decrease is large, try a step of size 2.
  • Contraction: If the result of reflection is bad, try a simple reduction within the simplex.
  • Multiple contraction: If contraction does not give a better result either, contract the whole simplex toward the lowest point.
  • Conclude: Stop when the volume of the simplex falls below the tolerance.
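
The moves listed above can be sketched in a few lines of Python. This is a minimal illustration, not the Numerical Recipes routine: the function name `nelder_mead`, the step sizes 1, 2 and 0.5, and the use of the simplex extent (instead of its volume) as the stopping test are all assumptions made for the example.

```python
def nelder_mead(f, simplex, tol=1e-10, max_iter=2000):
    """Minimize f starting from N+1 points, each a list of N floats."""
    n = len(simplex) - 1
    pts = [list(p) for p in simplex]
    for _ in range(max_iter):
        pts.sort(key=f)                      # pts[0] is lowest, pts[-1] is highest
        best, worst = pts[0], pts[-1]
        # Assumed stopping test: extent of the simplex stands in for its volume.
        size = max(abs(p[i] - best[i]) for p in pts for i in range(n))
        if size < tol:
            break
        # Centroid of the face opposite the highest point.
        c = [sum(p[i] for p in pts[:-1]) / n for i in range(n)]

        def along(t):
            # Point c + t * (c - worst); t = 1 is the plain reflection.
            return [c[i] + t * (c[i] - worst[i]) for i in range(n)]

        refl = along(1.0)
        if f(refl) < f(best):
            expd = along(2.0)                # reflection and expansion
            pts[-1] = expd if f(expd) < f(refl) else refl
        elif f(refl) < f(pts[-2]):
            pts[-1] = refl                   # reflection only
        else:
            con = along(-0.5)                # contraction within the simplex
            if f(con) < f(worst):
                pts[-1] = con
            else:                            # multiple contraction toward the lowest point
                pts = [[best[i] + 0.5 * (p[i] - best[i]) for i in range(n)]
                       for p in pts]
    return min(pts, key=f)


def bowl(p):
    # Hypothetical test function with its minimum at (1, -2).
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2


x_min = nelder_mead(bowl, [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0]])
```

Only function values are used, so the sketch applies when derivatives are unavailable or expensive.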

Newton’s Method

  • If f ( x ) is a scalar valued function of n variables x, the first-order expansion gives a single equation:

    $$f(\mathbf{x} + \mathbf{h}) = f(x_i + h_i) \approx f(x_i) + \sum_i h_i \frac{\partial f}{\partial x_i}(x_i) = 0$$

    • No way to get n equations from the one equation above
    • Use steepest descent methods
  • However, in optimization problems we are usually solving for the minimum of a scalar valued function of multiple variables f ( x ), where x is an n dimensional vector
  • We need to solve an equation of the type g ( x ) = ∇ f = 0:

    $$g_j(\mathbf{x} + \mathbf{h}) = g_j(x_i + h_i) \approx g_j(x_i) + \sum_i h_i \frac{\partial g_j}{\partial x_i}(x_i) = 0$$

  • Same prescription works but now ∇g is a matrix called the Jacobian matrix
  • Solve the equation to get corrections and iterate
  • However note that we are actually computing the Hessian of f
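
As a rough illustration of "solve the equation to get corrections and iterate", the sketch below applies Newton's method to the gradient of a two-variable function. The test function f(x, y) = x² + y² + exp(x + y), the helper names, and the 2×2 Cramer's-rule solve are assumptions chosen to keep the example self-contained.

```python
import math


def grad(x, y):
    # g = grad f for the assumed f(x, y) = x^2 + y^2 + exp(x + y)
    e = math.exp(x + y)
    return (2.0 * x + e, 2.0 * y + e)


def hessian(x, y):
    # Jacobian of g, which is the Hessian of f
    e = math.exp(x + y)
    return ((2.0 + e, e), (e, 2.0 + e))


def newton_minimize(x, y, tol=1e-12, max_iter=50):
    for _ in range(max_iter):
        g1, g2 = grad(x, y)
        (a, b), (c, d) = hessian(x, y)
        det = a * d - b * c
        # Solve H h = -g by Cramer's rule (valid for the 2x2 case only).
        h1 = (-g1 * d + g2 * b) / det
        h2 = (-g2 * a + g1 * c) / det
        x, y = x + h1, y + h2                # apply the correction
        if abs(h1) + abs(h2) < tol:
            break
    return x, y


x_star, y_star = newton_minimize(1.0, -0.5)
```

For larger n the correction would come from a general linear solver; the structure of the loop stays the same.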

Gradient Descent

  • We have a function f and an estimate of its gradient ∇ f
  • Decrease f by stepping a quantity h along the direction of −∇ f
    • begin initialize x, tol, k = 0
        do k ← k + 1
           x ← x − h_k ∇f(x)
        until |h_k ∇f(x)| < tol
        return x
      end
  • Determining h is not easy
    • Called “learning rate” in AI
    • If h is too small the algorithm will be too slow to converge. If it is too large the procedure will diverge
    • Can select it using a line search or using a Newton method.
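
The effect of the step size can be seen in a small experiment. The sketch below uses a fixed h rather than a per-iteration h_k, and the test function f(x, y) = x² + 10y², the divergence guard, and all names are assumptions made for illustration.

```python
def gradient_descent(grad, x, h, tol=1e-8, max_iter=10000):
    """Fixed-step descent; returns (x, iterations used, converged flag)."""
    for k in range(1, max_iter + 1):
        step = [h * g for g in grad(x)]
        x = [xi - si for xi, si in zip(x, step)]
        size = max(abs(s) for s in step)
        if size < tol:                       # step is below tolerance: done
            return x, k, True
        if size > 1e12:                      # assumed guard: the iteration is diverging
            return x, k, False
    return x, max_iter, False


def grad_f(p):
    # Gradient of the assumed test function f(x, y) = x^2 + 10 y^2
    return [2.0 * p[0], 20.0 * p[1]]


fast = gradient_descent(grad_f, [1.0, 1.0], h=0.09)   # converges
slow = gradient_descent(grad_f, [1.0, 1.0], h=0.01)   # converges, needs more iterations
bad = gradient_descent(grad_f, [1.0, 1.0], h=0.11)    # step too large: diverges
```

For this quadratic the iteration is stable only for h below 0.1 (two divided by the largest curvature, 20), which is why 0.11 fails while 0.09 works.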

Function Evaluations

  • Often evaluating the function is hard
    • Crash a car to measure a data point
  • Analytical expressions for the derivatives are harder, and very much prone to programming error.

  • Analytical derivatives should always be compared with finite difference estimates for accuracy
  • Often derivatives are evaluated using finite differences.
  • Recall $f'(x) \approx h^{-1}\,(f(x+h) - f(x))$ => 2 function evaluations
  • For an n dimensional function we need at least n+1 function evaluations to get the derivative
  • However, recall that this one-sided difference is the least accurate finite-difference estimate
  • Promising research area: Use chain rule and semantic parsing of functions to perform automatic differentiation
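
A forward-difference gradient and the recommended check of an analytical derivative against it might look like the sketch below. The step h = 1e-6, the test function, and the helper names are assumptions for the example.

```python
def fd_gradient(f, x, h=1e-6):
    """Forward-difference gradient: n + 1 evaluations of f for n variables."""
    f0 = f(x)                                 # one evaluation shared by all components
    g = []
    for i in range(len(x)):
        xp = list(x)
        xp[i] += h
        g.append((f(xp) - f0) / h)            # one more evaluation per variable
    return g


def f(p):
    # Hypothetical test function
    return p[0] ** 2 * p[1] + 3.0 * p[1]


def analytic_gradient(p):
    # Hand-derived gradient of f, the kind of code that is prone to error
    return [2.0 * p[0] * p[1], p[0] ** 2 + 3.0]


def max_gradient_error(f, grad, x):
    approx = fd_gradient(f, x)
    return max(abs(a - b) for a, b in zip(approx, grad(x)))


err = max_gradient_error(f, analytic_gradient, [1.5, -2.0])
```

A large value of `err` would point to a bug in the analytical derivative; a small one is limited by the first-order accuracy of the forward difference.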

Powell’s method

  • Sometimes it is not possible to estimate the derivative ∇ f to obtain the direction in a steepest descent method
  • First guess: minimize along one coordinate axis, then along the others in turn. Repeat.
  • Can be very slow to converge
  • Conjugate directions: Directions which are independent of each other so that minimizing along each one does not move away from the minimum in the other directions.
  • Powell introduced a method to obtain conjugate directions without computing the derivative.
  • Use the fact that there is a routine available to calculate f and the Jacobian ∇ f to iteratively calculate approximations to the minimum
  • Conjugate gradients performs minimizations in conjugate directions without constructing A
  • Quasi Newton methods construct approximations to $A^{-1}$ iteratively
  • Black boxes, as far as this course is concerned.
  • Generally only worth it when we are in the vicinity of a minimum.
  • For nonlinear problems they often converge to a local minimum away from the true one.
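
To see why conjugate directions matter, the sketch below compares one sweep of line minimizations along the coordinate axes with one sweep along a conjugate pair, for a hypothetical quadratic f(x, y) = x² + xy + y² − 3x with minimum at (2, −1). It is not Powell's method: the conjugate direction is picked by hand from the known Hessian, whereas Powell's method builds such directions from function values alone. The three-point parabola fit in `line_min` is exact only for quadratics.

```python
def f(p):
    x, y = p
    # Quadratic with a cross term, so the coordinate axes are not conjugate.
    return x * x + x * y + y * y - 3.0 * x


def line_min(f, p, d):
    """Minimize f along p + t*d using a parabola through three evaluations."""
    f0 = f(p)
    fp = f([pi + di for pi, di in zip(p, d)])
    fm = f([pi - di for pi, di in zip(p, d)])
    t = (fm - fp) / (2.0 * (fp - 2.0 * f0 + fm))
    return [pi + t * di for pi, di in zip(p, d)]


def sweep(f, p, directions):
    # One line minimization along each direction in turn.
    for d in directions:
        p = line_min(f, p, d)
    return p


axes = [[1.0, 0.0], [0.0, 1.0]]
# (-1, 2) is conjugate to (1, 0) for the Hessian [[2, 1], [1, 2]] of f.
conjugate = [[1.0, 0.0], [-1.0, 2.0]]

p_axes = sweep(f, [0.0, 0.0], axes)
p_conj = sweep(f, [0.0, 0.0], conjugate)
```

After one sweep the axis-aligned search is still short of the minimum and has to repeat, while the conjugate pair reaches it, because the second minimization does not undo the first.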

Conjugate gradient and quasi-Newton

  • Return to problem of model fitting by minimizing
  • As before set
  • Observation: steepest descent methods move faster (per function evaluation) far away from the minimum while Newton methods do well near it.
  • Idea: combine them so that the method adapts according to the location in parameter space.
  • Usually for model fitting it is not too difficult to calculate derivatives

Levenberg Marquardt

LM Algorithm

  • When the algorithm has converged set λ=0 and compute the final solution
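
The steps of the algorithm are not reproduced in these notes, so the following is only a sketch of a common form of the Levenberg-Marquardt loop, under stated assumptions: a hypothetical two-parameter model y = a·exp(b·x), damping applied to the diagonal of JᵀJ, an initial λ of 1e-3, and a factor of 10 for raising or lowering λ. Small λ gives a Newton-like step, large λ gives a short steepest-descent-like step.

```python
import math


def chi2(a, b, xs, ys):
    # Sum of squared residuals for the assumed model y = a * exp(b * x)
    return sum((y - a * math.exp(b * x)) ** 2 for x, y in zip(xs, ys))


def levenberg_marquardt(xs, ys, a, b, lam=1e-3, max_iter=100):
    cost = chi2(a, b, xs, ys)
    for _ in range(max_iter):
        # Accumulate J^T J and J^T r for the two parameters.
        aa = ab = bb = ga = gb = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            da, db = e, a * x * e            # d(model)/da and d(model)/db
            r = y - a * e
            aa += da * da
            ab += da * db
            bb += db * db
            ga += da * r
            gb += db * r
        # Damped normal equations: (J^T J + lam * diag(J^T J)) delta = J^T r
        m11, m22 = aa * (1.0 + lam), bb * (1.0 + lam)
        det = m11 * m22 - ab * ab
        d_a = (ga * m22 - ab * gb) / det
        d_b = (m11 * gb - ab * ga) / det
        new_cost = chi2(a + d_a, b + d_b, xs, ys)
        if new_cost < cost:                  # accept the step, trust Newton more
            a, b, cost = a + d_a, b + d_b, new_cost
            lam /= 10.0
        else:                                # reject the step, lean toward descent
            lam *= 10.0
        if cost < 1e-24:
            break
    return a, b


xs = [0.0, 0.5, 1.0, 1.5, 2.0]
ys = [2.0 * math.exp(-x) for x in xs]        # noise-free data from a = 2, b = -1
a_fit, b_fit = levenberg_marquardt(xs, ys, a=1.0, b=0.0)
```

With noisy data the cost would not reach zero, and a test on the relative change in the cost would be a more suitable stopping rule than the absolute threshold used here.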

Constrained optimization

  • We have to optimize f(x) subject to g(x) = 0
    • Makes sense if g(x) = 0 leaves a few degrees of freedom (N − M)
  • Approach 1 (Eliminate constraints)
    • Eliminate variables using constraint equations and solve a reduced problem $f(x^*) = 0$
    • Not practical, except for simple problems
  • Approach 2 (Penalty function)
    • Construct a new minimization function f(x) + P g(x), where P is very large
    • If constraint is violated the minimization function increases rapidly, forcing the optimization routine to solutions where it is not violated
  • Approach 3 (Lagrange Multipliers)
    • Solution has to lie on the surface g(x) = 0
    • Can’t have ∇ f = 0 anymore
    • However, we require ∇ f to be parallel to ∇ g, that is, ∇ f = λ ∇ g for some multiplier λ
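
The penalty approach can be tried on a toy problem: minimize f(x, y) = x² + y² subject to g(x, y) = x + y − 1 = 0, whose constrained minimum is (0.5, 0.5) with multiplier λ = 1. The sketch assumes a squared penalty P·g(x)², not the f(x) + P g(x) form written above, so that violations of either sign are penalized; the fixed-step descent and all names are also assumptions.

```python
def penalized_minimum(P, start=(3.0, -1.0), iters=20000):
    """Minimize x^2 + y^2 + P * (x + y - 1)^2 by fixed-step gradient descent."""
    x, y = start
    h = 1.0 / (2.0 + 4.0 * P)                # 1 / (largest Hessian eigenvalue)
    for _ in range(iters):
        g = x + y - 1.0                      # constraint violation
        gx = 2.0 * x + 2.0 * P * g
        gy = 2.0 * y + 2.0 * P * g
        x, y = x - h * gx, y - h * gy
    return x, y


results = {}
for P in (1.0, 10.0, 100.0):
    x, y = penalized_minimum(P)
    results[P] = (x, y, x + y - 1.0)         # solution and remaining violation
```

For this problem the penalized minimum is x = y = P / (1 + 2P), so the constraint violation shrinks like 1 / (1 + 2P) and the constraint is met exactly only in the limit of large P. The larger P is, the smaller the stable step, which is one reason penalty problems become hard to solve.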

Linear programming

  • Black box in this course
  • Solve problems with systems of linear equality and inequality constraints