




























































































with figures drawn by Maria Wets
1997, 2nd printing 2004, 3rd printing 2009
In this book we aim to present, in a unified framework, a broad spectrum of mathematical theory that has grown in connection with the study of problems of optimization, equilibrium, control, and stability of linear and nonlinear systems. The title Variational Analysis reflects this breadth.

For a long time, ‘variational’ problems have been identified mostly with the ‘calculus of variations’. In that venerable subject, built around the minimization of integral functionals, constraints were relatively simple and much of the focus was on infinite-dimensional function spaces. A major theme was the exploration of variations around a point, within the bounds imposed by the constraints, in order to help characterize solutions and portray them in terms of ‘variational principles’. Notions of perturbation, approximation and even generalized differentiability were extensively investigated. Variational theory progressed also to the study of so-called stationary points, critical points, and other indications of singularity that a point might have relative to its neighbors, especially in association with existence theorems for differential equations.

With the advent of computers, there has been a tremendous expansion of interest in new problem formulations that similarly demand such modes of analysis but are far from being covered by classical concepts, not to speak of classical results. For those problems, finite-dimensional spaces of arbitrary dimensionality are important alongside of function spaces, and theoretical concerns go hand in hand with the practical ones of mathematical modeling and the design of numerical procedures. It is time to free the term ‘variational’ from the limitations of its past and to use it to encompass this now much larger area of modern mathematics.
We see ‘variations’ as referring not only to movement away from a given point along rays or curves, and to the geometry of tangent and normal cones associated with that, but also to the forms of perturbation and approximation that are describable by set convergence, set-valued mappings and the like. Subgradients and subderivatives of functions, convex and nonconvex, are crucial in analyzing such ‘variations’, as are the manifestations of Lipschitzian continuity that serve to quantify rates of change.

Our goal is to provide a systematic exposition of this broader subject as a coherent branch of analysis that, in addition to being powerful for the problems that have motivated it so far, can take its place now as a mathematical discipline ready for new applications. Rather than detailing all the different approaches that researchers have been occupied with over the years in the search for the right ideas, we seek to reduce the general theory to its key ingredients as now understood, so as to make it accessible to a much wider circle of potential users. But within that consolidation, we furnish a thorough and tightly coordinated exposition of facts and concepts.

Several books have already dealt with major components of the subject. Some have concentrated on convexity and kindred developments in realms of nonconvexity. Others have concentrated on tangent vectors and subderivatives more or less to the exclusion of normal vectors and subgradients, or vice versa, or have focused on topological questions without getting into generalized differentiability. Here, by contrast, we cover set convergence and set-valued mappings to a degree previously unavailable and integrate those notions with both sides of variational geometry and subdifferential calculus. We furnish a needed update in a field that has undergone many changes, even in outlook.
In addition, we include topics such as maximal monotone mappings, generalized second derivatives, and measurable selections and integrands, which have not
of Gül Gürkan, Douglas Lepro, Yonca Özge, and Stephen Robinson. Conversations we had over the years with our students and colleagues contributed significantly to the final form of the book as well. Grants from the National Science Foundation were essential in sustaining the long effort.

The changes in this third printing mainly concern various typographical corrections and reference omissions, which came to light in the first and second printings. Many of these reached our notice through our own re-reading and that of our students, as well as the individuals already mentioned. Really major input, however, arrived from Shu Lu and Michel Valadier, and above all from Lionel Thibault. He carefully went through almost every detail, detecting numerous places where adjustments were needed or desirable. We are extremely indebted for all these valuable contributions.
1. Max and Min
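$$\operatorname{argmin}_C f := \operatorname{argmin}_{x \in C} f(x) := \begin{cases} \{\, x \in C \mid f(x) = \inf_C f \,\} & \text{if } \inf_C f \ne \infty, \\ \emptyset & \text{if } \inf_C f = \infty, \end{cases}$$

$$\operatorname{argmax}_C f := \operatorname{argmax}_{x \in C} f(x) := \begin{cases} \{\, x \in C \mid f(x) = \sup_C f \,\} & \text{if } \sup_C f \ne -\infty, \\ \emptyset & \text{if } \sup_C f = -\infty. \end{cases}$$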
Note that we don’t regard the minimum as being attained at any $x \in C$ when $f \equiv \infty$ on $C$, even though we may write $\min_C f = \infty$ in that case, nor do we regard the maximum as being attained at any $x \in C$ when $f \equiv -\infty$ on $C$. The reasons for these exceptions will be explained shortly. Quite apart from whether $\inf_C f < \infty$ or $\sup_C f > -\infty$, the sets $\operatorname{argmin}_C f$ and $\operatorname{argmax}_C f$ could be empty in the absence of appropriate conditions of continuity, boundedness or growth. A simple and versatile statement of such conditions will be devised in this chapter.
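The conventions for $\operatorname{argmin}_C f$ can be tried out on a finite candidate set. The following is a minimal sketch under that assumption (a finite $C$, so that the infimum is a plain minimum); the helper name `argmin_over` is hypothetical and not part of the text.

```python
import math

def argmin_over(C, f):
    """Return (inf_C f, argmin_C f) for a finite collection C.

    Mirrors the convention in the text: when inf_C f = +inf
    (in particular when C is empty), the argmin set is empty.
    """
    points = list(C)
    values = [f(x) for x in points]
    inf_value = min(values, default=math.inf)
    if inf_value == math.inf:
        return inf_value, []  # degenerate case: no point is regarded as a minimizer
    minimizers = [x for x, v in zip(points, values) if v == inf_value]
    return inf_value, minimizers

# Illustrative function: finite for x >= 1 and +inf elsewhere.
f = lambda x: x * x if x >= 1 else math.inf

print(argmin_over([-2, -1, 0, 1, 2], f))  # the minimum is attained at x = 1
print(argmin_over([-2, -1, 0], f))        # f is +inf on all of C, so the argmin is empty
```

A point where the value is $-\infty$ is returned as a minimizer, in line with the remark that such a point furnishes the minimum at once.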
The roles of $\infty$ and $-\infty$ deserve close attention here. Let’s look specifically at minimizing $f$ over $C$. If there is a point $x \in C$ where $f(x) = -\infty$, we know at once that $x$ furnishes the minimum. Points $x \in C$ where $f(x) = \infty$, on the other hand, have virtually the opposite significance. They aren’t even worth contemplating as candidates for furnishing the minimum, unless $f$ has $\infty$ as its value everywhere on $C$, a case that can be set aside as expressing a form of degeneracy, which we underline by defining $\operatorname{argmin}_C f$ to be empty then. In effect, the side condition $f(x) < \infty$ is considered to be implicit in minimizing $f(x)$ over $x \in C$. Everything of interest is the same as if we were minimizing over $C' := \{\, x \in C \mid f(x) < \infty \,\}$ instead of $C$. This gives birth to an important idea in the context of $C$ being a subset of $\mathbb{R}^n$. Perhaps $f$ is merely real-valued on $C$, but whether this is true or not, we can transform the problem of minimizing $f$ over $C$ into one of minimizing $f$ over all of $\mathbb{R}^n$ just by defining (or as the case may be, redefining) $f(x)$ to be $\infty$ for all the points $x \in \mathbb{R}^n$ such that $x \notin C$. This helps in thinking abstractly about minimization and in achieving a single framework for the development of properties and results.
A. Penalties and Constraints

1.1 Example (equality and inequality constraints). A set $C \subset \mathbb{R}^n$ may be specified as consisting of the vectors $x = (x_1, \dots, x_n)$ such that

$$x \in X \quad\text{and}\quad \begin{cases} f_i(x) \le 0 & \text{for } i \in I_1, \\ f_i(x) = 0 & \text{for } i \in I_2, \end{cases}$$

where $X$ is some subset of $\mathbb{R}^n$ and $I_1$ and $I_2$ are index sets for families of functions $f_i : \mathbb{R}^n \to \mathbb{R}$ called constraint functions. The conditions $f_i(x) \le 0$ are inequality constraints on $x$, while those of form $f_i(x) = 0$ are equality constraints; the condition $x \in X$ (where in particular $X$ could be all of $\mathbb{R}^n$) is an abstract or geometric constraint.

A problem of minimizing a function $f_0 : \mathbb{R}^n \to \mathbb{R}$ subject to all of these constraints can be identified with the problem of minimizing the function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ defined by taking $f(x) = f_0(x)$ when $x$ satisfies the constraints but $f(x) = \infty$ otherwise. The possibility of having $\inf f = \infty$ corresponds then to the possibility that $C = \emptyset$, i.e., that the constraints may be inconsistent.
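The passage from a constrained problem to an extended-real-valued function can be sketched in code. This is only an illustration under stated assumptions: the helper `extend_with_constraints`, the membership test `in_X` and the tolerance `tol` are hypothetical names introduced here, and the particular objective and constraints are made up for the example.

```python
import math

def extend_with_constraints(f0, in_X, inequalities=(), equalities=(), tol=0.0):
    """Build f with f(x) = f0(x) if x satisfies all constraints, +inf otherwise.

    in_X         : membership test for the abstract constraint x in X
    inequalities : functions g with the constraint g(x) <= 0
    equalities   : functions h with the constraint h(x) = 0
    tol          : numerical slack (an assumption of this sketch, 0 by default)
    """
    def f(x):
        feasible = (
            in_X(x)
            and all(g(x) <= tol for g in inequalities)
            and all(abs(h(x)) <= tol for h in equalities)
        )
        return f0(x) if feasible else math.inf
    return f

# Illustrative data: minimize x1 + x2 over the unit disk intersected with the line x1 = x2.
f = extend_with_constraints(
    f0=lambda x: x[0] + x[1],
    in_X=lambda x: True,                                  # X is all of R^2
    inequalities=[lambda x: x[0] ** 2 + x[1] ** 2 - 1.0],
    equalities=[lambda x: x[0] - x[1]],
)

print(f((0.5, 0.5)))  # feasible point: value of f0
print(f((1.0, 1.0)))  # violates the inequality constraint: +inf
```

If the constraints were inconsistent, every call would return `math.inf`, which is the situation $\inf f = \infty$ described above.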
Fig. 1–1. A set defined by inequality constraints.
Constraints can also have the form $f_i(x) \le c_i$, $f_i(x) = c_i$ or $f_i(x) \ge c_i$ for values $c_i \in \mathbb{R}$, but this doesn’t add real generality because $f_i$ can always be replaced by $f_i - c_i$ or $c_i - f_i$. Strict inequalities are rarely seen in constraints, however, since they could threaten the attainment of a maximum or minimum.
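The replacement of $f_i$ by $f_i - c_i$ or $c_i - f_i$ is mechanical, as the following small sketch suggests. The function name `normalize_constraint` and the string codes for the three constraint forms are assumptions made for the illustration.

```python
def normalize_constraint(fi, ci, sense):
    """Rewrite the constraint fi(x) <sense> ci in the standard form g(x) <= 0 or g(x) = 0.

    Returns (g, kind), where kind is "inequality" or "equality".
    """
    if sense == "<=":
        return (lambda x: fi(x) - ci), "inequality"
    if sense == ">=":
        return (lambda x: ci - fi(x)), "inequality"
    if sense == "==":
        return (lambda x: fi(x) - ci), "equality"
    raise ValueError(f"unsupported sense: {sense!r}")

# The constraint 2x >= 4 becomes g(x) = 4 - 2x <= 0.
g, kind = normalize_constraint(lambda x: 2 * x, 4, ">=")
print(kind, g(3), g(1))  # x = 3 satisfies the constraint, x = 1 does not
```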
An abstract constraint $x \in X$ is often convenient in representing conditions of a more complicated or open-ended nature, to be worked out later, but also for conditions too simple to be worth introducing constraint functions for, such as upper or lower bounds on the variables $x_j$ as components of $x$.
1.2 Example (box constraints). A set $X \subset \mathbb{R}^n$ is called a box if it is a product $X_1 \times \cdots \times X_n$ of closed intervals $X_j$ of $\mathbb{R}$, not necessarily bounded. The condition $x \in X$, a box constraint on $x = (x_1, \dots, x_n)$, then restricts each variable $x_j$ to $X_j$. For instance, the nonnegative orthant

$$\mathbb{R}^n_+ := \{\, x = (x_1, \dots, x_n) \mid x_j \ge 0 \text{ for all } j \,\} = [0, \infty)^n$$

is a box in $\mathbb{R}^n$; the constraint $x \in \mathbb{R}^n_+$ restricts all variables to be nonnegative. With $X = \mathbb{R}^s_+ \times \mathbb{R}^{n-s} = [0, \infty)^s \times (-\infty, \infty)^{n-s}$, only the first $s$ variables $x_j$ would have to be nonnegative. In other cases, useful for technical reasons, some intervals $X_j$ could have the degenerate form $[c_j, c_j]$, which would force $x_j = c_j$.
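A box constraint is easy to test coordinate by coordinate. The sketch below assumes a box is stored as a list of pairs `(lo, hi)` with `math.inf` marking an unbounded side; the names `in_box` and `orthant_with_free_tail` are hypothetical.

```python
import math

def in_box(x, intervals):
    """True if x lies in the product of the closed intervals [lo_j, hi_j].

    An unbounded side is encoded by -math.inf or math.inf; the components
    of x themselves are assumed to be finite real numbers.
    """
    if len(x) != len(intervals):
        raise ValueError("dimension mismatch")
    return all(lo <= xj <= hi for xj, (lo, hi) in zip(x, intervals))

def orthant_with_free_tail(n, s):
    """The box [0, inf)^s x (-inf, inf)^(n-s)."""
    return [(0.0, math.inf)] * s + [(-math.inf, math.inf)] * (n - s)

x = (1.0, -0.1, 2.5)
print(in_box(x, orthant_with_free_tail(3, 3)))  # second variable is negative
print(in_box(x, orthant_with_free_tail(3, 1)))  # only the first variable is restricted
print(in_box((2.0, 7.0), [(2.0, 2.0), (0.0, math.inf)]))  # degenerate interval forces x_1 = 2
```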
Constraints refer to the structure of the set over which the minimization or maximization should effectively take place, and in the approach of identifying a problem with a function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ they enter the specification of $f$. But the structure of the function being minimized or maximized can be affected by constraint representations in other ways as well.
Everything said about minimization can be translated into the language of maximization, with $-\infty$ taking the part of $\infty$. Such symmetry is reassuring, but it must be understood that a basic asymmetry is implicit too in the approach we’re taking. In passing from the minimization of a given function over $C$ to the minimization of a corresponding function over $\mathbb{R}^n$, we’ve resorted to an extension by the value $\infty$, but in the case of maximization it would be $-\infty$. The extended function would then be different, and so would be the properties we’d like it to have. In effect we’re abandoning any predisposition toward having a theory that treats maximization and minimization together on an equal footing. In the assumptions eventually imposed to identify the classes of functions most suitable for applying these operations, we mark out separate territories for each.
In actual practice there’s rarely a need to consider both minimization and maximization simultaneously for a single combination of a function $f$ and a set $C$, so this approach causes no discomfort. Rather than spend too many words on parallel statements, we adopt minimization as the vehicle of exposition and mention maximization only from time to time, taking for granted that the reader will generally understand the accommodations needed in that direction. We thereby enter a pattern of working mainly with extended-real-valued functions on $\mathbb{R}^n$ and treating them in a one-sided manner where $\infty$ has a qualitatively different role from that of $-\infty$ in our formulas, and where the terminology and notation reflect this bias.
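Starting off now on this path, we introduce for $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ the set

$$\operatorname{dom} f := \{\, x \in \mathbb{R}^n \mid f(x) < \infty \,\},$$

called the effective domain of $f$, and write

$$\inf f := \inf_x f(x) := \inf_{x \in \mathbb{R}^n} f(x) = \inf_{x \in \operatorname{dom} f} f(x),$$

$$\operatorname{argmin} f := \operatorname{argmin}_x f(x) := \operatorname{argmin}_{x \in \mathbb{R}^n} f(x) = \operatorname{argmin}_{x \in \operatorname{dom} f} f(x).$$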
We call $f$ a proper function if $f(x) < \infty$ for at least one $x \in \mathbb{R}^n$, and $f(x) > -\infty$ for all $x \in \mathbb{R}^n$, or in other words, if $\operatorname{dom} f$ is a nonempty set on which $f$ is finite; otherwise it is improper. The proper functions $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ are thus the ones obtained by taking a nonempty set $C \subset \mathbb{R}^n$ and a function $f : C \to \mathbb{R}$, and putting $f(x) = \infty$ for all $x \notin C$. All other kinds of functions $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ are termed improper in this context. While proper functions are our central concern, improper functions may arise indirectly and can’t always be excluded from consideration.
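A few one-dimensional functions, chosen here purely for illustration, show how the definitions of $\operatorname{dom} f$ and properness separate the cases:

```latex
\begin{aligned}
f_1(x) &= \begin{cases} 1/x & \text{if } x > 0, \\ \infty & \text{if } x \le 0, \end{cases}
  &\operatorname{dom} f_1 &= (0,\infty), &&\text{proper;} \\[4pt]
f_2(x) &\equiv \infty,
  &\operatorname{dom} f_2 &= \emptyset, &&\text{improper (empty effective domain);} \\[4pt]
f_3(x) &= \begin{cases} \log x & \text{if } x > 0, \\ -\infty & \text{if } x \le 0, \end{cases}
  &\operatorname{dom} f_3 &= \mathbb{R}, &&\text{improper (the value } -\infty \text{ is taken).}
\end{aligned}
```

For $f_1$ one has $\inf f_1 = 0$ but $\operatorname{argmin} f_1 = \emptyset$, so properness by itself does not guarantee that a minimum is attained.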
The developments so far can be summarized as follows in the language of optimization.

1.4 Example (principle of abstract minimization). Problems of minimizing a finite function over some subset of $\mathbb{R}^n$ correspond one-to-one with problems of minimizing over all of $\mathbb{R}^n$ a function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$, under the identifications:

$$\begin{aligned} \operatorname{dom} f &= \text{set of feasible solutions,} \\ \operatorname{argmin} f &= \text{set of optimal solutions,} \\ \inf f &= \text{optimal value.} \end{aligned}$$

The convention that $\operatorname{argmin} f = \emptyset$ when $f \equiv \infty$ ensures that a problem is not regarded as having an optimal solution if it doesn’t even have a feasible solution. A lack of feasible solutions is signaled by the optimal value being $\infty$.
Fig. 1–3. Local and global optimality in a difficult yet classical case.
It should be emphasized here that the notation $\operatorname{argmin} f$ refers to points $\bar{x}$ giving a global minimum of $f$. A local minimum occurs at $\bar{x}$ if $f(\bar{x}) < \infty$ and $f(x) \ge f(\bar{x})$ for all $x \in V$, where

$$V \in \mathcal{N}(\bar{x}) := \text{the collection of all neighborhoods of } \bar{x}.$$

Then $\bar{x}$ is a locally optimal solution to the problem of minimizing $f$. By a neighborhood of $x$ one means any set having $x$ in its interior, for example a closed ball

$$\mathbb{B}(x, \lambda) := \{\, x' \mid d(x, x') \le \lambda \,\},$$

where we use the notation

$$d(x, x') := |x - x'| \ \text{(Euclidean distance)}, \quad\text{with}\quad |x| := |(x_1, \dots, x_n)| = \sqrt{x_1^2 + \cdots + x_n^2} \ \text{(Euclidean norm)}.$$

A point $\bar{x}$ giving a local minimum of $f$ can also be viewed as giving the global minimum in an auxiliary problem in which the function agrees with $f$ on some neighborhood of $\bar{x}$ but takes the value $\infty$ elsewhere, so the study of local optimality can to a large extent be subsumed into the study of global optimality.
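The distinction between local and global minima can be explored numerically on a finite sample of points. The sketch below is only a sampled surrogate for the definition (a finite sample cannot certify a local minimum over a whole neighborhood); the function names and the test function are assumptions made for the illustration.

```python
import math

def euclid_norm(x):
    """Euclidean norm |x| of a point given as a tuple of coordinates."""
    return math.sqrt(sum(xj * xj for xj in x))

def dist(x, y):
    """Euclidean distance d(x, y) = |x - y|."""
    return euclid_norm([a - b for a, b in zip(x, y)])

def in_closed_ball(y, center, lam):
    """Membership test for the closed ball of radius lam around center."""
    return dist(center, y) <= lam

def is_local_min_on_sample(f, xbar, sample, lam):
    """Check f(xbar) < inf and f(y) >= f(xbar) for all sampled y in the ball around xbar."""
    fx = f(xbar)
    return fx < math.inf and all(
        f(y) >= fx for y in sample if in_closed_ball(y, xbar, lam)
    )

# Illustrative function on R with a global minimum at -1 and a local one at 1.
f = lambda p: min((p[0] + 1.0) ** 2, (p[0] - 1.0) ** 2 + 0.5)
sample = [(k / 10,) for k in range(-30, 31)]

print(is_local_min_on_sample(f, (1.0,), sample, 0.5))   # local minimum on the sample
print(f((1.0,)) > min(f(y) for y in sample))            # but not a global one
print(is_local_min_on_sample(f, (0.0,), sample, 0.5))   # no local minimum at 0
```

Restricting the comparison to the ball around $\bar{x}$ plays the role of the auxiliary problem in which the function is set to $\infty$ outside a neighborhood.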
An extremely useful type of function in the framework we’re adopting is the indicator function $\delta_C$ of a set $C \subset \mathbb{R}^n$, which is defined by

$$\delta_C(x) = 0 \ \text{if } x \in C, \qquad \delta_C(x) = \infty \ \text{if } x \notin C.$$

The indicator functions on $\mathbb{R}^n$ are characterized as a class by taking on no value other than $0$ or $\infty$. The constant function $0$ is the indicator of $C = \mathbb{R}^n$, while
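Every property of $f$ has its counterpart in a property of $\operatorname{epi} f$, because the correspondence between functions and epigraphs is one-to-one. Many properties also relate very naturally to the various level sets of $f$. In general, we’ll find it useful to have the notation

$$\begin{aligned} \operatorname{lev}_{\le \alpha} f &:= \{\, x \in \mathbb{R}^n \mid f(x) \le \alpha \,\}, \\ \operatorname{lev}_{< \alpha} f &:= \{\, x \in \mathbb{R}^n \mid f(x) < \alpha \,\}, \\ \operatorname{lev}_{= \alpha} f &:= \{\, x \in \mathbb{R}^n \mid f(x) = \alpha \,\}, \\ \operatorname{lev}_{> \alpha} f &:= \{\, x \in \mathbb{R}^n \mid f(x) > \alpha \,\}, \\ \operatorname{lev}_{\ge \alpha} f &:= \{\, x \in \mathbb{R}^n \mid f(x) \ge \alpha \,\}. \end{aligned}$$

The most important of these in the context of minimization are the lower level sets $\operatorname{lev}_{\le \alpha} f$. For $\alpha$ finite, they correspond to the ‘horizontal cross sections’ of $\operatorname{epi} f$. For $\alpha = \inf f$, one has $\operatorname{lev}_{\le \alpha} f = \operatorname{lev}_{= \alpha} f = \operatorname{argmin} f$.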
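The five kinds of level sets can be computed directly once attention is restricted to a finite sample of points. This is a sketch under that assumption; `level_set` and its `relation` argument are hypothetical names.

```python
import operator

# Comparison used for each kind of level set.
_RELATIONS = {
    "<=": operator.le,
    "<": operator.lt,
    "==": operator.eq,
    ">": operator.gt,
    ">=": operator.ge,
}

def level_set(f, sample, alpha, relation="<="):
    """Points x of the sample with f(x) <relation> alpha."""
    compare = _RELATIONS[relation]
    return [x for x in sample if compare(f(x), alpha)]

f = lambda x: x * x
sample = list(range(-3, 4))

print(level_set(f, sample, 1))        # lower level set at alpha = 1
print(level_set(f, sample, 4, "=="))  # points where f equals 4

# At alpha = inf f the lower level set coincides with the argmin set.
alpha_bar = min(f(x) for x in sample)
print(level_set(f, sample, alpha_bar) == level_set(f, sample, alpha_bar, "=="))
```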
Fig. 1–4. Epigraph and effective domain of an extended-real-valued function.
We’re ready now to answer a basic question about a function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$. What property of $f$ translates into the sets $\operatorname{lev}_{\le \alpha} f$ all being closed? The answer depends on a one-sided concept of limit.
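1.5 Definition (lower limits and lower semicontinuity). The lower limit of a function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ at $\bar{x}$ is the value in $\overline{\mathbb{R}}$ defined by

$$\liminf_{x \to \bar{x}} f(x) := \lim_{\delta \searrow 0} \Big[ \inf_{x \in \mathbb{B}(\bar{x}, \delta)} f(x) \Big] = \sup_{\delta > 0} \Big[ \inf_{x \in \mathbb{B}(\bar{x}, \delta)} f(x) \Big] = \sup_{V \in \mathcal{N}(\bar{x})} \Big[ \inf_{x \in V} f(x) \Big]. \tag*{1(1)}$$

The function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is lower semicontinuous (lsc) at $\bar{x}$ if

$$\liminf_{x \to \bar{x}} f(x) \ge f(\bar{x}), \quad\text{or equivalently}\quad \liminf_{x \to \bar{x}} f(x) = f(\bar{x}), \tag*{1(2)}$$

and lower semicontinuous on $\mathbb{R}^n$ if this holds for every $\bar{x} \in \mathbb{R}^n$.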
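The formula with the supremum over $\delta > 0$ suggests a crude numerical experiment on the real line: sample each ball, take the smallest sampled value, and keep the largest result over a decreasing list of radii. The sketch below does exactly that; it is a sampled approximation only, and the function name, the list of radii and the sample size `m` are assumptions of the illustration.

```python
import math

def approx_liminf(f, xbar, deltas=(1.0, 1e-1, 1e-2, 1e-3, 1e-4, 1e-5, 1e-6), m=50):
    """Sampled stand-in for sup over delta of inf over the ball B(xbar, delta), on R."""
    best = -math.inf
    for delta in deltas:
        # Sample the closed ball [xbar - delta, xbar + delta]; k = 0 gives xbar itself.
        ball = [xbar + delta * k / m for k in range(-m, m + 1)]
        best = max(best, min(f(x) for x in ball))
    return best

# Two step functions that differ only in the value taken at the jump.
f_upper = lambda x: 0.0 if x < 0 else 1.0   # takes the upper value at 0
f_lower = lambda x: 0.0 if x <= 0 else 1.0  # takes the lower value at 0

for f in (f_upper, f_lower):
    lower_limit = approx_liminf(f, 0.0)
    print(lower_limit, f(0.0), lower_limit >= f(0.0))  # last entry: lsc test at 0
```

Because each sampled ball contains $\bar{x}$, the computed value never exceeds $f(\bar{x})$, so lower semicontinuity at $\bar{x}$ shows up as equality.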
B. Epigraphs and Semicontinuity

The two versions in 1(2) agree because $\inf \{\, f(x) \mid x \in \mathbb{B}(\bar{x}, \delta) \,\} \le f(\bar{x})$ for all $\delta > 0$. For this reason too,

$$\liminf_{x \to \bar{x}} f(x) \le f(\bar{x}) \ \text{always.} \tag*{1(3)}$$

In replacing the limit as $\delta \searrow 0$ by the supremum over $\delta > 0$ in 1(1) we appeal to the general fact that

$$\inf_{x \in X_1} f(x) \le \inf_{x \in X_2} f(x) \quad\text{when } X_1 \supset X_2.$$
1.6 Theorem (characterization of lower semicontinuity). The following properties of a function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ are equivalent:
(a) $f$ is lower semicontinuous on $\mathbb{R}^n$;
(b) the epigraph set $\operatorname{epi} f$ is closed in $\mathbb{R}^n \times \mathbb{R}$;
(c) the level sets of type $\operatorname{lev}_{\le \alpha} f$ are all closed in $\mathbb{R}^n$.
These equivalences will be established after some preliminaries. An example of a function on $\mathbb{R}$ that happens to be lower semicontinuous at every point but two is displayed in Figure 1–5. Notice how the defect is associated with the failure of the epigraph to include all of its boundary.
Fig. 1–5. An example where lower semicontinuity fails.
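The link between lower semicontinuity and closed level sets in Theorem 1.6 can be checked by hand on two step functions on $\mathbb{R}$. They are chosen here for illustration and are not the function of Figure 1–5.

```latex
g(x) = \begin{cases} 0 & \text{if } x \le 0, \\ 1 & \text{if } x > 0, \end{cases}
\qquad
\operatorname{lev}_{\le \alpha} g =
\begin{cases}
\emptyset & \text{if } \alpha < 0, \\
(-\infty, 0] & \text{if } 0 \le \alpha < 1, \\
\mathbb{R} & \text{if } \alpha \ge 1,
\end{cases}

h(x) = \begin{cases} 0 & \text{if } x < 0, \\ 1 & \text{if } x \ge 0, \end{cases}
\qquad
\operatorname{lev}_{\le \alpha} h = (-\infty, 0) \quad \text{for } 0 \le \alpha < 1 .
```

Every level set of $g$ is closed, and $g$ is lsc. The level set $(-\infty, 0)$ of $h$ is not closed, in agreement with $\liminf_{x \to 0} h(x) = 0 < 1 = h(0)$.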
In the proof of Theorem 1.6 and throughout the book, we use sequence notation in which the running index is always superscript $\nu$ (Greek ‘nu’). We symbolize the natural numbers by $\mathbb{N}$, so that $\nu \in \mathbb{N}$ means $\nu = 1, 2, \dots$. The notation $x^\nu \to x$, or $x = \lim_\nu x^\nu$, refers then to a sequence $\{x^\nu\}_{\nu \in \mathbb{N}}$ in $\mathbb{R}^n$ that converges to $x$, i.e., has $|x^\nu - x| \to 0$ as $\nu \to \infty$. We speak of $x$ as a cluster point of $x^\nu$ as $\nu \to \infty$ if, instead of necessarily claiming $x^\nu \to x$, we wish merely to assert that some subsequence converges to $x$. (Every bounded sequence in $\mathbb{R}^n$ has at least one cluster point. A sequence in $\mathbb{R}^n$ converges to $x$ if and only if it is bounded and has $x$ as its only cluster point.)
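1.7 Lemma (characterization of lower limits).

$$\liminf_{x \to \bar{x}} f(x) = \min \{\, \alpha \in \overline{\mathbb{R}} \mid \exists\, x^\nu \to \bar{x} \text{ with } f(x^\nu) \to \alpha \,\}.$$

(Here the constant sequence $x^\nu \equiv \bar{x}$ is admitted and yields $\alpha = f(\bar{x})$.)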
C. Attainment of a Minimum
be true that $f(x^\nu) \le \alpha$, or in other words, that $x^\nu$ belongs to $\operatorname{lev}_{\le \alpha} f$. Since $x^\nu \to \bar{x}$, this level set, which by assumption is closed, must contain $\bar{x}$. Thus we have $f(\bar{x}) \le \alpha$ for every $\alpha > \bar{\alpha}$. Obviously, then, $f(\bar{x}) \le \bar{\alpha}$.
When Theorem 1.6 is applied to indicator functions, it reduces to the fact that $\delta_C$ is lsc if and only if the set $C$ is closed. The lower semicontinuity of a general function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ doesn’t require $\operatorname{dom} f$ to be closed, however, even when $\operatorname{dom} f$ happens to be bounded. Figure 1–6 illustrates this.
Another question can now be addressed. What conditions on a function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ ensure that $f$ attains its minimum over $\mathbb{R}^n$ at some $x$, i.e., that the set $\operatorname{argmin} f$ is nonempty? The issue is central because of the wide spectrum of minimization problems that can be put into this simple-looking form.

A fact customarily cited is this: a continuous function on a compact set attains its minimum. It also, of course, attains its maximum; this assertion is symmetric with respect to max and min. A more flexible approach is desirable, however. We don’t always wish to single out a compact set, and constraints might not even be present. The very distinction between constrained and unconstrained minimization is suppressed in working with the principle of abstract minimization in 1.4, not to mention problem formulations involving penalty expressions as in 1.3. It’s all just a matter of whether the function $f$ being minimized takes on the value $\infty$ in some regions or not. Another feature is that the functions we want to deal with may be far from continuous. The one in Figure 1–6 is a case in point, but that function $f$ does attain its minimum. A property that’s crucial in this regard is the following.
1.8 Definition (level boundedness). A function $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is (lower) level-bounded if for every $\alpha \in \mathbb{R}$ the set $\operatorname{lev}_{\le \alpha} f$ is bounded (possibly empty).

Note that only finite values of $\alpha$ are considered in this definition. The level boundedness property corresponds to having $f(x) \to \infty$ as $|x| \to \infty$.
1.9 Theorem (attainment of a minimum). Suppose $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is lower semicontinuous, level-bounded and proper. Then the value $\inf f$ is finite and the set $\operatorname{argmin} f$ is nonempty and compact.

Proof. Let $\bar{\alpha} = \inf f$; because $f$ is proper, $\bar{\alpha} < \infty$. For $\alpha \in (\bar{\alpha}, \infty)$, the set $\operatorname{lev}_{\le \alpha} f$ is nonempty; it’s closed because $f$ is lsc (cf. 1.6) and bounded because $f$ is level-bounded. The sets $\operatorname{lev}_{\le \alpha} f$ for $\alpha \in (\bar{\alpha}, \infty)$ are therefore compact and nested: $\operatorname{lev}_{\le \alpha} f \subset \operatorname{lev}_{\le \beta} f$ when $\alpha < \beta$. The intersection of this family of sets, which is $\operatorname{lev}_{\le \bar{\alpha}} f = \operatorname{argmin} f$, is therefore nonempty and compact. Since $f$ doesn’t have the value $-\infty$ anywhere, we conclude also that $\bar{\alpha}$ is finite. Under these circumstances, $\inf f$ can be written as $\min f$.
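Three functions on $\mathbb{R}$, chosen for illustration, indicate what level boundedness contributes to Theorem 1.9. All three are proper and lsc.

```latex
\begin{aligned}
f(x) &= e^{x}: &
\operatorname{lev}_{\le \alpha} f &= (-\infty, \ln \alpha] \ \ (\alpha > 0), &
\inf f &= 0, &
\operatorname{argmin} f &= \emptyset; \\
g(x) &= x^{2}: &
\operatorname{lev}_{\le \alpha} g &= [-\sqrt{\alpha}, \sqrt{\alpha}\,] \ \ (\alpha \ge 0), &
\min g &= 0, &
\operatorname{argmin} g &= \{0\}; \\
h(x) &= e^{x} + \delta_{[0,\infty)}(x): &
\operatorname{lev}_{\le \alpha} h &= [0, \ln \alpha] \ \ (\alpha \ge 1), &
\min h &= 1, &
\operatorname{argmin} h &= \{0\}.
\end{aligned}
```

The level sets of $f$ are unbounded and the infimum is not attained. For $g$ and $h$ the level sets are bounded (and empty for smaller $\alpha$), and the minimum is attained on a compact set, as the theorem asserts.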
1.10 Corollary (lower bounds). If $f : \mathbb{R}^n \to \overline{\mathbb{R}}$ is lsc and proper, then it is bounded from below (finitely) on each bounded subset of $\mathbb{R}^n$ and in fact attains a minimum relative to any compact subset of $\mathbb{R}^n$ that meets $\operatorname{dom} f$.

Proof. For any bounded set $B \subset \mathbb{R}^n$ apply the theorem to the function $g$ defined by $g(x) = f(x)$ when $x \in \operatorname{cl} B$ but $g(x) = \infty$ when $x \notin \operatorname{cl} B$. The case where $g \equiv \infty$ can be dealt with as a triviality, while in all other cases $g$ is lsc, level-bounded and proper.
The conclusion of Theorem 1.9 would hold with level boundedness replaced by the weaker assumption that, for some $\alpha \in \mathbb{R}$, the set $\operatorname{lev}_{\le \alpha} f$ is bounded and nonempty; this is easily gleaned from the proof. But level boundedness is more convenient to work with in applications, and it’s typically present anyway in situations where the attainment of a minimum is sought.

The crucial ingredient in Theorem 1.9 is the fact that when $f$ is both lsc and level-bounded it is inf-compact, which means that the sets $\operatorname{lev}_{\le \alpha} f$ for $\alpha \in \mathbb{R}$ are all compact. This property is very flexible in providing a criterion for the existence of optimal solutions, and it can be applied to a variety of problems, with or without constraints.
1.11 Example (level boundedness relative to constraints). For a problem of minimizing a continuous function $f_0 : \mathbb{R}^n \to \mathbb{R}$ over a nonempty, closed set $C \subset \mathbb{R}^n$, if all sets of the form

$$\{\, x \in C \mid f_0(x) \le \alpha \,\} \quad\text{for } \alpha \in \mathbb{R}$$

are bounded, then the minimum of $f_0$ over $C$ is finite and attained on a nonempty, compact subset of $C$.

This criterion is fulfilled in particular if $C$ is bounded or if $f_0$ is level-bounded, with the latter condition covering even the case of unconstrained minimization, where $C = \mathbb{R}^n$.
Detail. The problem corresponds to minimizing $f = f_0 + \delta_C$ over $\mathbb{R}^n$. Here $f$ is proper because $C \ne \emptyset$, and it’s lsc by 1.6 because its level sets of the form $C \cap \{\, x \mid f_0(x) \le \alpha \,\}$ for $\alpha < \infty$ are closed, by virtue of the closedness of $C$ and the continuity of $f_0$. In assuming these sets are also bounded, we get the desired conclusions from 1.9.
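The construction $f = f_0 + \delta_C$ can be mimicked directly in code. The sketch below assumes the set $C$ is available through a membership test and searches only a finite grid, so it illustrates the formulation rather than proving attainment; the helper names and the particular choice of $f_0$ and $C$ are made up for the example.

```python
import math

def indicator(in_C):
    """Indicator function of C: 0 on C and +inf off C, with C given by a membership test."""
    return lambda x: 0.0 if in_C(x) else math.inf

def plus(f0, g):
    """Pointwise sum; f0 is assumed finite, so inf - inf never arises here."""
    return lambda x: f0(x) + g(x)

# Illustrative data: f0(x) = x^2 on R and the closed interval C = [1, 3].
f0 = lambda x: x * x
delta_C = indicator(lambda x: 1.0 <= x <= 3.0)
f = plus(f0, delta_C)

grid = [k / 10 for k in range(-50, 51)]
best = min(grid, key=f)  # unconstrained search over the grid; infeasible points cost +inf

print(f(2.0), f(0.0))
print(best, f(best))
```

The search itself never mentions $C$: the constraint enters only through the value $\infty$, which is the point of the abstract formulation.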
An illustration of existence in the pattern of Example 1.11 with $C$ not necessarily bounded but $f_0$ inf-compact is furnished by $f_0(x) = |x|$. The minimization problem consists then of finding the point or points of $C$ nearest to the origin of $\mathbb{R}^n$. Theorem 1.9 is also applicable, of course, to minimization problems that do not fit the pattern of 1.11 at all. For instance, in minimizing the function in Figure 1–6 one isn’t simply minimizing a continuous function relative to a closed set, but the conditions in 1.9 are satisfied and a minimizing point exists. This is the kind of situation encountered in general when dealing with barrier functions, for instance.
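For the nearest-point problem with $f_0(x) = |x|$, a closed-form answer is available in the special case where $C$ is a nonempty box, since $|x|^2$ is a sum of squares and each coordinate can be minimized separately. The sketch below covers only that special case, which is an assumption of the illustration and not a restriction made in the text; the function names are hypothetical.

```python
import math

def nearest_to_origin_in_box(intervals):
    """Point of the box nearest to the origin.

    Componentwise, the point of [lo, hi] closest to 0 is 0 clipped to the interval.
    Assumes lo <= hi for every interval; unbounded sides use +/- math.inf.
    """
    return tuple(min(max(0.0, lo), hi) for lo, hi in intervals)

def euclid_norm(x):
    """Euclidean norm |x|."""
    return math.sqrt(sum(xj * xj for xj in x))

bounded_box = [(1.0, 3.0), (-2.0, 5.0)]
unbounded_box = [(-math.inf, -2.0), (4.0, math.inf)]

p = nearest_to_origin_in_box(bounded_box)
q = nearest_to_origin_in_box(unbounded_box)
print(p, euclid_norm(p))
print(q, euclid_norm(q))
```

The second box is unbounded, yet a nearest point still exists, consistent with the role of the level boundedness of $f_0$ in Example 1.11.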