




























































































Study with the several resources on Docsity
Earn points by helping other students or get them with a premium plan
Prepare for your exams
Study with the several resources on Docsity
Earn points to download
Earn points by helping other students or get them with a premium plan
In mathematical optimization we seek to either minimize or maximize a function over a set of alternatives. The function is called the objective function, ...
Typology: Slides
1 / 110
This page cannot be seen from the preview
Don't miss anything!





























































































In mathematical optimization we seek to either minimize or maximize a function over a set of alternatives. The function is called the objective function, and we allow it to be transfinite in the sense that at each point its value is either a real number or it is one of the to infinite values ±∞. The set of alternatives is called the constraint region. Since every maximization problem can be restated as a minimization problem by simply replacing the objective f 0 by its negative −f 0 (and visa versa), we choose to focus only on minimization problems. We denote such problems using the notation
minimize x∈X f 0 (x)
subject to x ∈ Ω,
where f 0 : X → R ∪ {±∞} is the objective function, X is the space over which the optimization occurs, and Ω ⊂ X is the constraint region. This is a very general description of an optimization problem and as one might imagine there is a taxonomy of optimization problems depending on the underlying structural features that the problem possesses, e.g., properties of the space X, is it the integers, the real numbers, the complex numbers, matrices, or an infinite dimensional space of functions, properties of the function f 0 , is it discrete, continuous, or differentiable, the geometry of the set Ω, how Ω is represented, properties of the underlying applications and how they fit into a broader context, methods of solution or approximate solution, .... For our purposes, we assume that Ω is a subset of Rn^ and that f 0 : Rn^ → R ∪ {±∞}. This severely restricts the kind of optimization problems that we study, however, it is sufficiently broad to include a wide variety of applied problems of great practical importance and interest. For example, this framework includes linear programming (LP).
Linear Programming In the case of LP, the objective function is linear, that is, there exists c ∈ Rn^ such that
f 0 (x) = cT^ x =
∑^ n
j=
cj xj ,
and the constraint region is representable as the set of solution to a finite system of linear equation and inequalities,
(2) Ω =
x ∈ Rn
∑^ n
i=
aij xj ≤ bj , i = 1,... , s,
∑^ n
i=
aij xj = bj , i = s + 1,... , m
where A := [aij ] ∈ Rm×n^ and b ∈ Rm.
However, in this course we are primarily concerned with nonlinear problems, that is, problems that cannot be encoded using finitely many linear function alone. A natural generalization of the LP framework to the nonlinear setting is to simply replace each of the linear functions with a nonlinear function. This leads to the general nonlinear programming (NLP) problem which is the problem of central concern in these notes.
Nonlinear Programming In nonlinear programming we are given nonlinear functions fi : Rn^ → R, i = 1, 2 ,... , m, where f 0 is the objective function in (1) and the functions fi, i = 1, 2 ,... , m are called the constraint functions which are used to define the constrain region in (1) by setting
(3) Ω = {x ∈ Rn^ | fi(x) ≤ 0 , i = 1,... , s, fi(x) = 0, i = s + 1,... , m }.
If Ω = Rn, then we say that the problem (1) is an unconstrained optimization problem; otherwise, it called a constrained problem. We begin or study with unconstrained problems. They are simpler to handle since we are only concerned with minimizing the objective function and we need not concern ourselves with the constraint region.
5
Numerical linear algebra lies at the heart of modern scientific computing and computational science. Today it is not uncommon to perform numerical computations with matrices having millions of components. The key to understanding how to implement such algorithms is to exploit underlying structure within the matrices. In these notes we touch on a few ideas and tools for dissecting matrix structure. Specifically we are concerned with the block structure matrices.
then A 2 , 4 = −100,
A 1 · =
and
A· 1 =
Exercise 1.1. If
what are C 4 , 4 , C· 4 and C 4 ·? For example, C 2 · =
and C· 2 =
The block structuring of a matrix into its rows and columns is of fundamental importance and is extremely useful in understanding the properties of a matrix. In particular, for A ∈ Rm×n^ it allows us to write
Am·
and A =
A· 1 A· 2 A· 3... A·n
These are called the row and column block representations of A, respectively
7
8 2. REVIEW OF MATRICES AND BLOCK STRUCTURES
1.1. Matrix vector Multiplication. Let A ∈ Rm×n^ and x ∈ Rn. In terms of its coordinates (or components),
we can also write x =
x 1 x 2 .. . xn
with each xj ∈ R. The term xj is called the jth component of x. For example if
x =
then n = 3, x 1 = 5, x 2 = − 100 , x 3 = 22. We define the matrix-vector product Ax by
Ax =
A 1 · • x A 2 · • x A 3 · • x .. . Am· • x
where for each i = 1, 2 ,... , m, Ai· • x is the dot product of the ith row of A with x and is given by
Ai· • x =
∑^ n
j=
Aij xj.
For example, if
(^) and x =
then
Ax =
Exercise 1.2. If
and x =
what is Cx?
Note that if A ∈ Rm×n^ and x ∈ Rn, then Ax is always well defined with Ax ∈ Rm. In terms of components, the ith component of Ax is given by the dot product of the ith row of A (i.e. Ai·) and x (i.e. Ai· • x). The view of the matrix-vector product described above is the row-space perspective, where the term row-space will be given a more rigorous definition at a later time. But there is a very different way of viewing the matrix-vector product based on a column-space perspective. This view uses the notion of the linear combination of a collection of vectors. Given k vectors v^1 , v^2 ,... , vk^ ∈ Rn^ and k scalars α 1 , α 2 ,... , αk ∈ R, we can form the vector α 1 v^1 + α 2 v^2 + · · · + αkvk^ ∈ Rn^.
Any vector of this kind is said to be a linear combination of the vectors v^1 , v^2 ,... , vk^ where the α 1 , α 2 ,... , αk are called the coefficients in the linear combination. The set of all such vectors formed as linear combinations of v^1 , v^2 ,... , vk^ is said to be the linear span of v^1 , v^2 ,... , vk^ and is denoted
span
v^1 , v^2 ,... , vk
α 1 v^1 + α 2 v^2 + · · · + αkvk^
∣ (^) α 1 , α 2 ,... , αk ∈ R }^.
10 2. REVIEW OF MATRICES AND BLOCK STRUCTURES
is CD well defined and if so what is it?
The formula (5) can be used to give further insight into the individual components of the matrix product AB. By the definition of the matrix-vector product we have for each j = 1, 2 ,... , k
AB·j =
A 1 · • B·j A 2 · • B·j Am· • B·j
Consequently, (AB)ij = Ai· • B·j ∀ i = 1, 2 ,... m, j = 1, 2 ,... , k.
That is, the element of AB in the ith row and jth column, (AB)ij , is the dot product of the ith row of A with the jth column of B.
2.1. Elementary Matrices. We define the elementary unit coordinate matrices in Rm×n^ in much the same way as we define the elementary unit coordinate vectors. Given i ∈ { 1 , 2 ,... , m} and j ∈ { 1 , 2 ,... , n}, the elementary unit coordinate matrix Eij ∈ Rm×n^ is the matrix whose ij entry is 1 with all other entries taking the value zero. This is a slight abuse of notation since the notation Eij is supposed to represent the ijth entry in the matrix E. To avoid confusion, we reserve the use of the letter E when speaking of matrices to the elementary matrices.
Exercise 2.2. (Multiplication of square elementary matrices)Let i, k ∈ { 1 , 2 ,... , m} and j, ` ∈ { 1 , 2 ,... , m}. Show the following for elementary matrices in Rm×m^ first for m = 3 and then in general.
(1) Eij Ek` =
Ei` , if j = k, 0 , otherwise. (2) For any α ∈ R, if i 6 = j, then (Im×m − αEij )(Im×m + αEij ) = Im×m so that (Im×m + αEij )−^1 = (Im×m − αEij ). (3) For any α ∈ R with α 6 = 0, (I + (α−^1 − 1)Eii)(I + (α − 1)Eii) = I so that (I + (α − 1)Eii)−^1 = (I + (α−^1 − 1)Eii).
Exercise 2.3. (Elementary permutation matrices)Let i, ∈ { 1 , 2 ,... , m} and consider the matrix Pij ∈ Rm×m obtained from the identity matrix by interchanging its i andth rows. We call such a matrix an elementary permutation matrix. Again we are abusing notation, but again we reserve the letter P for permutation matrices (and, later, for projection matrices). Show the following are true first for m = 3 and then in general.
(1) PiPi = Im×m so that P (^) i− 1 = Pi. (2) P (^) iT = Pi. (3) Pi= I − Eii − E`` + Ei + E`i.
Exercise 2.4. (Three elementary row operations as matrix multiplication)In this exercise we show that the three elementary row operations can be performed by left multiplication by an invertible matrix. Let A ∈ Rm×n, α ∈ R and let i, ` ∈ { 1 , 2 ,... , m} and j ∈ { 1 , 2 ,... , n}. Show that the following results hold first for m = n = 3 and then in general.
(1) (row interchanges) Given A ∈ Rm×n, the matrix Pij A is the same as the matrix A except with the i and jth rows interchanged. (2) (row multiplication) Given α ∈ R with α 6 = 0, show that the matrix (I + (α − 1)Eii)A is the same as the matrix A except with the ith row replaced by α times the ith row of A. (3) Show that matrix Eij A is the matrix that contains the jth row of A in its ith row with all other entries equal to zero. (4) (replace a row by itself plus a multiple of another row) Given α ∈ R and i 6 = j, show that the matrix (I + αEij )A is the same as the matrix A except with the ith row replaced by itself plus α times the jth row of A.
2.2. Associativity of matrix multiplication. Note that the definition of matrix multiplication tells us that this operation is associative. That is, if A ∈ Rm×n, B ∈ Rn×k, and C ∈ Rk×s, then AB ∈ Rm×k^ so that (AB)C is well defined and BC ∈ Rn×s^ so that A(BC) is well defined, and, moreover,
(AB)C =
(AB)C· 1 (AB)C· 2 · · · (AB)C·s
where for each ` = 1, 2 ,... , s
(AB)C·` =
AB· 1 AB· 2 AB· 3 · · · AB·k
= C 1 AB· 1 + C 2AB· 2 + · · · + Ck`AB·k = A
C 1 B· 1 + C 2B· 2 + · · · + Ck`B·k
Therefore, we may write (6) as
(AB)C =
(AB)C· 1 (AB)C· 2 · · · (AB)C·s
A(BC· 1 ) A(BC· 2 )... A(BC·s)
BC· 1 BC· 2... BC·s
Due to this associativity property, we may dispense with the parentheses and simply write ABC for this triple matrix product. Obviously longer products are possible.
Exercise 2.5. Consider the following matrices:
Using these matrices, which pairs can be multiplied together and in what order? Which triples can be multiplied together and in what order (e.g. the triple product BAC is well defined)? Which quadruples can be multiplied together and in what order? Perform all of these multiplications.
Visual inspection tells us that this matrix has structure. But what is it, and how can it be represented? We re-write the the matrix given above blocking out some key structures:
where
B =
Exercise 3.1. Consider the the matrix
Does H have a natural block structure that might be useful in performing a matrix-matrix multiply, and if so describe it by giving the blocks? Describe a conformal block decomposition of the matrix
that would be useful in performing the matrix product HM. Compute the matrix product HM using this conformal decomposition.
Exercise 3.2. Let T ∈ Rm×n^ with T 6 = 0 and let I be the m × m identity matrix. Consider the block structured matrix A = [ I T ].
(i) If A ∈ Rk×s, what are k and s? (ii) Construct a non-zero s × n matrix B such that AB = 0.
The examples given above illustrate how block matrix multiplication works and why it might be useful. One of the most powerful uses of block structures is in understanding and implementing standard matrix factorizations or reductions.
v =
a α b
where a ∈ Rs, α ∈ R, and b ∈ Rt^ with m = s + 1 + t. In this vector we refer to the α entry as the pivot and assume that α 6 = 0. We wish to determine a matrix G such that
Gv = es+
where for j = 1,... , n, ej is the unit coordinate vector having a one in the jth position and zeros elsewhere. We claim that the matrix
G =
Is×s −α−^1 a 0 0 α−^1 0 −α−^1 b It×t
does the trick. Indeed,
(7) Gv =
Is×s −α−^1 a 0 0 α−^1 0 −α−^1 b It×t
a α b
a − a α−^1 α −b + b
(^) = es+1.
14 2. REVIEW OF MATRICES AND BLOCK STRUCTURES
The matrix G is called a Gaussian-Jordan Elimination Matrix, or GJEM for short. Note that G is invertible since
I a 0 0 α 0 0 b I
Moreover, for any vector of the form w =
x 0 y
(^) where x ∈ Rs^ y ∈ Rt, we have
Gw = w.
The GJEM matrices perform precisely the operations required in order to execute Gauss-Jordan elimination. That is, each elimination step can be realized as left multiplication of the augmented matrix by the appropriate GJEM. For example, consider the linear system 2 x 1 + x 2 + 3 x 3 = 5 2 x 1 + 2 x 2 + 4 x 3 = 8 4 x 1 + 2 x 2 + 7 x 3 = 11 5 x 1 + 3 x 2 + 4 x 3 = 10
and its associated augmented matrix
The first step of Gauss-Jordan elimination is to transform the first column of this augmented matrix into the first unit coordinate vector. The procedure described in (7) can be employed for this purpose. In this case the pivot is the (1, 1) entry of the augmented matrix and so
s = 0, a is void, α = 2, t = 3, and b =
which gives
Multiplying these two matrices gives
We now repeat this process to transform the second column of this matrix into the second unit coordinate vector. In this case the (2, 2) position becomes the pivot so that
s = 1, a = 1/ 2 , α = 1, t = 2, and b =
yielding
Again, multiplying these two matrices gives
16 2. REVIEW OF MATRICES AND BLOCK STRUCTURES
Inverse Matrices: The inverse of a matrix X ∈ Rn×n^ is any matrix Y ∈ Rn×n^ such that XY = I in which case we write X−^1 := Y. It is easily shown that if Y is an inverse of X, then Y is unique and Y X = I. Permutation Matrices: A matrix P ∈ Rn×n^ is said to be a permutation matrix if P is obtained from the identity matrix by either permuting the columns of the identity matrix or permuting its rows. It is easily seen that P −^1 = P T^. Unitary Matrices: A matrix U ∈ Rn×n^ is said to be a unitary matrix if U T^ U = I, that is U T^ = U −^1. Note that every permutation matrix is unitary. But the converse is note true since for any vector u with ‖u‖ 2 = 1 the matrix I − 2 uuT^ is unitary. Symmetric Matrices: A matrix M ∈ Rn×n^ is said to be symmetric if M T^ = M. Skew Symmetric Matrices: A matrix M ∈ Rn×n^ is said to be skew symmetric if M T^ = −M.
A 0 =
α 1 vT 1 u 1 Â 1
∈ Rm×n,
with 0 6 = α 1 ∈ R, u 1 ∈ Rn−^1 , v 1 ∈ Rm−^1 , and A˜ 1 ∈ R(m−1)×(n−1). Then using α 1 to zero out u 1 amounts to left multiplication of the matrix A 0 by the Gaussian elimination matrix [ 1 0 − u α^11 I
to get
(8)
− u α^11 I
α 1 v 1 T u 1 Â 1
α 1 vT 1 0 A˜ 1
∈ Rm×n^ ,
where A^ ˜ 1 = Â 1 − u 1 vT 1 /α 1.
Define
L^ ˜ 1 =
u 1 α 1 I
∈ Rm×m^ and U˜ 1 =
α 1 vT 1 0 A˜ 1
∈ Rm×n^.
and observe that
L^ ˜−^1 1 =
− u α^11 I
Hence (8) becomes
(9) L˜− 1 1 Pl 0 A˜ 0 Pr 0 = U˜ 1 , or equivalently, A = Pl 0 L˜ 1 U˜ 1 P (^) rT 0.
Note that L˜ 1 is unit lower triangular (ones on the mail diagonal) and U˜ 1 is block upper-triangular with one nonsingular 1 × 1 block and one (m − 1) × (n − 1) block on the block diagonal. Next consider the matrix A˜ 1 in U˜ 1. If the (1, 1) entry of A˜ 1 is zero, then apply permutation matrices P˜l 1 ∈ R(m−1)×(m−1)^ and P˜r 1 ∈ R(n−1)×(n−1)^ to the left and and right of A˜ 1 ∈ R(m−1)×(n−1), respectively, to bring any non-zero element of A˜ 0 into the (1, 1) position (e.g., the one with largest magnitude) and set A 1 := P˜l 1 A˜ 1 Pr 1. If the element of A˜ 1 is zero, then stop. Define
Pl 1 :=
0 P˜l 1
and Pr 1 :=
0 P˜r 1
so that Pl 1 and Pr 1 are also permutation matrices and
(10) Pl 1 U˜ 1 Pr 1 =
0 P˜l 1
α 1 v 1 T 0 A˜ 1
0 P˜r 1
α 1 v 1 T Pr 1 0 P˜l 1 A˜ 1 Pr 1
α 1 ˜vT 1 0 A 1
where ˜v 1 := P (^) rT 1 v 1. Define
α 1 ˜v 1 T 0 A 1
, where A 1 =
α 2 v 2 T u 2 Â 2
∈ R(m−1)×(n−1),
with 0 6 = α 2 ∈ R, u 2 ∈ Rn−^2 , v 1 ∈ Rm−^2 , and A˜ 2 ∈ R(m−2)×(n−2). In addition, define
P^ ˜l 1 u^1 α 1 I
so that
P (^) lT 1 L 1 =
0 P˜ (^) lT 1
P^ ˜l 1 u^1 α 1 I
u 1 α 1
l 1
u 1 α 1 I
0 P˜ (^) lT 1
= L˜ 1 P (^) lT 1 ,
and consequently
L− 1 1 Pl 1 = Pl 1 L˜− 1 1.
Plugging this into (9) and using (10), we obtain
L− 1 1 Pl 1 Pl 0 A˜ 0 Pr 0 Pr 1 = Pl 1 L˜− 1 1 Pl 0 A˜ 0 Pr 0 Pr 1 = Pl 1 U˜ 1 Pr 1 = U 1 ,
or equivalently,
Pl 1 Pl 0 APr 0 Pr 1 = L 1 U 1. We can now repeat this process on the matrix A 1 since the (1, 1) entry of this matrix is non-zero. The process can run for no more than the number of rows of A which is m. However, it may terminate after k < m steps if the matrix  k is the zero matrix. In either event, we obtain the following result.
Theorem 6.1. [The LU Factorization] Let A ∈ Rm×n. If k = rank (A), then there exist permutation matrices Pl ∈ Rm×m^ and Pr ∈ Rn×n^ such that
PlAPr = LU,
where L ∈ Rm×m^ is a lower triangular matrix having ones on its diagonal and
U =
with U 1 ∈ Rk×k^ a nonsingular upper triangular matrix.
Note that a column permutation is only required if the first column of Aˆk is zero for some k before termination. In particular, this implies that the rank (A) < m. Therefore, if rank (A) = m, column permutations are not required, and Pr = I. If one implements the LU factorization so that a column permutation is only employed in the case when the first column of Aˆk is zero for some k, then we say the LU factorization is obtained through partial pivoting.
Example 6.1. We now use the procedure outlined above to compute the LU factorization of the matrix
for y 1 , then
x = P (^) rT
y 1 0
is a solution to Ax = b. The equation (11) is also easy to solve since U 1 is an upper triangular nonsingular matrix so that (11) can be solved by back substitution.
Exercise 8.1. Given v^1 , v^2 ,... , vk^ ∈ Rn, show that the linear span of these vectors,
span
v^1 , v^2 ,... , vk
α 1 v^1 + α 2 v^2 + · · · + αkvk^
α 1 , α 2 ,... , αk ∈ R
is a subspace.
Exercise 8.2. Show that for any set S in Rn, the set
S⊥^ = {v : wT^ v = 0 for all w ∈ S}
is a subspace. If S is itself a subspace, then S⊥^ is called the subspace orthogonal (or perpendicular) to the subspace S.
Exercise 8.3. If S is any suset of Rn^ (not necessarily a subspace), show that (S⊥)⊥^ = span (S).
Exercise 8.4. If S ⊂ Rn^ is a subspace, show that S = (S⊥)⊥.
A set of vectors v^1 , v^2 ,... , vk^ ∈ Rn^ are said to be linearly independent if 0 = a 1 v^1 + · · · + akvk^ if and only if 0 = a 1 = a 2 = · · · = ak. A basis for a subspace in any maximal linearly independent subspace. An elementary fact from linear algebra is that the subspace equals the linear span of any basis for the subspace and that every basis of a subspace has the same number of vectors in it. We call this number the dimension for the subspace. If S is a subspace, we denote the dimension of S by dim S.
Exercise 8.5. If s ⊂ Rn^ is a subspace, then any basis of S can contain only finitely many vectors.
Exercise 8.6. Show that every subspace can be represented as the linear span of a basis for that subspace.
Exercise 8.7. Show that every basis for a subspace contains the same number of vectors.
Exercise 8.8. If S ⊂ Rn^ is a subspace, show that
(12) Rn^ = S + S⊥
and that
(13) n = dim S + dim S⊥.
Let A ∈ Rm×n. We associate with A its four fundamental subspaces: Ran(A) := {Ax | x ∈ Rn^ } Null(A) := {x | Ax = 0 } Ran(AT^ ) :=
AT^ y
∣ (^) y ∈ Rm^
Null(AT^ ) :=
y
∣ (^) AT^ y = 0
where
rank(A) := dim Ran(A) nullity(A) := dim Null(A) rank(AT^ ) := dim Ran(AT^ ) nullity(AT^ ) := dim Null(AT^ )
Exercise 8.9. Show that the four fundamental subspaces associated with a matrix are indeed subspaces.
20 2. REVIEW OF MATRICES AND BLOCK STRUCTURES
Observe that Null(A) := {x | Ax = 0 } = {x | Ai· • x = 0, i = 1, 2 ,... , m } = {A 1 ·, A 2 ·,... , Am·}⊥ = span (A 1 ·, A 2 ·,... , Am·)⊥ = Ran(AT^ )⊥^.
Since for any subspace S ⊂ Rn, we have (S⊥)⊥^ = S, we obtain
(15) Null(A)⊥^ = Ran(AT^ ) and Null(AT^ ) = Ran(A)⊥.
The equivalences in (15) are called the Fundamental Theorem of the Alternative. One of the big consequences of echelon form is that
(16) n = rank(A) + nullity(A).
By combining (16), (13) and (15), we obtain the equivalence
rank(AT^ ) = dim Ran(AT^ ) = dim Null(A)⊥^ = n − nullity(A) = rank(A).
That is, the row rank of a matrix equals the column rank of a matrix, i.e., the dimensions of the row and column spaces of a matrix are the same!