Eigenvalue Methods for Optimization: Dynamical Systems and QR Algorithms, Study Notes in Computer Engineering

This document discusses the use of methods from dynamical systems theory in optimization, particularly in the context of eigenvalue methods. It covers the convergence of gradient flows for the Rayleigh quotient function and the use of the QR method to find subspaces of a given dimension for symmetric matrices. The document also discusses the relationship between dynamical systems theory and the theory of Gramians, and the importance of the structural stability of equilibria in dynamical systems.

Type: Study notes

2011

Shared on 06/10/2011

allan-leite-9

Optimization and Dynamical Systems

Uwe Helmke^1
John B. Moore^2

2nd Edition
March 1996

1. Department of Mathematics, University of Würzburg, D-97074 Würzburg, Germany.
2. Department of Systems Engineering and Cooperative Research Centre for Robust and Adaptive Systems, Research School of Information Sciences and Engineering, Australian National University, Canberra, ACT 0200, Australia.


Preface

This work is aimed at mathematics and engineering graduate students and researchers in the areas of optimization, dynamical systems, control systems, signal processing, and linear algebra. The motivation for the results developed here arises from advanced engineering applications and the emergence of highly parallel computing machines for tackling such applications. The problems solved are those of linear algebra and linear systems theory, and include such topics as diagonalizing a symmetric matrix, singular value decomposition, balanced realizations, linear programming, sensitivity minimization, and eigenvalue assignment by feedback control.

The tools are those, not only of linear algebra and systems theory, but also of differential geometry. The problems are solved via dynamical systems implementation, either in continuous time or discrete time, which is ideally suited to distributed parallel processing. The problems tackled are indirectly or directly concerned with dynamical systems themselves, so there is feedback in that dynamical systems are used to understand and optimize dynamical systems.

One key to the new research results has been the recent discovery of rather deep existence and uniqueness results for the solution of certain matrix least squares optimization problems in geometric invariant theory. These problems, as well as many other optimization problems arising in linear algebra and systems theory, do not always admit solutions which can be found by algebraic methods. Even for such problems that do admit solutions via algebraic methods, as for example the classical task of singular value decomposition, there is merit in viewing the task as a certain matrix optimization problem, so as to shift the focus from algebraic methods to geometric methods. It is in this context that gradient flows on manifolds appear as a natural approach to achieve construction methods that complement the existence and uniqueness results of geometric invari-

Foreword

By Roger W. Brockett

Differential equations have provided the language for expressing many of the important ideas of automatic control. Questions involving optimization, dynamic compensation and stability are effectively expressed in these terms. However, as technological advances in VLSI reduce the cost of computing, we are seeing a greater emphasis on control problems that involve real-time computing. In some cases this means using a digital implementation of a conventional linear controller, but in other cases the additional flexibility that digital implementation allows is being used to create systems that are of a completely different character. These developments have resulted in a need for effective ways to think about systems which use both ordinary dynamical compensation and logical rules to generate feedback signals. Until recently the dynamic response of systems that incorporate if-then rules, branching, etc. has not been studied in a very effective way.

From this latter point of view it is useful to know that there exist families of ordinary differential equations whose flows are such as to generate a sorting of the numerical values of the various components of the initial conditions, solve a linear programming problem, etc. A few years ago, I observed that a natural formulation of a steepest descent algorithm for solving a least-squares matching problem leads to a differential equation for carrying out such operations. During the course of some conversations with Anthony Bloch it emerged that there is a simplified version of the matching equations, obtained by recasting them as flows on the Lie algebra and then restricting them to a subspace, and that this simplified version can be used to sort lists as well. Bloch observed that the restricted equations are identical to the Toda lattice equations in the form introduced by Hermann Flaschka nearly twenty years ago. In fact, one can see in Jürgen Moser’s early paper on the solution of the Toda Lattice, a discussion of sorting couched in the language of scattering theory and presumably one could have developed the subject from this, rather different, starting point. The fact that a sorting flow can be viewed as a gradient flow and that each simple Lie algebra defines a slightly different version of this gradient flow was the subject of a systematic analysis in a paper that involved collaboration with Bloch and Tudor Ratiu.

The differential equations for matching referred to above are actually formulated on a compact matrix Lie group and then rewritten in terms of matrices that evolve in the Lie algebra associated with the group. It is only with respect to a particular metric on a smooth submanifold of this Lie algebra (actually the manifold of all matrices having a given set of eigenvalues) that the final equations appear to be in gradient form. However, as equations on this manifold, the descent equations can be set up so as to find the eigenvalues of the initial condition matrix. This then makes contact with the subject of numerical linear algebra and a whole line of interesting work going back at least to Rutishauser in the mid 1950s and continuing to the present day. In this book Uwe Helmke and John Moore emphasize problems, such as computation of eigenvalues, computation of singular values, construction of balanced realizations, etc. involving more structure than just sorting or solving linear programming problems. This focus gives them a natural vehicle to introduce and interpret the mathematical aspects of the subject. A recent Harvard thesis by Steven Smith contains a detailed discussion of the numerical performance of some algorithms evolving from this point of view.

The circle of ideas discussed in this book has been developed in some other directions as well. Leonid Faybusovich has taken a double bracket equation as the starting point in a general approach to interior point methods for linear programming. Wing Wong and I have applied this type of thinking to the assignment problem, attempting to find compact ways to formulate a gradient flow leading to the solution. Wong has also examined similar methods for nonconvex problems and Jeffrey Kosowsky has compared these methods with other flow methods inspired by statistical physics. In a joint paper with Bloch we have investigated certain partial differential equation models for sorting continuous functions, i.e. generating the monotone equi-measurable rearrangements of functions, and Saveliev has examined a family of partial differential equations of the double bracket type, giving them a cosmological interpretation.

It has been interesting to see how rapidly the literature in this area has grown. The present book comes at a good time, both because it provides a well reasoned introduction to the basic ideas for those who are curious

Contents

  • Preface
  • Foreword
  • 1 Matrix Eigenvalue Methods
    • 1.1 Introduction
    • 1.2 Power Method for Diagonalization
    • 1.3 The Rayleigh Quotient Gradient Flow
    • 1.4 The QR Algorithm
    • 1.5 Singular Value Decomposition (SVD)
    • 1.6 Standard Least Squares Gradient Flows
  • 2 Double Bracket Isospectral Flows
    • 2.1 Double Bracket Flows for Diagonalization
    • 2.2 Toda Flows and the Riccati Equation
    • 2.3 Recursive Lie-Bracket Based Diagonalization
  • 3 Singular Value Decomposition
    • 3.1 SVD via Double Bracket Flows
    • 3.2 A Gradient Flow Approach to SVD
  • 4 Linear Programming
    • 4.1 The Rôle of Double Bracket Flows
    • 4.2 Interior Point Flows on a Polytope
    • 4.3 Recursive Linear Programming/Sorting
  • 5 Approximation and Control
    • 5.1 Approximations by Lower Rank Matrices
    • 5.2 The Polar Decomposition
    • 5.3 Output Feedback Control
  • 6 Balanced Matrix Factorizations
    • 6.1 Introduction
    • 6.2 Kempf-Ness Theorem
    • 6.3 Global Analysis of Cost Functions
    • 6.4 Flows for Balancing Transformations
    • 6.5 Flows on the Factors X and Y
    • 6.6 Recursive Balancing Matrix Factorizations
  • 7 Invariant Theory and System Balancing
    • 7.1 Introduction
    • 7.2 Plurisubharmonic Functions
    • 7.3 The Azad-Loeb Theorem
    • 7.4 Application to Balancing
    • 7.5 Euclidean Norm Balancing
  • 8 Balancing via Gradient Flows
    • 8.1 Introduction
    • 8.2 Flows on Positive Definite Matrices
    • 8.3 Flows for Balancing Transformations
    • 8.4 Balancing via Isodynamical Flows
    • 8.5 Euclidean Norm Optimal Realizations
  • 9 Sensitivity Optimization
    • 9.1 A Sensitivity Minimizing Gradient Flow
    • 9.2 Related L^2-Sensitivity Minimization Flows
    • 9.3 Recursive L^2-Sensitivity Balancing
    • 9.4 L^2-Sensitivity Model Reduction
    • 9.5 Sensitivity Minimization with Constraints
  • A Linear Algebra
    • A.1 Matrices and Vectors
    • A.2 Addition and Multiplication of Matrices
    • A.3 Determinant and Rank of a Matrix
    • A.4 Range Space, Kernel and Inverses
    • A.5 Powers, Polynomials, Exponentials and Logarithms
    • A.6 Eigenvalues, Eigenvectors and Trace
    • A.7 Similar Matrices
    • A.8 Positive Definite Matrices and Matrix Decompositions
    • A.9 Norms of Vectors and Matrices
    • A.10 Kronecker Product and Vec
    • A.11 Differentiation and Integration
    • A.12 Lemma of Lyapunov
    • A.13 Vector Spaces and Subspaces
    • A.14 Basis and Dimension
    • A.15 Mappings and Linear Mappings
    • A.16 Inner Products
  • B Dynamical Systems
    • B.1 Linear Dynamical Systems
    • B.2 Linear Dynamical System Matrix Equations
    • B.3 Controllability and Stabilizability
    • B.4 Observability and Detectability
    • B.5 Minimality
    • B.6 Markov Parameters and Hankel Matrix
    • B.7 Balanced Realizations
    • B.8 Vector Fields and Flows
    • B.9 Stability Concepts
    • B.10 Lyapunov Stability
  • C Global Analysis
    • C.1 Point Set Topology
    • C.2 Advanced Calculus
    • C.3 Smooth Manifolds
    • C.4 Spheres, Projective Spaces and Grassmannians
    • C.5 Tangent Spaces and Tangent Maps
    • C.6 Submanifolds
    • C.7 Groups, Lie Groups and Lie Algebras
    • C.8 Homogeneous Spaces
    • C.9 Tangent Bundle
    • C.10 Riemannian Metrics and Gradient Flows
    • C.11 Stable Manifolds
    • C.12 Convergence of Gradient Flows
  • References
  • Author Index
  • Subject Index

Chapter 1. Matrix Eigenvalue Methods

1.1 Introduction

the influence of the ideas of Hermann (1979) on the development of applications of differential geometry in systems theory and linear algebra. Brockett showed that the tasks of diagonalizing a matrix, linear programming, and sorting, could all be solved by dynamical systems, and in particular by finding the limiting solution of certain well behaved ordinary matrix differential equations. Moreover, these construction procedures were actually mildly disguised solutions to matrix least squares minimization problems.

Of course, we are not used to solving problems in linear algebra by calculus, nor does it seem that matrix differential equations are attractive for replacing linear algebra computer packages. So why proceed along such lines? Here we must look at the cutting edge of current applications which result in matrix formulations involving quite high dimensional matrices, and the emergent computer technologies with distributed and parallel processing such as in the connection machine, the hypercube, array processors, systolic arrays, and artificial neural networks. For such “neural” network architectures, the solution of high order nonlinear matrix differential (or difference) equations is not a formidable task, but rather a natural one. We should not exclude the possibility that new technologies, such as charge coupled devices, will allow N digital additions to be performed simultaneously rather than in N operations. This could bring about a new era of numerical methods, perhaps permitting the dynamical systems approach to optimization explored here to be very competitive.

The subject of this book is currently in an intensive state of development, with inputs coming from very different directions. Starting from the seminal work of Khachian and Karmarkar, there has been a lot of progress in developing interior point algorithms for linear programming and nonlinear programming, due to Bayer, Lagarias and Faybusovich, to mention a few. In numerical analysis there is the work of Kostant, Symes, Deift, Chu, Tomei and others on the Toda flow and its connection to completely integrable Hamiltonian systems. This subject also has deep connections with torus actions and symplectic geometry. Starting from the work of Brockett, there is now an emerging theory of completely integrable gradient flows on manifolds which is being developed by Bloch, Brockett, Flaschka and Ratiu. We also mention the work of Bloch on least squares estimation with relation to completely integrable Hamiltonian systems. In our own work we have tried to develop the applications of gradient flows on manifolds to systems theory, signal processing and control theory. In the future we expect more applications to optimal control theory. We also mention the obvious connections to artificial neural networks and nonlinear approximation theory. In all these research directions, the development is far from being complete and a definite picture has not yet appeared.

It has not been our intention, nor have we been able, to cover thoroughly all these recent developments. Instead, we have tried to draw the emerging interconnections between these different lines of research and to raise the reader’s interest in these fascinating developments. Our window on these developments, which we invite the reader to share, is of course our own research of recent years.

We see then that a dynamical systems approach to optimization is rather timely. Where better to start than with least squares optimization? The first step for the approach we take is to formulate a cost function which when minimized over the constraint set would give the desired result. The next step is to formulate a Riemannian metric on the tangent space of the constraint set, viewed as a manifold, such that a gradient flow on the manifold can be readily implemented, and such that the flow converges to the desired algebraic solution. We do not offer a systematic approach for achieving any “best” selections of the metric, but rather demonstrate the approach by examples. In the first chapters of the monograph, these examples will be associated with fundamental and classical tasks in linear algebra and in linear system theory, therefore representing more subtle, rather than dramatic, advances. In the later chapters new problems, not previously addressed by any complete theory, are tackled.

Of course, the introduction of methods from the theory of dynamical systems to optimization is well established, as in modern analysis of the classical steepest descent gradient techniques and the Newton method. More recently, feedback control techniques are being applied to select the step size in numerical integration algorithms. There are interesting applications of optimization theory to dynamical systems in the now well established field of optimal control and estimation theory. This book seeks to catalyze further interactions between optimization and dynamical systems.

We are familiar with the notion that Riccati equations are often the dynamical systems behind many least squares optimization tasks, and engineers are now comfortable with implementing Riccati equations for estimation and control. Dynamical systems for other important matrix least squares optimization tasks in linear algebra, systems theory, sensitivity optimization, and inverse eigenvalue problems are studied here. At first encounter these may appear quite formidable and provoke caution. On closer inspection, we find that these dynamical systems are actually Riccati-like in behaviour and are often induced from linear flows. Also, it is very comforting that they are exponentially convergent, and converge to the set of global optimal solutions to the various optimization tasks. It is our prediction that engineers will become familiar with such equations in the decades to come.

To us the dynamical systems arising in the various optimization tasks studied in this book have their own intrinsic interest and appeal. Although


between control energy and regulation or tracking performance. Quadratic indices are also used for eigenvalue assignment.

For all these various tasks, the geometry of the constraint manifolds is important and gradient flows on these manifolds are developed. Digressions are included in the early chapters to cover such topics as Projective Spaces, Riemannian Metrics, Gradient Flows, and Lie groups. Appendices are included to cover the relevant basic definitions and results in linear algebra, dynamical systems theory, and global analysis including aspects of differential geometry.

1.2 Power Method for Diagonalization

In this chapter we review some of the standard tools in numerical linear algebra for solving matrix eigenvalue problems. Our interest in such methods is not to give a concise and complete analysis of the algorithms, but rather to demonstrate that some of these algorithms arise as discretizations of certain continuous-time dynamical systems. This work then leads into the more recent matrix eigenvalue methods of Chapter 2, termed double bracket flows, which also have application to linear programming and topics of later chapters. There are excellent textbooks available where the following standard methods are analyzed in detail; one choice is Golub and Van Loan (1989).

The Power Method

The power method is a particularly simple iterative procedure to determine a dominant eigenvector of a linear operator. Its beauty lies in its simplicity rather than its computational efficiency. Appendix A gives background material in matrix results and in linear algebra.

Let $A : \mathbb{C}^n \to \mathbb{C}^n$ be a diagonalizable linear map with eigenvalues $\lambda_1, \dots, \lambda_n$ and eigenvectors $v_1, \dots, v_n$. For simplicity let us assume that $A$ is nonsingular and $\lambda_1, \dots, \lambda_n$ satisfy $|\lambda_1| > |\lambda_2| \ge \dots \ge |\lambda_n|$. We then say that $\lambda_1$ is a dominant eigenvalue and $v_1$ a dominant eigenvector. Let

$$\|x\| = \Big( \sum_{i=1}^{n} |x_i|^2 \Big)^{1/2}$$

denote the standard Euclidean norm of $\mathbb{C}^n$. For any initial vector $x_0$ of $\mathbb{C}^n$ with $\|x_0\| = 1$, we consider the infinite normalized Krylov-sequence $(x_k)$ of unit vectors of $\mathbb{C}^n$ defined by the discrete-time dynamical system

$$x_k = \frac{A x_{k-1}}{\|A x_{k-1}\|} = \frac{A^k x_0}{\|A^k x_0\|}, \quad k \in \mathbb{N}. \tag{2.2}$$

Appendix B gives some background results for dynamical systems. Since the growth rate of the component of $A^k x_0$ corresponding to the eigenvector $v_1$ dominates the growth rates of the other components, we would expect that $(x_k)$ converges to the dominant eigenvector of $A$:

$$\lim_{k \to \infty} x_k = \lambda \frac{v_1}{\|v_1\|} \quad \text{for some } \lambda \in \mathbb{C} \text{ with } |\lambda| = 1. \tag{2.3}$$

Of course, if $x_0$ is an eigenvector of $A$ so is $x_k$ for all $k \in \mathbb{N}$. Therefore we would expect (2.3) to hold only for generic initial conditions, that is for almost all $x_0 \in \mathbb{C}^n$. This is indeed quite true; see Golub and Van Loan (1989), Parlett and Poole (1973).
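As a rough numerical illustration of iteration (2.2), the following Python sketch runs the normalized power iteration with NumPy. The function name `power_method`, the iteration count, and the 2×2 test matrix (eigenvalues 3 and 1) are illustrative assumptions and are not taken from the text.

```python
import numpy as np

def power_method(A, x0, num_iters=200):
    """Iterate x_k = A x_{k-1} / ||A x_{k-1}||, following (2.2)."""
    x = np.asarray(x0, dtype=complex)
    x = x / np.linalg.norm(x)          # enforce ||x_0|| = 1
    for _ in range(num_iters):
        y = A @ x
        x = y / np.linalg.norm(y)      # renormalize to stay on the unit sphere
    return x

# Hypothetical test matrix: symmetric, eigenvalues 3 and 1,
# dominant eigenvector proportional to (1, 1).
A = np.array([[2.0, 1.0],
              [1.0, 2.0]])
x = power_method(A, [1.0, 0.0])

# The Rayleigh quotient x* A x estimates the dominant eigenvalue.
rayleigh = np.vdot(x, A @ x).real
print(x, rayleigh)
```

For this matrix the iterate should align with $(1, 1)/\sqrt{2}$ up to a factor of modulus one, and the Rayleigh quotient should approach 3. In exact arithmetic a starting vector with no component along the dominant eigenvector would not behave this way, which matches the genericity caveat.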

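Example 2.1 Let

$$A = \begin{bmatrix} 1 & 0 \\ 0 & 2 \end{bmatrix} \quad \text{with} \quad x_0 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

Then

$$x_k = \frac{1}{\sqrt{1 + 2^{2k}}} \begin{bmatrix} 1 \\ 2^k \end{bmatrix}, \quad k \ge 1,$$

which converges to $\begin{bmatrix} 0 \\ 1 \end{bmatrix}$ for $k \to \infty$.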

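Example 2.2 Let

$$A = \begin{bmatrix} 1 & 0 \\ 0 & -2 \end{bmatrix}, \quad x_0 = \frac{1}{\sqrt{2}} \begin{bmatrix} 1 \\ 1 \end{bmatrix}.$$

Then

$$x_k = \frac{1}{\sqrt{1 + 2^{2k}}} \begin{bmatrix} 1 \\ (-2)^k \end{bmatrix}, \quad k \in \mathbb{N},$$

and the sequence $(x_k \mid k \in \mathbb{N})$ has

$$\left\{ \begin{bmatrix} 0 \\ 1 \end{bmatrix}, \begin{bmatrix} 0 \\ -1 \end{bmatrix} \right\}$$

as a limit set, see Figure 2.1.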
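The closed forms in Examples 2.1 and 2.2 can be checked numerically. The sketch below assumes $A = \mathrm{diag}(1, 2)$ and $A = \mathrm{diag}(1, -2)$ with $x_0 = (1, 1)'/\sqrt{2}$, which is one choice consistent with the stated formulas for $x_k$; the helper names `normalized_krylov` and `closed_form` are hypothetical.

```python
import numpy as np

def normalized_krylov(A, x0, k):
    """Return x_k = A^k x_0 / ||A^k x_0||."""
    y = np.linalg.matrix_power(A, k) @ x0
    return y / np.linalg.norm(y)

def closed_form(k, base):
    """Closed form (1, base^k)' / sqrt(1 + 2^(2k)) from the examples."""
    return np.array([1.0, base ** k]) / np.sqrt(1.0 + 2.0 ** (2 * k))

x0 = np.array([1.0, 1.0]) / np.sqrt(2.0)
A1 = np.diag([1.0, 2.0])     # assumed matrix for Example 2.1
A2 = np.diag([1.0, -2.0])    # assumed matrix for Example 2.2

# Compare the iteration with the closed forms for a few values of k.
matches = all(
    np.allclose(normalized_krylov(A1, x0, k), closed_form(k, 2.0))
    and np.allclose(normalized_krylov(A2, x0, k), closed_form(k, -2.0))
    for k in (1, 5, 20)
)

# In Example 2.2, even and odd iterates approach different points.
even_iterate = normalized_krylov(A2, x0, 40)
odd_iterate = normalized_krylov(A2, x0, 41)
print(matches, even_iterate, odd_iterate)
```

Under these assumptions the even iterates approach $(0, 1)'$ and the odd iterates approach $(0, -1)'$, so the sequence of unit vectors does not converge even though both limit points span the same line.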


FIGURE 2.2. Circle $\mathbb{RP}^1$ and Riemann sphere $\mathbb{CP}^1$

familiar Riemann sphere $\mathbb{C} \cup \{\infty\}$ consisting of the complex plane and the point at infinity, as illustrated in Figure 2.2. Since any one-dimensional complex subspace of $\mathbb{C}^{n+1}$ is generated by a unit vector in $\mathbb{C}^{n+1}$, and since any two unit row vectors $z = (z_0, \dots, z_n)$, $w = (w_0, \dots, w_n)$ of $\mathbb{C}^{n+1}$ generate the same complex line if and only if

$$(w_0, \dots, w_n) = (\lambda z_0, \dots, \lambda z_n)$$

for some $\lambda \in \mathbb{C}$, $|\lambda| = 1$, one can identify $\mathbb{CP}^n$ with the set of equivalence classes $[z_0 : \dots : z_n] = \{(\lambda z_0, \dots, \lambda z_n) \mid \lambda \in \mathbb{C}, |\lambda| = 1\}$ for unit vectors $(z_0, \dots, z_n)$ of $\mathbb{C}^{n+1}$. Here $z_0, \dots, z_n$ are called the homogeneous coordinates for the complex line $[z_0 : \dots : z_n]$. Similarly, we denote by $[z_0 : \dots : z_n]$ the complex line which is generated by an arbitrary nonzero vector $(z_0, \dots, z_n) \in \mathbb{C}^{n+1}$.

Now let $H_1(n+1)$ denote the set of all one-dimensional Hermitian projection operators on $\mathbb{C}^{n+1}$. Thus $H \in H_1(n+1)$ if and only if

$$H = H^*, \quad H^2 = H, \quad \operatorname{rank} H = 1. \tag{2.5}$$

By the spectral theorem every $H \in H_1(n+1)$ is of the form $H = x \cdot x^*$ for a unit column vector $x = (x_0, \dots, x_n)' \in \mathbb{C}^{n+1}$. The map

$$f : H_1(n+1) \to \mathbb{CP}^n, \quad H \mapsto \text{Image of } H = [x_0 : \dots : x_n] \tag{2.6}$$

is a bijection and we can therefore identify the set of rank one Hermitian projection operators on $\mathbb{C}^{n+1}$ with the complex projective space $\mathbb{CP}^n$. This is what we refer to as the isospectral picture of the projective space. Note that this parametrization of the projective space is not given as a collection of local coordinate charts but rather as a global algebraic representation. For our purposes such global descriptions are of more interest than the local coordinate chart descriptions.

Similarly, the complex Grassmann manifold $\operatorname{Grass}_{\mathbb{C}}(k, n+k)$ is defined as the set of all $k$-dimensional complex linear subspaces of $\mathbb{C}^{n+k}$. If $H_k(n+k)$ denotes the set of all Hermitian projection operators $H$ of $\mathbb{C}^{n+k}$ with rank $k$

$$H = H^*, \quad H^2 = H, \quad \operatorname{rank} H = k,$$

then again, by the spectral theorem, every $H \in H_k(n+k)$ is of the form $H = X \cdot X^*$ for a complex $(n+k) \times k$-matrix $X$ satisfying $X^* X = I_k$. The map

$$f : H_k(n+k) \to \operatorname{Grass}_{\mathbb{C}}(k, n+k) \tag{2.7}$$

defined by

$$f(H) = \operatorname{image}(H) = \text{column space of } X \tag{2.8}$$

is a bijection. The spaces $\operatorname{Grass}_{\mathbb{C}}(k, n+k)$ and $\mathbb{CP}^n = \operatorname{Grass}_{\mathbb{C}}(1, n+1)$ are compact complex manifolds of (complex) dimension $kn$ and $n$, respectively. Again, we refer to this as the isospectral picture of Grassmann manifolds.

In the same way the real Grassmann manifolds $\operatorname{Grass}_{\mathbb{R}}(1, n+1) = \mathbb{RP}^n$ and $\operatorname{Grass}_{\mathbb{R}}(k, n+k)$ are defined; i.e. $\operatorname{Grass}_{\mathbb{R}}(k, n+k)$ is the set of all $k$-dimensional real linear subspaces of $\mathbb{R}^{n+k}$.
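The isospectral picture can be made concrete with a small numerical sketch. The code below builds $H = X X^*$ from an orthonormal basis $X$ of a randomly chosen subspace and checks the defining properties of a rank $k$ Hermitian projector, together with the fact that $H$ depends only on the subspace and not on the chosen basis. The dimensions, the random seed, and the variable names are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)
n, k = 3, 2   # a k-dimensional subspace of C^(n+k)

# Orthonormal basis X of a random k-dimensional subspace (X* X = I_k),
# obtained from the reduced QR factorization of a random complex matrix.
Z = rng.standard_normal((n + k, k)) + 1j * rng.standard_normal((n + k, k))
X, _ = np.linalg.qr(Z)

# Hermitian projection operator onto the column space of X.
H = X @ X.conj().T

# Another orthonormal basis of the same subspace: X Q with Q unitary.
Q, _ = np.linalg.qr(rng.standard_normal((k, k)) + 1j * rng.standard_normal((k, k)))
H_other_basis = (X @ Q) @ (X @ Q).conj().T

is_hermitian = np.allclose(H, H.conj().T)            # H = H*
is_idempotent = np.allclose(H @ H, H)                # H^2 = H
rank = int(round(np.trace(H).real))                  # trace of a projector equals its rank
basis_independent = np.allclose(H, H_other_basis)    # H depends only on the subspace
print(is_hermitian, is_idempotent, rank, basis_independent)
```

Working with $H$ rather than $X$ removes the ambiguity in the choice of basis, which is the sense in which the projector gives a global description of a point of the Grassmann manifold; for $k = 1$ the same check applies to the projective space with $H = x x^*$.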

Power Method as a Dynamical System

We can now describe the power method as a dynamical system on the complex projective space $\mathbb{CP}^{n-1}$. Given a complex linear operator $A : \mathbb{C}^n \to \mathbb{C}^n$ with $\det A \neq 0$, it induces a map on the complex projective space $\mathbb{CP}^{n-1}$ denoted also by $A$:

$$A : \mathbb{CP}^{n-1} \to \mathbb{CP}^{n-1}, \quad \ell \mapsto A \cdot \ell,$$

which maps every one-dimensional complex vector space $\ell \subset \mathbb{C}^n$ to the image $A \cdot \ell$ of $\ell$ under $A$. The main convergence result on the power method can now be stated as follows:

Theorem 2.3 Let $A$ be diagonalizable with eigenvalues $\lambda_1, \dots, \lambda_n$ satisfying $|\lambda_1| > |\lambda_2| \ge \dots \ge |\lambda_n|$. For almost all complex lines $\ell_0 \in \mathbb{CP}^{n-1}$