Matrix Operations and Orthogonality, Lecture notes of Law

Covers various properties of matrix multiplication, including commutativity, associativity, distributivity, and the relationship between the column space of a matrix and the product of two matrices. Also discusses orthogonal matrices and their orthogonal complement, with examples and proofs.

Typology: Lecture notes

2021/2022

Uploaded on 09/12/2022

hambery
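Chapter 2

Matrices and Linear Algebra

2.1 Basics

Definition 2.1.1. A matrix is an m × n array of scalars from a given field F. The individual values in the matrix are called entries.

Examples.

$$A = \begin{pmatrix} 2 & 1 & 3 \\ 1 & 2 & 4 \end{pmatrix}, \qquad B = \begin{pmatrix} 1 & 2 \\ 3 & 4 \end{pmatrix}$$

The size of the array is written as m × n, where m is the number of rows and n is the number of columns.

Notation

$$A = \begin{pmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & & \vdots \\ a_{m1} & a_{m2} & \cdots & a_{mn} \end{pmatrix}$$

The horizontal lines of entries are the rows and the vertical lines are the columns.

A := uppercase denotes a matrix

a := lowercase denotes an entry of a matrix, a ∈ F.

Special matrices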


(1) If m = n, the matrix is called square. In this case we have

(1a) A matrix A is said to be diagonal if

$a_{ij} = 0$ for $i \ne j$.

(1b) A diagonal matrix A may be denoted by $\mathrm{diag}(d_1, d_2, \dots, d_n)$ where

$a_{ii} = d_i$ and $a_{ij} = 0$ for $j \ne i$.

The diagonal matrix diag(1, 1, ..., 1) is called the identity matrix and is usually denoted by

$$I_n = \begin{pmatrix} 1 & 0 & \cdots & 0 \\ 0 & 1 & \cdots & 0 \\ \vdots & & \ddots & \vdots \\ 0 & 0 & \cdots & 1 \end{pmatrix}$$

or simply I, when n is assumed to be known. 0 = diag(0, ..., 0) is called the zero matrix.

(1c) A square matrix L is said to be lower triangular if

$\ell_{ij} = 0$ for $i < j$.

(1d) A square matrix U is said to be upper triangular if

$u_{ij} = 0$ for $i > j$.

(1e) A square matrix A is called symmetric if

$a_{ij} = a_{ji}$.

(1f) A square matrix A is called Hermitian if

$a_{ij} = \bar{a}_{ji}$ ($\bar{z}$ := complex conjugate of z).

(1g) $E_{ij}$ has a 1 in the (i, j) position and zeros in all other positions.

(2) A rectangular matrix A is called nonnegative if

$a_{ij} \ge 0$ for all i, j.

It is called positive if

$a_{ij} > 0$ for all i, j.

Each of these matrices has some special properties, which we will study during this course.


(5) α(βA) = αβA

(6) 0 A = 0

(7) α 0 = 0.

Definition 2.1.6. Let x and y ∈ $\mathbb{R}^n$, with

$x = (x_1, \dots, x_n)$ and $y = (y_1, \dots, y_n)$.

Then the scalar or dot product of x and y is given by

$$\langle x, y \rangle = \sum_{i=1}^{n} x_i y_i.$$

Remark 2.1.1. (i) Alternate notation for the scalar product: $\langle x, y \rangle = x \cdot y$. (ii) The dot product is defined only for vectors of the same length.

Example 2.1.1. Let x = (1, 0, 3, −1) and y = (0, 2, −1, 2). Then $\langle x, y \rangle = 1(0) + 0(2) + 3(-1) - 1(2) = -5$.
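The computation in Example 2.1.1 can be checked with a few lines of Python. This is only a minimal sketch; the function name `dot` is chosen here for illustration and is not part of the notes.

```python
def dot(x, y):
    """Scalar (dot) product of two vectors of the same length."""
    if len(x) != len(y):
        # The dot product is defined only for vectors of the same length.
        raise ValueError("vectors must have the same length")
    return sum(xi * yi for xi, yi in zip(x, y))


x = (1, 0, 3, -1)
y = (0, 2, -1, 2)
print(dot(x, y))  # -5
```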

Definition 2.1.7. Let A be m × n and B be n × p. Let $r_i(A)$ denote the vector in $\mathbb{R}^n$ consisting of the ith row of A, and let $c_j(B)$ denote the vector in $\mathbb{R}^n$ consisting of the jth column of B. The product C = AB is the m × p matrix defined by

$$c_{ij} = \langle r_i(A), c_j(B) \rangle.$$

Other notation for C = AB:

$$c_{ij} = \sum_{k=1}^{n} a_{ik} b_{kj}, \qquad 1 \le i \le m, \quad 1 \le j \le p.$$

Example 2.1.2. Let A = [matrix entries missing] and B = [matrix entries missing]. Then AB = [matrix entries missing].
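The entrywise formula of Definition 2.1.7 translates directly into code. The sketch below is illustrative only: the function name `matmul` and the matrices `A` and `B` are assumptions made here, not the matrices of Example 2.1.2.

```python
def matmul(A, B):
    """Product C = AB with c_ij = sum over k of a_ik * b_kj."""
    m, n, p = len(A), len(B), len(B[0])
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must match rows of B")
    return [[sum(A[i][k] * B[k][j] for k in range(n)) for j in range(p)]
            for i in range(m)]


# Hypothetical matrices: A is 2 x 3 and B is 3 x 2, so AB is 2 x 2.
A = [[2, 1, 3],
     [1, 2, 4]]
B = [[1, 0],
     [0, 1],
     [2, -1]]
print(matmul(A, B))  # [[8, -2], [9, -2]]
```

Each entry is the dot product of a row of A with a column of B, so the inner dimension n must agree.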


Properties of matrix multiplication

(1) If AB exists, does it happen that BA exists and AB = BA? The answer is usually no. First, AB and BA both exist if and only if A ∈ $M_{m,n}(F)$ and B ∈ $M_{n,m}(F)$. Even if this is so, the sizes of AB and BA are different (AB is m × m and BA is n × n) unless m = n. However, even if m = n we may have AB ≠ BA. See the examples below: the products may be of different sizes, and if they are the same size (i.e. A and B are square) the entries may be different.

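A = [1, 2], B = [2 × 1 matrix, entries missing], AB = [1], BA = [2 × 2 matrix, entries missing]

A = [matrix entries missing], B = [matrix entries missing], AB = [matrix entries missing], BA = [matrix entries missing]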
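Both situations described in (1) can be reproduced numerically. The matrices below are hypothetical stand-ins chosen for illustration (the column `col` is picked so that the 1 × 1 product equals [1], as in the text); they are not claimed to be the matrices of the original examples.

```python
def matmul(A, B):
    """Product of A (m x n) and B (n x p), computed entry by entry."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


# Different sizes: a 1 x 2 row times a 2 x 1 column, and the reverse.
row = [[1, 2]]
col = [[-1], [1]]          # hypothetical choice
print(matmul(row, col))    # [[1]]              (1 x 1)
print(matmul(col, row))    # [[-1, -2], [1, 2]] (2 x 2)

# Same size, different entries: two square matrices that do not commute.
A = [[1, 1], [0, 1]]       # hypothetical
B = [[1, 0], [1, 1]]       # hypothetical
print(matmul(A, B))        # [[2, 1], [1, 1]]
print(matmul(B, A))        # [[1, 1], [1, 2]]
```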

(2) If A is square we define

$$A^1 = A, \quad A^2 = AA, \quad A^3 = A^2 A = AAA, \quad \dots, \quad A^n = A^{n-1} A = A \cdots A \ (n \text{ factors}).$$

(3) I = diag(1,... , 1). If A ∈ Mm,n(F ) then

AIn = A and ImA = A.

Theorem 2.1.3 (Matrix Multiplication Rules). Assume A, B, and C are matrices for which all products below make sense. Then

(1) A(BC) = (AB)C

(2) A(B ± C) = AB ± AC and (A ± B)C = AC ± BC

(3) AI = A and IA = A

(4) c(AB) = (cA)B

(5) A0 = 0 and 0 B = 0


(6) If A is Hermitian, then

$A = A^*$.

More facts about symmetry.

Proof. (1) We know $(A^T)_{ij} = a_{ji}$. So $((A^T)^T)_{ij} = a_{ij}$. Thus $(A^T)^T = A$.

(2) $((A \pm B)^T)_{ij} = a_{ji} \pm b_{ji}$. So $(A \pm B)^T = A^T \pm B^T$.

Proposition 2.1.1. (1) A is symmetric if and only if $A^T$ is symmetric.

(1)* A is Hermitian if and only if $A^*$ is Hermitian.

(2) If A is symmetric, then $A^2$ is also symmetric.

(3) If A is symmetric, then $A^n$ is also symmetric for all n.

Definition 2.1.9. A matrix is called skew-symmetric if

$A^T = -A$.

Example 2.1.4. The matrix A = [matrix entries missing] is skew-symmetric.

Theorem 2.1.5. (1) If A is skew-symmetric, then A is a square matrix and $a_{ii} = 0$, i = 1, ..., n.

(2) For any matrix A ∈ $M_n(F)$, $A - A^T$ is skew-symmetric while $A + A^T$ is symmetric.

(3) Every matrix A ∈ $M_n(F)$ can be uniquely written as the sum of a skew-symmetric and a symmetric matrix.

Proof. (1) If A ∈ $M_{m,n}(F)$, then $A^T$ ∈ $M_{n,m}(F)$. So, if $A^T = -A$ we must have m = n. Also

$a_{ii} = -a_{ii}$

for i = 1, ..., n. So $a_{ii} = 0$ for all i.


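(2) Since $(A - A^T)^T = A^T - A = -(A - A^T)$, it follows that $A - A^T$ is skew-symmetric. Similarly, $(A + A^T)^T = A^T + A$, so $A + A^T$ is symmetric.

(3) By (2), one such decomposition is

$$A = \tfrac{1}{2}(A + A^T) + \tfrac{1}{2}(A - A^T).$$

Let A = B + C be a second such decomposition, with B symmetric and C skew-symmetric. Subtraction gives

$$\tfrac{1}{2}(A + A^T) - B = C - \tfrac{1}{2}(A - A^T).$$

The left matrix is symmetric while the right matrix is skew-symmetric. Hence both are the zero matrix, and the decomposition is unique.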

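Examples.

$$A = \begin{pmatrix} 0 & -1 \\ 1 & 0 \end{pmatrix}$$

is skew-symmetric. Let

B = [matrix entries missing], $B^T$ = [matrix entries missing], $B - B^T$ = [matrix entries missing], $B + B^T$ = [matrix entries missing].

Then

$$B = \tfrac{1}{2}(B - B^T) + \tfrac{1}{2}(B + B^T).$$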
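The decomposition of Theorem 2.1.5 (3) is easy to compute. The following sketch uses exact rational arithmetic; the function names and the matrix `B` are assumptions made for illustration and are not the B of the example above.

```python
from fractions import Fraction


def transpose(A):
    """Transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*A)]


def sym_skew_parts(A):
    """Return (S, K) with S = (A + A^T)/2 symmetric and K = (A - A^T)/2 skew-symmetric."""
    At = transpose(A)
    n = len(A)
    half = Fraction(1, 2)
    S = [[half * (A[i][j] + At[i][j]) for j in range(n)] for i in range(n)]
    K = [[half * (A[i][j] - At[i][j]) for j in range(n)] for i in range(n)]
    return S, K


B = [[1, 2],
     [3, 4]]               # hypothetical square matrix
S, K = sym_skew_parts(B)
print(S)  # symmetric part:      [[1, 5/2], [5/2, 4]] as Fractions
print(K)  # skew-symmetric part: [[0, -1/2], [1/2, 0]] as Fractions
```

Adding S and K entry by entry recovers B, and the diagonal of K is zero, as part (1) of the theorem requires.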

An important observation about matrix multiplication is related to ideas from vector spaces. Indeed, two very important vector spaces are associated with matrices.

Definition 2.1.10. Let A ∈ $M_{m,n}(\mathbb{C})$. (i) Denote by

$c_j(A)$ := jth column of A,

so that $c_j(A) \in \mathbb{C}^m$. We call the subspace of $\mathbb{C}^m$ spanned by the columns of A the column space of A. With $c_1(A), \dots, c_n(A)$ denoting the columns of A


2.2 Linear Systems

The solution of linear systems is likely the single largest application of matrix theory. Indeed, most reasonable problems of the sciences and economics that involve several variables are, almost without exception, reduced to component parts, one of which is the solution of a linear system. Of course the entire solution process may have the linear system solver as a relatively small component, but an essential one. Even the solution of nonlinear problems employs linear systems to great and crucial advantage.

To be precise, we suppose that the coefficients $a_{ij}$, 1 ≤ i ≤ m and 1 ≤ j ≤ n, and the data $b_i$, 1 ≤ i ≤ m, are known. We define the linear system for the n unknowns $x_1, \dots, x_n$ to be

$$\begin{aligned} a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n &= b_1 \\ a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n &= b_2 \\ &\;\;\vdots \\ a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n &= b_m \end{aligned} \qquad (*)$$

The solution set is defined to be the subset of $\mathbb{R}^n$ of vectors $(x_1, \dots, x_n)$ that satisfy each of the m equations of the system. The question of how to solve a linear system includes a vast literature of theoretical and computational methods. Certain systems form the model of what to do. In the systems below we note that the first one has three highly coupled (interrelated) variables.

$$\begin{aligned} 3x_1 - 2x_2 + 4x_3 &= 7 \\ x_1 - 6x_2 - 2x_3 &= 0 \\ -x_1 + 3x_2 + 6x_3 &= -2 \end{aligned}$$

The second system is more tractable because there appears even to the untrained eye a clear and direct method of solution.

$$\begin{aligned} 2x_1 - 2x_2 - x_3 &= 7 \\ x_2 - 2x_3 &= 1 \\ 2x_3 &= -2 \end{aligned}$$

Indeed, we can see right off that $x_3 = -1$. Substituting this value into the second equation we obtain $x_2 = 1 - 2 = -1$. Substituting both $x_2$ and $x_3$ into the first equation, we obtain $2x_1 - 2(-1) - (-1) = 7$, which gives $x_1 = 2$. The solution set consists of the single vector (2, −1, −1). The virtue of the second system is that the unknowns can be determined one by one, back substituting those already found into the next equation until all unknowns are determined. So if we can convert the given system of the first kind to one of the second kind, we can determine the solution.

This procedure for solving linear systems is therefore the application of operations that gradually eliminate unknowns from the equations until a new system results that can be solved by direct means. The operations allowed in this process must have precisely one important property: they must not change the solution set by either adding to it or subtracting from it. There are exactly three such operations needed to reduce any set of linear equations so that it can be solved directly.

(E1) Interchange two equations.

(E2) Multiply any equation by a nonzero constant.

(E3) Add a multiple of one equation to another.

This can be summarized in the following theorem.

Theorem 2.2.1. Given the linear system (∗), the equation operations E1, E2, and E3 applied to the equations of (∗) do not alter the solution set of the system (∗).
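The back-substitution procedure described for the triangular system can be sketched in Python. This is a minimal illustration, assuming an upper triangular coefficient matrix with nonzero diagonal; the function name `back_substitute` is introduced here, and the coefficients are those used in the substitution step of the text, where the first equation reads 2x1 − 2x2 − x3 = 7.

```python
from fractions import Fraction


def back_substitute(U, b):
    """Solve Ux = b for an upper triangular U with nonzero diagonal entries."""
    n = len(U)
    x = [Fraction(0)] * n
    # Start with the last equation and work upward, reusing unknowns already found.
    for i in range(n - 1, -1, -1):
        if U[i][i] == 0:
            raise ValueError("zero diagonal entry: unknown cannot be isolated")
        known = sum(U[i][j] * x[j] for j in range(i + 1, n))
        x[i] = Fraction(b[i] - known) / U[i][i]
    return x


U = [[2, -2, -1],
     [0, 1, -2],
     [0, 0, 2]]
b = [7, 1, -2]
print(back_substitute(U, b))  # x1 = 2, x2 = -1, x3 = -1 (as Fractions)
```

Exact fractions are used so that no rounding obscures the arithmetic; floating point would work equally well for this small system.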

We leave this result to the exercises. Our main intent is to convert these operations into corresponding operations for matrices. Before we do this we clarify which linear systems can have a solution. First, the system can be converted to matrix form by setting A equal to the m × n matrix of coefficients, b equal to the m × 1 vector of data, and x equal to the n × 1 vector of unknowns. Then the system (∗) can be written as

Ax = b

In this way we see that, with $c_i(A)$ denoting the ith column of A, the system is expressible as

$$x_1 c_1(A) + \cdots + x_n c_n(A) = b.$$

From this equation it is clear that the system has a solution if and only if the vector b is in $S(c_1(A), \dots, c_n(A))$. This is summarized in the following theorem.


Notation: Ri ↔ Rj

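Type 2

$$E_2 = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & c & & \\ & & & \ddots & \\ & & & & 1 \end{pmatrix}$$

with the entry c in row i, column i, and ones elsewhere on the diagonal.

Notation: $cR_i$

Type 3

$$E_3 = \begin{pmatrix} 1 & & & & \\ & \ddots & & & \\ & & 1 & & \\ & c & & \ddots & \\ & & & & 1 \end{pmatrix}$$

that is, the identity matrix with one additional entry c in row j, column i.

Notation: $cR_i + R_j$, the abbreviated form of $cR_i + R_j \to R_j$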

Example 2.2.1. The operations

$R_1 \leftrightarrow R_2$, $4R_3$

can also be realized as

$R_1 \leftrightarrow R_2$: [matrix entries missing]

$4R_3$: [matrix entries missing]

The operations

$-3R_1 + R_2$, $2R_1 + R_3$

can be realized by the left matrix multiplications [matrix entries missing].

Note that there are two matrix multiplications, one for each Type 3 elementary operation.
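The three types of elementary matrices, and the fact that each row operation is a left multiplication, can be illustrated as follows. The helper names and the 3 × 3 matrix `A` are hypothetical choices for this sketch; rows are numbered from 1 to match the notation of the text.

```python
def identity(n):
    """The n x n identity matrix."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]


def matmul(A, B):
    """Product of A (m x n) and B (n x p)."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]


def swap_matrix(n, i, j):
    """Type 1: R_i <-> R_j."""
    E = identity(n)
    E[i - 1], E[j - 1] = E[j - 1], E[i - 1]
    return E


def scale_matrix(n, i, c):
    """Type 2: cR_i (c should be nonzero)."""
    E = identity(n)
    E[i - 1][i - 1] = c
    return E


def add_matrix(n, i, j, c):
    """Type 3: cR_i + R_j -> R_j, i.e. entry c in row j, column i."""
    E = identity(n)
    E[j - 1][i - 1] = c
    return E


A = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]                               # hypothetical matrix

print(matmul(swap_matrix(3, 1, 2), A))        # [[4, 5, 6], [1, 2, 3], [7, 8, 9]]
print(matmul(scale_matrix(3, 3, 4), A))       # [[1, 2, 3], [4, 5, 6], [28, 32, 36]]
print(matmul(add_matrix(3, 1, 2, -3), A))     # [[1, 2, 3], [1, -1, -3], [7, 8, 9]]
```

Applying two Type 3 operations in succession corresponds to multiplying on the left by two such matrices, one after the other.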

Row-reduced echelon form. To each A ∈ $M_{m,n}(F)$ there is a canonical form, also in $M_{m,n}(F)$, which may be obtained by row operations. Called the RREF, it has the following properties.

(a) Each nonzero row has a 1 as the first nonzero entry (:= leading one).

(b) All column entries above and below a leading one are zero.

(c) All zero rows are at the bottom.

(d) The leading one of one row is to the left of leading ones of all lower rows.
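Properties (a) through (d) describe the output of a standard elimination loop. The sketch below is one possible way to compute the RREF with exact rational arithmetic; the function name `rref` and the sample matrix are assumptions for illustration, not taken from the notes.

```python
from fractions import Fraction


def rref(A):
    """Row-reduced echelon form of A, computed with elementary row operations."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    pivot_row = 0
    for col in range(cols):
        # Look for a nonzero entry in this column at or below pivot_row.
        pivot = next((r for r in range(pivot_row, rows) if M[r][col] != 0), None)
        if pivot is None:
            continue                                   # no leading one in this column
        M[pivot_row], M[pivot] = M[pivot], M[pivot_row]  # Type 1: interchange rows
        lead = M[pivot_row][col]
        M[pivot_row] = [x / lead for x in M[pivot_row]]  # Type 2: make the leading entry 1
        for r in range(rows):                            # Type 3: clear the rest of the column
            if r != pivot_row and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * p for a, p in zip(M[r], M[pivot_row])]
        pivot_row += 1
        if pivot_row == rows:
            break
    return M


A = [[1, 2, 3],
     [2, 4, 7],
     [1, 2, 4]]            # hypothetical matrix
for row in rref(A):
    print([str(x) for x in row])
# ['1', '2', '0']
# ['0', '0', '1']
# ['0', '0', '0']
```

In the output the zero row is at the bottom, each leading one has zeros above and below it, and the leading ones move to the right going down, matching (a) through (d).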

Example 2.2.2. B = [matrix entries missing] is in RREF.


the other rows. Assume therefore that the RREF is unique if the number of columns is less than n. Assume there are two RREF forms, $B_1$ and $B_2$, for A. Now the RREF of A is therefore unique through the (n − 1)st column. The only difference between the RREFs $B_1$ and $B_2$ must occur in the nth column. Now proceed by induction on the number of nonzero rows. Assume that A ≠ 0. If A has just one row, the RREF of A is simply the scalar multiple of A that makes the first nonzero column entry a one. Thus it is unique. If A = 0, the RREF is also zero. Assume now that the RREF is unique for matrices with fewer than m rows. By the comments above, the only difference between the RREFs $B_1$ and $B_2$ can occur at the (m, n)-entry. That is, $(B_1)_{m,n} \ne (B_2)_{m,n}$. They are therefore not leading ones. (Why?) There is a leading one in the mth row, however, because it is a nonzero row. Because the row spaces of $B_1$ and $B_2$ are identical, this results in a contradiction, and therefore the (m, n)-entries must be equal. Finally, $B_1 = B_2$. This completes the induction. (Alternatively, the two systems pertaining to the RREFs must have the same solution set as the system Ax = 0. With $(B_1)_{m,n} \ne (B_2)_{m,n}$, it is easy to see that the solution sets to $B_1 x = 0$ and $B_2 x = 0$ must differ.) □

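Definition 2.2.2. Let A ∈ $M_{m,n}(F)$ and b ∈ $\mathbb{R}^m$ (or $\mathbb{C}^m$). Define

$$[A|b] = \left(\begin{array}{ccc|c} a_{11} & \cdots & a_{1n} & b_1 \\ a_{21} & \cdots & a_{2n} & b_2 \\ \vdots & & \vdots & \vdots \\ a_{m1} & \cdots & a_{mn} & b_m \end{array}\right)$$

[A|b] is called the augmented matrix of A by b, and [A|b] ∈ $M_{m,n+1}(F)$. The augmented matrix is a useful notation for finding the solution of systems using row operations.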

As with other definitions for solutions of equations, the equivalence of two systems is defined via the idea of equality of the solution set.

Definition 2.2.3. Two linear systems Ax = b and Bx = c are called equivalent if one can be converted to the other by elementary equation operations.

It is easy to see that this implies the following.

Theorem 2.2.4. Two linear systems Ax = b and Bx = c are equivalent if and only if both [A|b] and [B|c] have the same row-reduced echelon form.

We leave the proof to the reader. (See Exercise 23.) Note that the solution set need not be a single vector; it can be empty or infinite.


2.3 Rank

Definition 2.3.1. The rank of any matrix A, denoted by r(A), is the dimension of its column space.

Proposition 2.3.1. (i) The rank of A equals the number of nonzero rows of the RREF of A, i.e. the number of leading ones. (ii) $r(A) = r(A^T)$.

Proof. (i) Follows from previous results. (ii) The number of linearly independent rows equals the number of linearly independent columns. The number of linearly independent rows of A is, by definition, the number of linearly independent columns of $A^T$. Hence $r(A) = r(A^T)$.

Proposition 2.3.2. Let A ∈ Mm,n(C) and b ∈ Cm. Then Ax = b has a solution if and only if r(A) = r([A|b]), where [A|b] is the augmented matrix.

Remark 2.3.1. Solutions may exist and may not. However, even if a solution exists, it may not be unique. Indeed if it is not unique, there is an infinity of solutions.

Definition 2.3.2. When Ax = b has a solution we say the system is consistent.
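The rank criterion of Proposition 2.3.2 gives a mechanical consistency test. In the sketch below the rank is computed by counting pivots during forward elimination, which equals the number of leading ones of the RREF; the function names and the sample data are assumptions made here for illustration.

```python
from fractions import Fraction


def rank(A):
    """Rank of A, counted as the number of pivots found by forward elimination."""
    M = [[Fraction(x) for x in row] for row in A]
    rows, cols = len(M), len(M[0])
    r = 0
    for col in range(cols):
        pivot = next((i for i in range(r, rows) if M[i][col] != 0), None)
        if pivot is None:
            continue
        M[r], M[pivot] = M[pivot], M[r]
        for i in range(r + 1, rows):
            factor = M[i][col] / M[r][col]
            M[i] = [a - factor * p for a, p in zip(M[i], M[r])]
        r += 1
        if r == rows:
            break
    return r


def is_consistent(A, b):
    """Ax = b has a solution exactly when r(A) = r([A|b])."""
    augmented = [row + [bi] for row, bi in zip(A, b)]
    return rank(A) == rank(augmented)


A = [[1, 2],
     [2, 4]]                        # hypothetical rank-one matrix
print(is_consistent(A, [3, 6]))     # True:  b is a multiple of the first column
print(is_consistent(A, [3, 7]))     # False: b is not in the column space
```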

Naturally, in practical applications we want our systems to be consistent. When they are not, this can be an indicator that something is wrong with the underlying physical model. In mathematics, we also want consistent systems; they are usually far more interesting and offer richer environments for study. In addition to the column and row spaces, another space of great importance is the so-called null space, the set of vectors x ∈ $\mathbb{R}^n$ for which Ax = 0. When solving the simple single-variable linear equation ax = b with a ≠ 0, we know there is always a unique solution x = b/a. In contrast, in solving even the simplest higher dimensional systems, the picture is not as clear.

Definition 2.3.3. Let A ∈ Mm,n(F ). The null space of A is defined to be

Null(A) = {x ∈ Rn | Ax = 0}.
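Membership in Null(A) is a direct computation. The sketch below checks a few vectors against a hypothetical 2 × 3 matrix; the function names, the matrix, and the vectors are all assumptions chosen for illustration. The last two checks exercise sums and scalar multiples of null vectors.

```python
def matvec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def in_null_space(A, x):
    """True when Ax = 0."""
    return all(v == 0 for v in matvec(A, x))


A = [[1, 2, 3],
     [2, 4, 6]]                     # hypothetical matrix
x = [-2, 1, 0]
y = [-3, 0, 1]

print(in_null_space(A, x))                                   # True
print(in_null_space(A, y))                                   # True
print(in_null_space(A, [1, 0, 0]))                           # False
print(in_null_space(A, [xi + yi for xi, yi in zip(x, y)]))   # True: sum of null vectors
print(in_null_space(A, [7 * xi for xi in x]))                # True: scalar multiple
```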

It is a simple consequence of the linearity of matrix multiplication that Null(A) is a linear subspace of Rn. That is to say, Null(A) is closed under vector addition and scalar multiplication. In fact, A(x + y) = Ax + Ay = 0 + 0 = 0, if x, y ∈ Null(A). Also, A(αx) = αAx = 0, if x ∈ Null(A). We state this formally as


Proof. The equivalence of (a), (b), (c) and (d) follows from previous considerations. To establish (e), let $S = \{c_{\ell_1}, c_{\ell_2}, \dots, c_{\ell_k}\}$ denote the linearly independent column vectors of A. Let $T = \{e_{\ell_1}, e_{\ell_2}, \dots, e_{\ell_k}\} \subset \mathbb{R}^n$ be the corresponding standard vectors. Then $A e_{\ell_j} = c_{\ell_j}$. If b ∈ S(S), then $b = a_1 c_{\ell_1} + a_2 c_{\ell_2} + \cdots + a_k c_{\ell_k}$. A solution to Ax = b is given by $x = a_1 e_{\ell_1} + a_2 e_{\ell_2} + \cdots + a_k e_{\ell_k}$. Conversely, if (e) holds, then the set S must be linearly independent, for otherwise S could be reduced to k − 1 or fewer vectors. Similarly, if A has k + 1 linearly independent columns then the set S can be expanded. Therefore, the column space of A must have a basis of exactly k vectors.

To prove (f) we assume that $S = \{v_1, \dots, v_k\}$ is a basis for the column space of A. Let $T = \{w_1, \dots, w_k\} \subset \mathbb{R}^n$ be vectors for which $A w_i = v_i$, i = 1, ..., k. By our extension theorem, we select n − k vectors $w_{k+1}, \dots, w_n$ such that $U = \{w_1, \dots, w_k, w_{k+1}, \dots, w_n\}$ is a basis of $\mathbb{R}^n$. We must have that $A w_{k+1} \in S(S)$. Hence there are scalars $b_1, \dots, b_k$ such that

$$A w_{k+1} = A(b_1 w_1 + \cdots + b_k w_k)$$

and thus $w'_{k+1} = w_{k+1} - (b_1 w_1 + \cdots + b_k w_k)$ is in the null space of A. Repeat this process for each $w_{k+j}$, j = 1, ..., n − k. We generate a total of n − k vectors $\{w'_{k+1}, \dots, w'_n\}$ in this manner. This set must be linearly independent. (Why?) Therefore, the dimension of the null space must be at least n − k.

Now we consider a new basis which consists of the original vectors $w_1, \dots, w_k$ and the n − k vectors $\{w'_{k+1}, w'_{k+2}, \dots, w'_n\}$ for which Aw = 0. We assert that the dimension of the null space is exactly n − k. For if z ∈ $\mathbb{R}^n$ is a vector for which Az = 0, then z can be uniquely written as a component $z_1$ from S(T) and a component $z_2$ from $S(\{w'_{k+1}, \dots, w'_n\})$. But $A z_1 \ne 0$ whenever $z_1 \ne 0$, while $A z_2 = 0$. Therefore Az = 0 is impossible unless the component $z_1 = 0$.

Conversely, if (f) holds we take a basis for the null space $T = \{u_1, u_2, \dots, u_{n-k}\}$ and extend the basis

$$T' = T \cup \{u_{n-k+1}, \dots, u_n\}$$

to $\mathbb{R}^n$. Next argue similarly to above that

$$A u_{n-k+1}, A u_{n-k+2}, \dots, A u_n$$

must be linearly independent, for otherwise there is yet another linearly independent vector that can be added to the basis of the null space, a contradiction. Therefore the column space must have dimension at least k, and hence equal to k.

The following corollary assembles many consequences of this theorem.


Corollary 2.3.1. (1) r(A) ≤ min(m, n).

(2) r(AB) ≤ min(r(A), r(B)).

(3) r(A + B) ≤ r(A) + r(B).

(4) $r(A) = r(A^T) = r(A^*) = r(\bar{A})$.

(5) If A ∈ Mm(F ) and B ∈ Mm,n(F ), and if A is invertible, then

r(AB) = r(B).

Similarly, if C ∈ Mn(F ) is invertible and B ∈ Mm,n(F )

r(BC) = r(B).

(6) $r(A) = r(A^T A) = r(A^* A)$.

(7) Let A ∈ Mm,n(F ), with r(A) = k. Then A = XBY where X ∈ Mm,k, Y ∈ Mk,n and B ∈ Mk is invertible.

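(8) In particular, every rank 1 matrix has the form $A = xy^T$, where x ∈ $\mathbb{R}^m$ and y ∈ $\mathbb{R}^n$. Here

$$xy^T = \begin{pmatrix} x_1 y_1 & x_1 y_2 & \cdots & x_1 y_n \\ \vdots & \vdots & & \vdots \\ x_m y_1 & x_m y_2 & \cdots & x_m y_n \end{pmatrix}$$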
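Item (8) can be illustrated by forming an outer product and checking that every 2 × 2 minor vanishes, which is equivalent to the rank being at most one. The vectors and function names below are hypothetical choices for this sketch.

```python
from itertools import combinations


def outer(x, y):
    """Outer product x y^T: the m x n matrix with entries x_i * y_j."""
    return [[xi * yj for yj in y] for xi in x]


def all_2x2_minors_vanish(A):
    """True when every 2 x 2 minor of A is zero, i.e. when the rank is at most 1."""
    row_pairs = combinations(range(len(A)), 2)
    col_pairs = list(combinations(range(len(A[0])), 2))
    return all(A[i][j] * A[k][l] - A[i][l] * A[k][j] == 0
               for i, k in row_pairs for j, l in col_pairs)


x = [1, 2, 3]                       # hypothetical vector in R^3
y = [4, 5]                          # hypothetical vector in R^2
A = outer(x, y)
print(A)                            # [[4, 5], [8, 10], [12, 15]]
print(all_2x2_minors_vanish(A))     # True: every row is a multiple of y
```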

Proof. (1) The rank of any matrix is the number of linearly independent rows, which is the same as the number of linearly independent columns. This value can therefore be at most the smaller of the two dimensions of the matrix, or r(A) ≤ min(m, n).

(2) The product AB can be viewed in two ways. The first is that the rows of AB are linear combinations of the rows of B, and the other is that the columns of AB are linear combinations of the columns of A. In either case the number of linearly independent rows (or columns, as the case may be) cannot increase. In other words, the rank of the product AB cannot be greater than the number of linearly independent columns of A nor greater than the number of linearly independent rows of B. Another way to express this is as r(AB) ≤ min(r(A), r(B)).