Matrix Operations and Multiplication: Properties and Applications - Prof. Anne Schilling, Study notes of Linear Algebra

These notes cover the essential concepts of matrices and their operations: the definition of matrices, their arithmetic operations, and the properties of matrix multiplication. The document also provides examples of matrices and their products, as well as the relationship between matrix multiplication and the standard inner product.


MAT067 University of California, Davis Winter 2007

Notes on Matrices and Matrix Operations

Isaiah Lankham, Bruno Nachtergaele, Anne Schilling (February 4, 2007)

1 Definition of and Notation for Matrices

Let m, n ∈ Z+ be positive integers. Then we begin by defining an m × n matrix A to be a rectangular array of numbers

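$$
A = (a_{ij})_{i,j=1}^{m,n} = \big(A(i,j)\big)_{i,j=1}^{m,n} =
\left.\underbrace{\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}}_{n \text{ numbers}}\right\} m \text{ numbers}
$$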

where each element $a_{ij} \in F$ in the array is called an entry of A (specifically, $a_{ij}$ is called the “i, j entry”), i indexes the rows of A by ranging over the set {1, …, m}, and j indexes the columns of A by ranging over the set {1, …, n}. We say that the matrix A has size m × n and note that it is a (finite) sequence of doubly-subscripted numbers for which the two subscripts in no way depend upon each other.

Given the ubiquity of matrices in mathematical thought, a rich vocabulary has been developed for describing the properties and features of matrices that are most useful in their applications. In addition, there is a rich set of equivalent notations. For the purposes of these notes, we will use the above notation unless the size of the matrix is understood from context or is unimportant. In that case, we will drop much of this notation and denote a matrix simply as

$$A = (a_{ij}) \quad \text{or} \quad A = (a_{ij})_{m \times n}.$$

To get a sense of the essential vocabulary, suppose that we have an m × n matrix $A = (a_{ij})$ with m = n. Then we call A a square matrix. The elements $a_{11}, a_{22}, \ldots, a_{nn}$ in a square matrix form what is called the main diagonal of A, and the elements $a_{1n}, a_{2,n-1}, \ldots, a_{n1}$ form what is sometimes called the skew main diagonal of A. Entries not on the main diagonal are also often called off-diagonal entries, and a matrix whose off-diagonal entries are all zero is called a diagonal matrix. It is common to call the elements $a_{12}, a_{23}, \ldots, a_{n-1,n}$ the superdiagonal of A and $a_{21}, a_{32}, \ldots, a_{n,n-1}$ the subdiagonal. The motivation for this terminology should be clear if you create a sample square matrix and trace the entries within these particular subsequences of the matrix. Square matrices are important because they are fundamental to applications of Linear Algebra. In particular, virtually every use of Linear Algebra in problem solving either involves square matrices directly or employs them in some indirect manner. In addition, virtually every usage also involves the notion of vector, where here we mean either an m × 1 matrix (a.k.a. a column vector) or a 1 × n matrix (a.k.a. a row vector).

Copyright © 2007 by the authors. These lecture notes may be reproduced in their entirety for non-commercial purposes.
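As a minimal sketch (not part of the original notes), the diagonals named above can be read off a square matrix stored in Python as a list of rows. The helper names and the sample matrix `M` are hypothetical; since Python lists are 0-indexed, the entry $a_{ij}$ is assumed to live at `A[i - 1][j - 1]`, and no validation that the input is square is done.

```python
def entry(A, i, j):
    """Return a_ij using the 1-based indices of the notes."""
    return A[i - 1][j - 1]

def main_diagonal(A):
    """a_11, a_22, ..., a_nn of a square n x n matrix."""
    n = len(A)
    return [entry(A, k, k) for k in range(1, n + 1)]

def skew_diagonal(A):
    """a_1n, a_2,n-1, ..., a_n1 (the skew main diagonal)."""
    n = len(A)
    return [entry(A, k, n + 1 - k) for k in range(1, n + 1)]

def superdiagonal(A):
    """a_12, a_23, ..., a_n-1,n."""
    n = len(A)
    return [entry(A, k, k + 1) for k in range(1, n)]

def subdiagonal(A):
    """a_21, a_32, ..., a_n,n-1."""
    n = len(A)
    return [entry(A, k + 1, k) for k in range(1, n)]

M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]             # hypothetical 3 x 3 sample

print(main_diagonal(M))     # [1, 5, 9]
print(skew_diagonal(M))     # [3, 5, 7]
print(superdiagonal(M))     # [2, 6]
print(subdiagonal(M))       # [4, 8]
```

tracing these four lists against the printed matrix is exactly the exercise the notes suggest.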

Example 1.1. Suppose that $A = (a_{ij})$, $B = (b_{ij})$, $C = (c_{ij})$, $D = (d_{ij})$, and $E = (e_{ij})$ are the following matrices over F:

$$A = [\,\ldots\,], \quad B = [\,\ldots\,], \quad C = [\,\ldots\,], \quad D = [\,\ldots\,], \quad E = [\,\ldots\,].$$

Then we say that A is a 3 × 1 matrix (a.k.a. a column vector), B is a 2 × 2 square matrix, C is a 1 × 3 matrix (a.k.a. a row vector), and both D and E are square 3 × 3 matrices.

We can discuss individual entries in each matrix. E.g., $d_{12} = 5$ and $e_{12} = e_{22} = e_{32} = 1$. The main diagonal of D is the sequence $d_{11} = 1, d_{22} = 0, d_{33} = 4$. The subdiagonal of E is the sequence $e_{21} = -1, e_{32} = 1$.

We also note that B is called an upper-triangular matrix since all of the elements “below” the main diagonal are zero. However, none of the matrices above are diagonal matrices. Given any positive integer n ∈ Z+, we can construct the diagonal matrices $I_n$ (called the identity matrix) and $0_{n \times n}$ (called the zero matrix) by setting

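$$
I_n = \begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\quad \text{and} \quad
0_{n \times n} = \begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix},
$$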

where each of these matrices is understood to be a square matrix of size n × n. The zero matrix $0_{m \times n}$ is analogously defined for any two positive integers m, n ∈ Z+ and has size m × n.
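A short Python sketch of $I_n$, $0_{m \times n}$, and the upper-triangular and diagonal properties used above, again assuming matrices are stored as lists of rows. The function names and the sample matrix `T` are hypothetical illustrations.

```python
def identity(n):
    """I_n: ones on the main diagonal, zeros everywhere else."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def zero(m, n):
    """0_{m x n}: every entry is zero (each row is a separate list)."""
    return [[0] * n for _ in range(m)]

def is_upper_triangular(A):
    """True when every entry strictly below the main diagonal is zero."""
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(i))

def is_diagonal(A):
    """True when every off-diagonal entry is zero."""
    n = len(A)
    return all(A[i][j] == 0 for i in range(n) for j in range(n) if i != j)

print(identity(3))   # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(zero(2, 3))    # [[0, 0, 0], [0, 0, 0]]

T = [[1, 2], [0, 3]]                             # hypothetical upper-triangular sample
print(is_upper_triangular(T), is_diagonal(T))    # True False
```

Both $I_n$ and $0_{n \times n}$ pass `is_diagonal`, matching the statement that they are diagonal matrices.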

2 Matrix Arithmetic

Given positive integers m, n ∈ Z+, we use $F^{m \times n}$ to denote the set of all m × n matrices having entries over F. In this section, we examine algebraic properties of this set. Specifically, $F^{m \times n}$ forms a vector space under the operations of component-wise addition and scalar multiplication, and it is isomorphic to $F^{mn}$ as a vector space.

by analogy to the standard basis for $F^{mn}$. That is, each $E_{k\ell} = \big((e^{(k,\ell)})_{ij}\big)$ satisfies

$$
(e^{(k,\ell)})_{ij} =
\begin{cases}
1, & \text{if } i = k \text{ and } j = \ell, \\
0, & \text{otherwise}.
\end{cases}
$$

This allows us to build a vector space isomorphism $F^{m \times n} \to F^{mn}$ using a bijection that simply “lays each matrix out flat”. In other words, given $A = (a_{ij}) \in F^{m \times n}$,

$$
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
\mapsto (a_{11}, a_{12}, \ldots, a_{1n}, a_{21}, a_{22}, \ldots, a_{2n}, \ldots, a_{m1}, a_{m2}, \ldots, a_{mn}) \in F^{mn}.
$$
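The “lay it out flat” bijection can be sketched in Python as row-by-row flattening. This is an illustration that assumes matrices are lists of rows; `flatten`, `unflatten`, `add`, and `scale` are hypothetical helper names, with `add` and `scale` implementing component-wise addition and scalar multiplication.

```python
def flatten(A):
    """Lay an m x n matrix out flat, row by row, as a tuple in F^{mn}."""
    return tuple(a for row in A for a in row)

def unflatten(v, m, n):
    """Inverse of flatten: rebuild the m x n matrix from a vector of length mn."""
    if len(v) != m * n:
        raise ValueError("vector length must equal m * n")
    return [list(v[i * n:(i + 1) * n]) for i in range(m)]

def add(A, B):
    """Component-wise sum."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(alpha, A):
    """Component-wise scalar multiple."""
    return [[alpha * a for a in row] for row in A]

A = [[1, 2, 3], [4, 5, 6]]     # hypothetical 2 x 3 samples
B = [[0, -1, 2], [7, 0, 1]]
print(flatten(A))              # (1, 2, 3, 4, 5, 6)

# flatten is linear: it respects addition and scalar multiplication ...
assert flatten(add(A, B)) == tuple(x + y for x, y in zip(flatten(A), flatten(B)))
assert flatten(scale(3, A)) == tuple(3 * x for x in flatten(A))
# ... and unflatten undoes it, so flatten is a bijection.
assert unflatten(flatten(A), 2, 3) == A
```

Linearity plus invertibility is what makes the map a vector space isomorphism rather than just a bijection of sets.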

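Example 2.2. The vector space $R^{2 \times 3}$ of 2 × 3 matrices over R has standard basis

$$
\left\{
E_{11} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},\;
E_{12} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},\;
E_{13} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},\;
E_{21} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix},\;
E_{22} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},\;
E_{23} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\right\},
$$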

which is seen to naturally correspond with the standard basis $\{e_1, \ldots, e_6\}$ for $R^6$, where

$$e_1 = (1, 0, 0, 0, 0, 0), \quad e_2 = (0, 1, 0, 0, 0, 0), \quad \ldots, \quad e_6 = (0, 0, 0, 0, 0, 1).$$
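Under row-by-row flattening, the basis matrix $E_{k\ell} \in F^{m \times n}$ lands on the standard basis vector $e_j$ with $j = (k - 1)n + \ell$. The hypothetical Python check below confirms this correspondence for the 2 × 3 case of Example 2.2; `E`, `e`, and `flatten` are illustrative helper names using the 1-based indices of the notes.

```python
def E(k, l, m, n):
    """Standard basis matrix E_kl in F^{m x n}: a single 1 in position (k, l)."""
    return [[1 if (i, j) == (k, l) else 0 for j in range(1, n + 1)]
            for i in range(1, m + 1)]

def e(j, d):
    """Standard basis vector e_j in F^d."""
    return tuple(1 if t == j else 0 for t in range(1, d + 1))

def flatten(A):
    """Lay a matrix out flat, row by row."""
    return tuple(a for row in A for a in row)

m, n = 2, 3
for k in range(1, m + 1):
    for l in range(1, n + 1):
        # E_kl corresponds to e_j with j = (k - 1) * n + l.
        assert flatten(E(k, l, m, n)) == e((k - 1) * n + l, m * n)

print(E(1, 2, 2, 3))   # [[0, 1, 0], [0, 0, 0]]
```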

Of course, it is not enough to simply assert that $F^{m \times n}$ is a vector space, since we have yet to verify that the operations of addition and scalar multiplication defined above satisfy the axioms of a vector space. The proof of the following theorem is straightforward and something that you should work through for practice with matrix notation.

Theorem 2.3. Given positive integers m, n ∈ Z+ and the operations of matrix addition and scalar multiplication as defined above, the set $F^{m \times n}$ of all m × n matrices satisfies each of the following properties.

  1. (associativity of matrix addition) Given any three matrices $A, B, C \in F^{m \times n}$,

$$(A + B) + C = A + (B + C).$$

  2. (additive identity for matrix addition) Given any matrix $A \in F^{m \times n}$,

$$A + 0_{m \times n} = 0_{m \times n} + A = A.$$

  3. (additive inverses for matrix addition) Given any matrix $A \in F^{m \times n}$, there exists a matrix $-A \in F^{m \times n}$ such that

$$A + (-A) = (-A) + A = 0_{m \times n}.$$

  4. (commutativity of matrix addition) Given any two matrices $A, B \in F^{m \times n}$,

$$A + B = B + A.$$

  5. (associativity of scalar multiplication) Given any matrix $A \in F^{m \times n}$ and any two scalars α, β ∈ F, (αβ)A = α(βA).
  6. (multiplicative identity for scalar multiplication) Given any matrix $A \in F^{m \times n}$ and denoting by 1 the multiplicative identity of F,

$$1A = A.$$

  7. (distributivity of scalar multiplication) Given any two matrices $A, B \in F^{m \times n}$ and any two scalars α, β ∈ F,

$$(\alpha + \beta)A = \alpha A + \beta A \quad \text{and} \quad \alpha(A + B) = \alpha A + \alpha B.$$
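Checking Theorem 2.3 on particular matrices is no substitute for the proof the notes ask you to work through, but a quick numerical spot-check can catch index mistakes. The sketch below assumes rational entries (Python's `fractions.Fraction`, so the arithmetic is exact); the sample matrices and scalars are arbitrary, hypothetical choices.

```python
from fractions import Fraction as Fr

def add(A, B):
    """Component-wise sum (a_ij + b_ij)."""
    return [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

def scale(alpha, A):
    """Scalar multiple (alpha * a_ij)."""
    return [[alpha * a for a in row] for row in A]

def neg(A):
    """Additive inverse (-a_ij)."""
    return [[-a for a in row] for row in A]

A = [[Fr(1), Fr(-2)], [Fr(3), Fr(1, 2)]]
B = [[Fr(0), Fr(5)], [Fr(-1), Fr(2)]]
C = [[Fr(7), Fr(1)], [Fr(2, 3), Fr(0)]]
alpha, beta = Fr(3), Fr(-1, 4)
Z = [[Fr(0)] * 2 for _ in range(2)]      # 0_{2 x 2}

checks = {
    "1. associativity of +": add(add(A, B), C) == add(A, add(B, C)),
    "2. additive identity": add(A, Z) == A == add(Z, A),
    "3. additive inverses": add(A, neg(A)) == Z == add(neg(A), A),
    "4. commutativity of +": add(A, B) == add(B, A),
    "5. associativity of scalar mult.": scale(alpha * beta, A) == scale(alpha, scale(beta, A)),
    "6. multiplicative identity": scale(Fr(1), A) == A,
    "7a. (alpha + beta)A": scale(alpha + beta, A) == add(scale(alpha, A), scale(beta, A)),
    "7b. alpha(A + B)": scale(alpha, add(A, B)) == add(scale(alpha, A), scale(alpha, B)),
}
for name, ok in checks.items():
    print(f"{name}: {ok}")
```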

As a consequence of Theorem 2.3, every property that holds for an arbitrary vector space can be taken as a property of $F^{m \times n}$ specifically. Some of the more basic such properties are collected as follows.

Corollary 2.4. Given positive integers m, n ∈ Z+ and the operations of matrix addition and scalar multiplication as defined above, the set $F^{m \times n}$ of all m × n matrices satisfies each of the following properties.

  1. Given any matrix $A \in F^{m \times n}$, given any scalar α ∈ F, and denoting by 0 the additive identity of F, $0A = 0_{m \times n}$ and $\alpha 0_{m \times n} = 0_{m \times n}$.
  2. Given any matrix $A \in F^{m \times n}$ and any scalar α ∈ F,

$$\alpha A = 0_{m \times n} \implies \text{either } \alpha = 0 \text{ or } A = 0_{m \times n}.$$

  3. Given any matrix $A \in F^{m \times n}$ and any scalar α ∈ F,

$$-(\alpha A) = (-\alpha)A = \alpha(-A).$$

In particular, the additive inverse −A of A is given by −A = (−1)A, where 1 denotes the multiplicative identity of F.
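In the same spirit, here is a hypothetical spot-check of the identities in Corollary 2.4 on one sample matrix, again with exact `Fraction` entries. Part 2 is checked through its contrapositive: a non-zero scalar times a non-zero matrix is non-zero.

```python
from fractions import Fraction as Fr

def scale(alpha, A):
    """Scalar multiple alpha * A, entry by entry."""
    return [[alpha * a for a in row] for row in A]

def neg(A):
    """Additive inverse -A, entry by entry."""
    return [[-a for a in row] for row in A]

A = [[Fr(2), Fr(-1), Fr(0)], [Fr(1, 3), Fr(4), Fr(-5)]]   # hypothetical 2 x 3 sample
alpha = Fr(-7, 2)
Z = [[Fr(0)] * 3 for _ in range(2)]                        # 0_{2 x 3}

assert scale(Fr(0), A) == Z                 # part 1: 0A = 0
assert scale(alpha, Z) == Z                 # part 1: alpha 0 = 0
assert scale(alpha, A) != Z                 # part 2 (contrapositive): alpha != 0 and A != 0
assert neg(scale(alpha, A)) == scale(-alpha, A) == scale(alpha, neg(A))   # part 3
assert neg(A) == scale(Fr(-1), A)           # -A = (-1)A
print("Corollary 2.4 holds for this sample")
```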

While one could prove Corollary 2.4 directly from the definitions, the point of recognizing $F^{m \times n}$ as a vector space is that you get to use these results without worrying about their proof. Moreover, there is no need to separately prove that they hold for both $R^{m \times n}$ and $C^{m \times n}$.

Example 2.5. With notation as in Example 1.1 above, you should sit down and use the above definitions in order to verify that the following matrix products hold.

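$$
\begin{aligned}
AC &= [\,\ldots\,] \in F^{3 \times 3}, \\
CA &= [\,\ldots\,] = 3 - 4 + 2 = 1 \in F, \\
B^2 = BB &= [\,\ldots\,] \in F^{2 \times 2}, \\
CE &= [\,\ldots\,] \in F^{1 \times 3}, \quad \text{and} \\
DA &= [\,\ldots\,] \in F^{3 \times 1}.
\end{aligned}
$$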

Note, though, that B cannot be multiplied by any of the other matrices, nor does it make sense to try to form the products AD, AE, DC, and EC due to the inherent size mismatches.
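The definition of the matrix product does not appear in this excerpt; the sketch below assumes the standard rule $(AB)_{ij} = \sum_k a_{ik} b_{kj}$, which requires the number of columns of the first factor to equal the number of rows of the second. The vectors `col` and `row` are hypothetical and are not the matrices of Example 1.1. The sketch shows the 3 × 1 times 1 × 3 versus 1 × 3 times 3 × 1 asymmetry and the kind of size mismatch noted above.

```python
def matmul(A, B):
    """(AB)_ij = sum over k of a_ik * b_kj, for A of size r x s and B of size s x t."""
    r, s = len(A), len(A[0])
    s_b, t = len(B), len(B[0])
    if s != s_b:
        raise ValueError(f"size mismatch: {r}x{s} times {s_b}x{t}")
    return [[sum(A[i][k] * B[k][j] for k in range(s)) for j in range(t)]
            for i in range(r)]

def shape(A):
    """Return (number of rows, number of columns)."""
    return (len(A), len(A[0]))

col = [[2], [0], [-1]]   # hypothetical 3 x 1 column vector
row = [[1, 3, 5]]        # hypothetical 1 x 3 row vector

print(shape(matmul(col, row)))   # (3, 3)
print(matmul(row, col))          # [[-3]], a 1 x 1 matrix
try:
    matmul(col, [[1, 0], [0, 1]])   # 3 x 1 times 2 x 2 is undefined
except ValueError as err:
    print(err)                      # size mismatch: 3x1 times 2x2
```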

As illustrated in Example 2.5 above, matrix multiplication is not a commutative operation (since, e.g., $AC \in F^{3 \times 3}$ while $CA \in F^{1 \times 1}$). Nonetheless, despite the complexity of its definition, the matrix product otherwise satisfies many familiar properties of a multiplication operation. We summarize the most basic of these properties in the following theorem.

Theorem 2.6. Let r, s, t, u ∈ Z+ be positive integers.

  1. (associativity of matrix multiplication) Given $A \in F^{r \times s}$, $B \in F^{s \times t}$, and $C \in F^{t \times u}$,

$$A(BC) = (AB)C.$$

  2. (distributivity of matrix multiplication) Given $A \in F^{r \times s}$, $B, C \in F^{s \times t}$, and $D \in F^{t \times u}$,

$$A(B + C) = AB + AC \quad \text{and} \quad (B + C)D = BD + CD.$$

  3. (compatibility with scalar multiplication) Given $A \in F^{r \times s}$, $B \in F^{s \times t}$, and α ∈ F,

$$\alpha(AB) = (\alpha A)B = A(\alpha B).$$

As with Theorem 2.3, you should work through a proof of each part of Theorem 2.6 (and especially of the first part) in order to practice manipulating the indices of entries correctly.

At the same time, you should be careful not to blithely perform operations on matrices as you would with numbers. The fact that matrix multiplication is not a commutative operation should make it clear that significantly more care is required with matrix arithmetic. As another example, given a positive integer n ≥ 2, the set $F^{n \times n}$ has what are called zero divisors. That is, there exist non-zero matrices $A, B \in F^{n \times n}$ such that $AB = 0_{n \times n}$:

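$$
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}^2
= \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
= \begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix} = 0_{2 \times 2}.
$$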

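Moreover, note that there exist matrices $A, B, C \in F^{n \times n}$ such that AB = AC but B ≠ C:

$$
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} [\,\ldots\,] = 0_{2 \times 2} = \begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix} [\,\ldots\,].
$$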

As a result, we say that the set $F^{n \times n}$ fails to have the so-called cancellation property. This failure is a direct result of the fact that there are non-zero matrices in $F^{n \times n}$ that have no multiplicative inverse. We discuss matrix invertibility at length in the next section and define a special subset $GL(n, F) \subset F^{n \times n}$ upon which the cancellation property does hold.
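The zero-divisor and cancellation phenomena can be checked directly in code. The matrices `B` and `C` below are hypothetical choices (any two distinct matrices whose second rows are zero would do, since `N` times any matrix X moves the second row of X into the first row and zeros the rest); they are not necessarily the ones used in the original notes.

```python
def matmul(A, B):
    """Standard row-times-column matrix product."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

N = [[0, 1], [0, 0]]       # non-zero, yet N N = 0
Z = [[0, 0], [0, 0]]
print(matmul(N, N))        # [[0, 0], [0, 0]]: N is a zero divisor

# Cancellation fails: N B = N C although B != C.
B = [[1, 0], [0, 0]]       # hypothetical choices with zero second rows
C = [[0, 1], [0, 0]]
assert matmul(N, B) == Z == matmul(N, C) and B != C
```

If `N` had an inverse, multiplying `N N = 0` on the left by it would force `N = 0`; so `N` is one of the non-invertible matrices responsible for the failure.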

3 Some Special Operations on Square Matrices

Given a positive integer n ∈ Z+, the set of square matrices $F^{n \times n}$ admits a natural multiplication operation between any two elements as defined in the section above, and so we can also discuss invertibility of elements in $F^{n \times n}$ with respect to this multiplication operation. If you have taken a more introductory course in Linear Algebra, then you have most likely seen much of this material in one form or another. Criteria for determining when a matrix can be inverted and uses of invertibility typically dominate the bulk of such courses. We also define three important operations on square matrices called the transpose, conjugate transpose, and the trace. These will then be seen to interact with matrix multiplication and invertibility to form special classes of matrices that are extremely important to applications of Linear Algebra.

3.1 Invertibility of Square Matrices

Given a positive integer n ∈ Z+, we say that a square matrix $A \in F^{n \times n}$ is invertible (a.k.a. nonsingular) if there exists a square matrix $B \in F^{n \times n}$ such that

$$AB = BA = I_n.$$

Moreover, we denote by GL(n, F) the set of all invertible n × n matrices over F. Note, in particular, that the zero matrix $0_{n \times n} \notin GL(n, F)$, so that GL(n, F) is not a subspace of $F^{n \times n}$. One can prove that if the multiplicative inverse of a matrix A exists, then it is unique, and so we usually denote it by $A^{-1}$. This notation for matrix inverse is by analogy to the notation

Definition 3.3. Let G be a set and ∗ be a binary operation on G. (In other words, ∗ is the name of a function ∗ : G × G → G, where the notation a ∗ b means the same thing as ∗(a, b).) Then G is said to form a group under ∗ if the following three conditions are satisfied:

  1. (associativity) Given any three elements a, b, c ∈ G, (a ∗ b) ∗ c = a ∗ (b ∗ c).
  2. (existence of an identity element) There is an element e ∈ G such that, given any element a ∈ G, a ∗ e = e ∗ a = a.
  3. (existence of inverse elements) Given any element a ∈ G, there is an element b ∈ G such that a ∗ b = b ∗ a = e.

You should recognize these as three of the four axioms that addition in a vector space must satisfy. In particular, a vector space is said to be an abelian group under the binary operation of vector addition, where abelian is defined as follows.

Definition 3.4. Let G be a group under binary operation ∗. Then G is called an abelian (a.k.a. commutative) group if, given any two elements a, b ∈ G, a ∗ b = b ∗ a.

Note in particular that $GL(n, F) \subset F^{n \times n}$ and that GL(n, F) is a non-abelian group under matrix multiplication when n ≥ 2. At the same time, $F^{n \times n}$ is an abelian group under vector addition (since it is a vector space, as shown in Theorem 2.3) but does not form a group under matrix multiplication. Similarly, the set GL(n, F) does not form a group under vector addition since, e.g., the additive identity element $0_{n \times n} \notin GL(n, F)$. This illustrates the importance of emphasizing the operation under which we wish to consider whether or not a set forms a mathematical structure such as a group. The notion of a group is fundamental to mathematical thought because it is often used as a building block for more complicated algebraic structures, as in a vector space being an abelian group together with a “compatibly defined” scalar multiplication. As another example, we note that the set of real numbers R is an abelian group under addition. Moreover, the set of non-zero real numbers R \ {0} can also be seen to form an abelian group under multiplication. Thus, since R is an abelian group under these two “compatibly defined” notions of addition and multiplication, we call R a field. The set of complex numbers C is also seen to form a field, and there are many other important examples of fields, including the set Q of rational numbers. Note, though, that a set like Z is not a field since Z \ {0} does not form a group under multiplication. More precisely, Z fails to completely satisfy the following definition.

Definition 3.5. Let G be a set with binary operations + and ∗ defined on G. Then we say that G forms a field under + and ∗ if the following three conditions are satisfied:

  1. The set G forms an abelian group under +.
  2. The set G \ {0} forms an abelian group under ∗, where 0 denotes the identity element of G as a group under +.
  3. Given any three elements a, b, c ∈ G, a ∗ (b + c) = a ∗ b + a ∗ c.
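As an illustration of the group axioms for GL(n, F) when n = 2, the sketch below works over the rationals and computes inverses with the standard 2 × 2 adjugate formula, which is an assumption of this sketch rather than something stated in this part of the notes. The sample matrices `S` and `T` are hypothetical; they show that inverses exist, that a product of invertible matrices is again invertible, and that GL(2, Q) is non-abelian.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][k] * B[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def det2(A):
    """Determinant of a 2 x 2 matrix."""
    return A[0][0] * A[1][1] - A[0][1] * A[1][0]

def inverse2(A):
    """Inverse via the 2 x 2 adjugate formula; fails when det(A) = 0."""
    d = Fr(det2(A))
    if d == 0:
        raise ValueError("matrix is not invertible")
    (a, b), (c, e) = A
    return [[e / d, -b / d], [-c / d, a / d]]

I2 = [[1, 0], [0, 1]]
S = [[1, 1], [0, 1]]   # hypothetical elements of GL(2, Q)
T = [[1, 0], [1, 1]]

assert matmul(S, inverse2(S)) == I2 == matmul(inverse2(S), S)   # inverse elements exist
assert det2(matmul(S, T)) != 0        # the product is again invertible
assert matmul(S, T) != matmul(T, S)   # so GL(2, Q) is not abelian
print(matmul(S, T), matmul(T, S))     # [[2, 1], [1, 1]] [[1, 1], [1, 2]]

try:
    inverse2([[0, 1], [0, 0]])        # a non-zero matrix outside GL(2, Q)
except ValueError as err:
    print(err)                        # matrix is not invertible
```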

3.2 Transpose and Conjugate Transpose

Given positive integers m, n ∈ Z+ and any matrix $A = (a_{ij}) \in F^{m \times n}$, we define the transpose $A^T = ((a^T)_{ij}) \in F^{n \times m}$ and the conjugate transpose $A^* = ((a^*)_{ij}) \in F^{n \times m}$ by

$$(a^T)_{ij} = a_{ji} \quad \text{and} \quad (a^*)_{ij} = \overline{a_{ji}},$$

where $\overline{a_{ji}}$ denotes the complex conjugate of the scalar $a_{ji} \in F$. In particular, if $A \in R^{m \times n}$, then note that $A^T = A^*$.
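The transpose and conjugate transpose can be sketched in Python using `zip(*A)` to swap rows and columns and the `.conjugate()` method, which Python's `int`, `float`, `complex`, and `fractions.Fraction` values all provide. The sample matrices are hypothetical.

```python
def transpose(A):
    """A^T: the (i, j) entry is a_ji, so an m x n matrix becomes n x m."""
    return [list(col) for col in zip(*A)]

def conj_transpose(A):
    """A^*: the (i, j) entry is the complex conjugate of a_ji."""
    return [[a.conjugate() for a in col] for col in zip(*A)]

A = [[1 + 2j, 3], [0, -1j], [4, 5 - 1j]]       # hypothetical 3 x 2 complex matrix
assert transpose(A) == [[1 + 2j, 0, 4], [3, -1j, 5 - 1j]]
assert conj_transpose(A) == [[1 - 2j, 0, 4], [3, 1j, 5 + 1j]]

R = [[1, -2, 0], [3, 4, 7]]                    # hypothetical real 2 x 3 matrix
assert transpose(R) == conj_transpose(R) == [[1, 3], [-2, 4], [0, 7]]   # A^T = A^* when A is real
```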

Example 3.6. With notation as in Example 1.1 above,

$$A^T = [\,\ldots\,], \quad B^T = [\,\ldots\,], \quad C^T = [\,\ldots\,], \quad D^T = [\,\ldots\,], \quad E^T = [\,\ldots\,].$$

One of the motivations for defining the operations of transpose and conjugate transpose is that they interact with the usual arithmetic operations on matrices in a natural manner. We summarize the most fundamental of these interactions in the following theorem.

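Theorem 3.7. Given positive integers m, n ∈ Z+ and any matrices $A, B \in F^{m \times n}$,

  1. $(A^T)^T = A$ and $(A^*)^* = A$.
  2. $(A + B)^T = A^T + B^T$ and $(A + B)^* = A^* + B^*$.
  3. $(\alpha A)^T = \alpha A^T$ and $(\alpha A)^* = \overline{\alpha} A^*$, where α ∈ F is any scalar.
  4. $(AB)^T = B^T A^T$, where here $B \in F^{n \times p}$ for some p ∈ Z+, so that the product AB is defined.
  5. if m = n and $A \in GL(n, F)$, then $A^T, A^* \in GL(n, F)$ with respective inverses given by

$$(A^T)^{-1} = (A^{-1})^T \quad \text{and} \quad (A^*)^{-1} = (A^{-1})^*.$$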
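A hypothetical spot-check of parts 3, 4, and 5 of Theorem 3.7. Part 3 is checked for the conjugate transpose with a complex scalar, part 4 uses a 2 × 3 and a 3 × 2 matrix, and part 5 uses a 2 × 2 matrix with determinant 1, inverted by the standard adjugate formula (an assumption of this sketch). The `==` tests on `complex` values are safe here only because all entries are small integers, so the floating-point arithmetic is exact.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def conj_transpose(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def scale(alpha, A):
    return [[alpha * a for a in row] for row in A]

def inverse2(A):
    """2 x 2 inverse by the adjugate formula (assumes det != 0)."""
    (a, b), (c, d) = A
    det = Fr(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]

# Part 4: (AB)^T = B^T A^T, with A 2 x 3 and B 3 x 2 (hypothetical samples).
A = [[1, 2, 0], [3, -1, 4]]
B = [[2, 1], [0, 5], [-3, 1]]
assert transpose(matmul(A, B)) == matmul(transpose(B), transpose(A))

# Part 3: (alpha M)^* = conj(alpha) M^*.
M = [[1 + 1j, 2], [0, 3 - 2j]]
alpha = 2 - 1j
assert conj_transpose(scale(alpha, M)) == scale(alpha.conjugate(), conj_transpose(M))

# Part 5: (G^T)^{-1} = (G^{-1})^T for an invertible G.
G = [[2, 1], [7, 4]]      # det = 1
assert inverse2(transpose(G)) == transpose(inverse2(G))
print("Theorem 3.7 parts 3-5 hold for these samples")
```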

Another motivation for defining the transpose and conjugate transpose operations is that they allow us to define several very special classes of matrices.

Definition 3.8. Given a positive integer n ∈ Z+, we say that the square matrix $A \in F^{n \times n}$

  1. is symmetric if $A = A^T$.
  2. is Hermitian if $A = A^*$.
  3. is orthogonal if $A \in GL(n, R)$ and $A^{-1} = A^T$. Moreover, we define the (real) orthogonal group to be the set $O(n) = \{A \in GL(n, R) \mid A^{-1} = A^T\}$.
  4. is unitary if $A \in GL(n, C)$ and $A^{-1} = A^*$. Moreover, we define the (complex) unitary group to be the set $U(n) = \{A \in GL(n, C) \mid A^{-1} = A^*\}$.
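The four classes in Definition 3.8 can be tested mechanically. In this sketch, orthogonality and unitarity are checked through the equivalent conditions $A^T A = A A^T = I_n$ and $A^* A = A A^* = I_n$; `is_orthogonal` assumes real entries, and the sample matrices `S`, `H`, `Q`, `U` are hypothetical. With general floating-point data, the exact `==` comparisons would need to be replaced by a tolerance.

```python
from fractions import Fraction as Fr

def matmul(A, B):
    return [[sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]

def transpose(A):
    return [list(col) for col in zip(*A)]

def conj_transpose(A):
    return [[a.conjugate() for a in col] for col in zip(*A)]

def identity(n):
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def is_symmetric(A):
    return A == transpose(A)

def is_hermitian(A):
    return A == conj_transpose(A)

def is_orthogonal(A):
    """Assumes real entries: A^{-1} = A^T iff A^T A = A A^T = I_n."""
    n = len(A)
    return matmul(transpose(A), A) == identity(n) == matmul(A, transpose(A))

def is_unitary(A):
    """A^{-1} = A^* iff A^* A = A A^* = I_n."""
    n = len(A)
    return matmul(conj_transpose(A), A) == identity(n) == matmul(A, conj_transpose(A))

S = [[1, 2], [2, 5]]
H = [[2, 1 - 1j], [1 + 1j, 3]]
Q = [[Fr(3, 5), Fr(-4, 5)], [Fr(4, 5), Fr(3, 5)]]   # a rotation with exact entries
U = [[0, 1j], [1j, 0]]

assert is_symmetric(S) and is_hermitian(H) and not is_symmetric(H)
assert is_orthogonal(Q) and is_unitary(U) and not is_orthogonal(S)
```

Note that a real symmetric matrix such as `S` is also Hermitian, and a real orthogonal matrix such as `Q` is also unitary, since the conjugate transpose of a real matrix equals its transpose.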

4 Matrices in Context

As discussed in the introductory document, What Is Linear Algebra?, Linear Algebra can be seen as a branch of mathematics aimed at solving systems of linear equations involving a finite number of unknowns. As we will see below, matrices constitute one of the most fundamental tools for solving such systems. In particular, any arbitrary number of equations in any number of unknowns — as long as both are finite — can be encoded as a single matrix equation, and it is through such an encoding that computers are used on literally a daily basis to solve countless real world problems. In order to encode a system of linear equations as a matrix equation, we will first exploit the definition of matrix multiplication. Then we will see that this encoding directly relates to the notion of linear map between vector spaces.

4.1 Encoding Linear Systems as a Matrix Equation

Let m, n ∈ Z+ be positive integers, and consider the system of m linear equations in n unknowns $x_1, \ldots, x_n$,

$$
\begin{aligned}
\sum_{k=1}^{n} a_{1,k} x_k &= b_1 \\
&\;\;\vdots \\
\sum_{k=1}^{n} a_{m,k} x_k &= b_m,
\end{aligned}
$$

where each $a_{i,j}, b_i \in F$. We use $A = (a_{ij})$ to denote the m × n coefficient matrix associated to this linear system and $x = (x_i)$ to denote the n × 1 column vector composed of the unknowns $x_1, \ldots, x_n$. In other words,

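$$
A = \begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\quad \text{and} \quad
x = \begin{bmatrix} x_1 \\ x_2 \\ \vdots \\ x_n \end{bmatrix}.
$$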

Using this notation, it should be clear from the definition of matrix multiplication that we have the equality

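$$
Ax = \begin{bmatrix}
a_{11}x_1 + a_{12}x_2 + \cdots + a_{1n}x_n \\
a_{21}x_1 + a_{22}x_2 + \cdots + a_{2n}x_n \\
\vdots \\
a_{m1}x_1 + a_{m2}x_2 + \cdots + a_{mn}x_n
\end{bmatrix}
= \begin{bmatrix} b_1 \\ \vdots \\ b_m \end{bmatrix} = b,
$$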

where we have used b to denote the m × 1 column vector formed from the right-hand side of the linear system.
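A minimal sketch of the encoding $Ax = b$: the helper `matvec` (a hypothetical name) computes the i-th entry of Ax as $\sum_k a_{ik} x_k$, which is exactly the left-hand side of the i-th equation. The 2 × 3 system below is a made-up example, and the vector `x` was chosen by hand to satisfy it.

```python
from fractions import Fraction as Fr

def matvec(A, x):
    """Ax for an m x n matrix A and a length-n vector x: entry i is sum_k a_ik x_k."""
    return [sum(a * xk for a, xk in zip(row, x)) for row in A]

# Hypothetical system of m = 2 equations in n = 3 unknowns:
#    x1 + 2 x2 -   x3 =  4
#   3 x1        + 5 x3 = -1
A = [[1, 2, -1],
     [3, 0, 5]]
b = [4, -1]

x = [Fr(3), Fr(-1, 2), Fr(-2)]
print(matvec(A, x) == b)   # True: x solves the system, i.e. Ax = b
```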

4.2 Using Matrices to Define Linear Maps

As this course progresses, you will see that encoding a linear system as a matrix equation is more than just a notational trick. Perhaps most fundamentally, this matrix point of view provides unparalleled insight into when a solution to a linear system is the only possible solution. (In general, the more that can be said with absolute certainty when solving a problem, the better.) We illustrate this in the following example.

Example 4.1. Suppose that we have the system of linear equations

$$
\begin{aligned}
2x_1 + x_2 &= 0 \\
x_1 - x_2 &= 1,
\end{aligned}
$$

where $x_1$ and $x_2$ are unknown real numbers. To solve this system, you have most likely been taught to first solve for one of the unknowns in one of the equations and then to substitute the result into the other equation. Here, for example, one might solve to obtain

$$x_1 = 1 + x_2$$

from the second equation. Then, substituting this in place of $x_1$ in the first equation, one obtains

$$2(1 + x_2) + x_2 = 0.$$

From this, we find that $x_2 = -2/3$. Then, by further substitution,

$$x_1 = 1 + \left(-\frac{2}{3}\right) = \frac{1}{3}.$$

Finally, if we wish to verify that this is the only solution to the given linear system, then we might appeal to a graph. In other words, since each of the equations corresponds to a line in the Euclidean plane $R^2$, we can see that the solution $(x_1, x_2) = (1/3, -2/3)$ corresponds exactly to the single point of intersection between these two lines:

[Figure: the lines y = −2x and y = x − 1 plotted in the xy-plane.]
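Example 4.1 can also be checked by computer. The sketch below does not follow the substitution method used above; it swaps in Cramer's rule, a standard technique not introduced in this part of the notes, with exact `Fraction` arithmetic. Because the determinant of the coefficient matrix is non-zero, the two lines are not parallel, which matches the single intersection point found above.

```python
from fractions import Fraction as Fr

# Example 4.1:  2 x1 + x2 = 0,  x1 - x2 = 1
A = [[2, 1],
     [1, -1]]
b = [0, 1]

det = A[0][0] * A[1][1] - A[0][1] * A[1][0]    # 2 * (-1) - 1 * 1 = -3
assert det != 0   # non-zero determinant: the lines meet in exactly one point

# Cramer's rule: replace column j of A by b, take the determinant, divide by det.
x1 = Fr(b[0] * A[1][1] - A[0][1] * b[1], det)
x2 = Fr(A[0][0] * b[1] - b[0] * A[1][0], det)
print(x1, x2)   # 1/3 -2/3

# Check the solution against both original equations.
assert 2 * x1 + x2 == 0 and x1 - x2 == 1
```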

The above analysis, while tedious, is nonetheless straightforward. However, similar calculations can quickly become unwieldy when attempted on three or more equations, and