









The essential concepts of matrices, their operations, and multiplication. It covers the definition of matrices, their arithmetic operations, and the properties of matrix multiplication. The document also provides examples of matrices and their multiplication, as well as the relationship between matrix multiplication and the standard inner product.










Isaiah Lankham, Bruno Nachtergaele, Anne Schilling (February 4, 2007)
Let m, n ∈ Z+ be positive integers. Then we begin by defining an m × n matrix A to be a rectangular array of numbers
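\[
A = (a_{ij})_{i,j=1}^{m,n} = (A^{(i,j)})_{i,j=1}^{m,n} =
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
\]
(each column contains m numbers and each row contains n numbers),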
where each element a_{ij} ∈ F in the array is called an entry of A (specifically, a_{ij} is called the “i, j entry”), i indexes the rows of A by ranging over the set {1, ..., m}, and j indexes the columns of A by ranging over the set {1, ..., n}. We say that the matrix A has size m × n and note that it is a (finite) sequence of doubly-subscripted numbers for which the two subscripts in no way depend upon each other. Given the ubiquity of matrices in mathematical thought, a rich vocabulary has been developed for describing various properties and features of matrices that are most useful to their application. In addition, there is also a rich set of equivalent notations. For the purposes of these notes, we will use the above notation unless the size of the matrix is understood from context or is unimportant. In this case, we will drop much of this notation and denote a matrix simply as
A = (a_{ij}) or A = (a_{ij})_{m×n}.
Copyright © 2007 by the authors. These lecture notes may be reproduced in their entirety for non-commercial purposes.
To get a sense of the essential vocabulary, suppose that we have an m × n matrix A = (a_{ij}) with m = n. Then we call A a square matrix. The elements a_{11}, a_{22}, ..., a_{nn} in a square matrix form what is called the main diagonal of A, and the elements a_{1n}, a_{2,n−1}, ..., a_{n1} form what is sometimes called the skew main diagonal of A. Entries not on the main diagonal are also often called off-diagonal entries, and a matrix whose off-diagonal entries are all zero is called a diagonal matrix. It is common to call the elements a_{12}, a_{23}, ..., a_{n−1,n} the superdiagonal of A and a_{21}, a_{32}, ..., a_{n,n−1} the subdiagonal. The motivation for this terminology should be clear if you create a sample square matrix and trace the entries within these particular subsequences of the matrix. Square matrices are important because they are fundamental to applications of Linear Algebra. In particular, virtually every use of Linear Algebra in problem solving either involves square matrices directly or employs them in some indirect manner. In addition, virtually every usage also involves the notion of vector, where here we mean either an m × 1 matrix (a.k.a. a column vector) or a 1 × n matrix (a.k.a. a row vector).
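One way to trace these subsequences is to pull them out of a small sample matrix. The sketch below is only an illustration: the helper names and the 3 × 3 matrix M are made up for this purpose, and Python lists are indexed from 0, so the entry a_{ij} of the text is stored at position [i−1][j−1].

```python
def main_diagonal(a):
    """Entries a_11, a_22, ..., a_nn of a square matrix."""
    return [a[i][i] for i in range(len(a))]

def skew_diagonal(a):
    """Entries a_1n, a_2,n-1, ..., a_n1 of a square matrix."""
    n = len(a)
    return [a[i][n - 1 - i] for i in range(n)]

def superdiagonal(a):
    """Entries a_12, a_23, ..., a_{n-1,n} (just above the main diagonal)."""
    return [a[i][i + 1] for i in range(len(a) - 1)]

def subdiagonal(a):
    """Entries a_21, a_32, ..., a_{n,n-1} (just below the main diagonal)."""
    return [a[i + 1][i] for i in range(len(a) - 1)]

# Hypothetical sample matrix, stored as a list of rows.
M = [[1, 2, 3],
     [4, 5, 6],
     [7, 8, 9]]

print(main_diagonal(M))  # [1, 5, 9]
print(skew_diagonal(M))  # [3, 5, 7]
print(superdiagonal(M))  # [2, 6]
print(subdiagonal(M))    # [4, 8]
```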
Example 1.1. Suppose that A = (aij ), B = (bij ), C = (cij ), D = (dij ), and E = (eij ) are the following matrices over F:
Then we say that A is a 3 × 1 matrix (a.k.a. a column vector), B is a 2 × 2 square matrix, C is a 1 × 3 matrix (a.k.a. a row vector), and both D and E are square 3 × 3 matrices.
We can discuss individual entries in each matrix. E.g., d_{12} = 5 and e_{12} = e_{22} = e_{32} = 1. The diagonal of D is the sequence d_{11} = 1, d_{22} = 0, d_{33} = 4. The subdiagonal of E is the sequence e_{21} = −1, e_{32} = 1.
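We also note that B is called an upper-triangular matrix since all of the elements “below” the main diagonal are zero. However, none of the matrices above are diagonal matrices. Given any positive integer n ∈ Z+, we can construct the diagonal matrices I_n (called the identity matrix) and 0_{n×n} (called the zero matrix) by setting
\[
I_n =
\begin{bmatrix}
1 & 0 & \cdots & 0 \\
0 & 1 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 1
\end{bmatrix}
\quad \text{and} \quad
0_{n \times n} =
\begin{bmatrix}
0 & 0 & \cdots & 0 \\
0 & 0 & \cdots & 0 \\
\vdots & \vdots & \ddots & \vdots \\
0 & 0 & \cdots & 0
\end{bmatrix},
\]
where each of these matrices is understood to be a square matrix of size n × n. The zero matrix 0_{m×n} is analogously defined for any two positive integers m, n ∈ Z+ and has size m × n.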
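These two constructions can be written down directly. The following is a minimal sketch using plain Python lists of rows; the function names identity, zero, and is_diagonal are illustrative choices, and is_diagonal assumes its argument is square.

```python
def identity(n):
    """The n x n identity matrix I_n: ones on the main diagonal, zeros elsewhere."""
    return [[1 if i == j else 0 for j in range(n)] for i in range(n)]

def zero(m, n):
    """The m x n zero matrix."""
    return [[0] * n for _ in range(m)]

def is_diagonal(a):
    """True if every off-diagonal entry of the square matrix a is zero."""
    n = len(a)
    return all(a[i][j] == 0 for i in range(n) for j in range(n) if i != j)

print(identity(3))  # [[1, 0, 0], [0, 1, 0], [0, 0, 1]]
print(zero(2, 3))   # [[0, 0, 0], [0, 0, 0]]
print(is_diagonal(identity(4)), is_diagonal(zero(3, 3)))  # True True
```

Both matrices pass the diagonal test because the definition only constrains the off-diagonal entries; the diagonal entries themselves may be zero.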
Given positive integers m, n ∈ Z+, we use F^{m×n} to denote the set of all m × n matrices having entries over F. In this section, we examine algebraic properties of this set. Specifically, F^{m×n} forms a vector space under the operations of component-wise addition and scalar multiplication, and it is isomorphic to F^{mn} as a vector space.
by analogy to the standard basis for F^{mn}. That is, each E_{kℓ} = ((e^{(k,ℓ)})_{ij}) satisfies
\[
(e^{(k,\ell)})_{ij} =
\begin{cases}
1, & \text{if } i = k \text{ and } j = \ell, \\
0, & \text{otherwise.}
\end{cases}
\]
This allows us to build a vector space isomorphism F^{m×n} → F^{mn} using a bijection that simply “lays each matrix out flat”. In other words, given A = (a_{ij}) ∈ F^{m×n},
\[
\begin{bmatrix}
a_{11} & \cdots & a_{1n} \\
\vdots & \ddots & \vdots \\
a_{m1} & \cdots & a_{mn}
\end{bmatrix}
\longmapsto
(a_{11}, a_{12}, \ldots, a_{1n}, a_{21}, a_{22}, \ldots, a_{2n}, \ldots, a_{m1}, a_{m2}, \ldots, a_{mn}) \in F^{mn}.
\]
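Example 2.2. The vector space R^{2×3} of 2 × 3 matrices over R has standard basis
\[
\left\{
E_{11} = \begin{bmatrix} 1 & 0 & 0 \\ 0 & 0 & 0 \end{bmatrix},\;
E_{12} = \begin{bmatrix} 0 & 1 & 0 \\ 0 & 0 & 0 \end{bmatrix},\;
E_{13} = \begin{bmatrix} 0 & 0 & 1 \\ 0 & 0 & 0 \end{bmatrix},\;
E_{21} = \begin{bmatrix} 0 & 0 & 0 \\ 1 & 0 & 0 \end{bmatrix},\;
E_{22} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 1 & 0 \end{bmatrix},\;
E_{23} = \begin{bmatrix} 0 & 0 & 0 \\ 0 & 0 & 1 \end{bmatrix}
\right\},
\]
which is seen to naturally correspond with the standard basis {e_1, ..., e_6} for R^6, where
e_1 = (1, 0, 0, 0, 0, 0), e_2 = (0, 1, 0, 0, 0, 0), ..., e_6 = (0, 0, 0, 0, 0, 1).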
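The correspondence in Example 2.2 can be checked mechanically. In the sketch below, flatten implements the “lay the matrix out flat” map row by row, and basis_matrix builds E_{kℓ} from its defining condition; both names are illustrative assumptions, and k, ℓ are taken 1-based to match the text.

```python
def flatten(a):
    """Lay an m x n matrix (list of rows) out flat as a tuple of length m*n."""
    return tuple(entry for row in a for entry in row)

def basis_matrix(k, l, m, n):
    """E_kl in F^{m x n}: entry (i, j) is 1 if i == k and j == l, else 0 (1-based)."""
    return [[1 if (i, j) == (k, l) else 0 for j in range(1, n + 1)]
            for i in range(1, m + 1)]

m, n = 2, 3
# Images of E_11, E_12, E_13, E_21, E_22, E_23 under the flattening map.
images = [flatten(basis_matrix(k, l, m, n))
          for k in range(1, m + 1) for l in range(1, n + 1)]

for v in images:
    print(v)
# (1, 0, 0, 0, 0, 0)
# (0, 1, 0, 0, 0, 0)
# ...
# (0, 0, 0, 0, 0, 1)
```

With this row-by-row ordering, E_{kℓ} is sent to the standard basis vector e_p with p = (k − 1)n + ℓ.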
Of course, it is not enough to just assert that F^{m×n} is a vector space since we have yet to verify that the above defined operations of addition and scalar multiplication satisfy the axioms of a vector space. The proof of the following theorem is straightforward and something that you should work through for practice with matrix notation.
Theorem 2.3. Given positive integers m, n ∈ Z+ and the operations of matrix addition and scalar multiplication as defined above, the set F^{m×n} of all m × n matrices satisfies each of the following properties.
(A + B) + C = A + (B + C).
A + 0_{m×n} = 0_{m×n} + A = A.
A + (−A) = (−A) + A = 0_{m×n}.
A + B = B + A.
1A = A.
(α + β)A = αA + βA and α(A + B) = αA + αB.
As a consequence of Theorem 2.3, every property that holds for an arbitrary vector space can be taken as a property of F^{m×n} specifically. Some of the more basic such properties are collected as follows.
Corollary 2.4. Given positive integers m, n ∈ Z+ and the operations of matrix addition and scalar multiplication as defined above, the set F^{m×n} of all m × n matrices satisfies each of the following properties.
αA = 0_{m×n} ⟹ either α = 0 or A = 0_{m×n}.
−(αA) = (−α)A = α(−A).
In particular, the additive inverse −A of A is given by −A = (−1)A, where −1 denotes the additive inverse of the multiplicative identity 1 of F.
While one could prove Corollary 2.4 directly from definitions, the point of recognizing F^{m×n} as a vector space is that you get to use these results without worrying about their proof. Moreover, there is no need to separately prove that they hold for both R^{m×n} and C^{m×n}.
Example 2.5. With notation as in Example 1.1 above, you should sit down and use the above definitions in order to verify that the following matrix products hold.
∈ F^{1×3}, and
Note, though, that B cannot be multiplied by any of the other matrices, nor does it make sense to try to form the products AD, AE, DC, and EC due to the inherent size mismatches.
As illustrated in Example 2.5 above, matrix multiplication is not a commutative operation (since, e.g., AC ∈ F^{3×3} while CA ∈ F^{1×1}). Nonetheless, despite the complexity of its definition, the matrix product otherwise satisfies many familiar properties of a multiplication operation. We summarize the most basic of these properties in the following theorem.
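The size bookkeeping behind this remark can be made concrete with a short sketch. It assumes the usual entry formula (AB)_{ij} = Σ_k a_{ik} b_{kj}, which requires the number of columns of the first factor to equal the number of rows of the second. The matrices col and row below are hypothetical stand-ins with the same shapes as A and C (3 × 1 and 1 × 3); their entries are not taken from Example 1.1.

```python
def matmul(a, b):
    """Product of an r x s matrix a and an s x t matrix b, both lists of rows."""
    if len(a[0]) != len(b):
        raise ValueError("size mismatch: columns of a must equal rows of b")
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

col = [[1], [2], [3]]   # hypothetical 3 x 1 column vector
row = [[4, 5, 6]]       # hypothetical 1 x 3 row vector

print(matmul(col, row))  # 3 x 3: [[4, 5, 6], [8, 10, 12], [12, 15, 18]]
print(matmul(row, col))  # 1 x 1: [[32]]

try:
    matmul(col, col)     # (3 x 1)(3 x 1) is not defined
except ValueError as err:
    print(err)
```

The 1 × 1 product is the sum 1·4 + 2·5 + 3·6, that is, the standard inner product of the two vectors, while the 3 × 3 product has an entirely different shape. The two orders of multiplication cannot agree.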
Theorem 2.6. Let r, s, t, u ∈ Z+ be positive integers.
A(BC) = (AB)C.
A(B + C) = AB + AC and (B + C)D = BD + CD.
α(AB) = (αA)B = A(αB).
As with Theorem 2.3, you should work through a proof of each part of Theorem 2.6 (and especially of the first part) in order to practice manipulating the indices of entries correctly.
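As a guide for the first part, here is one way the index manipulation can go. The sizes are an assumption made for this sketch: take A ∈ F^{r×s}, B ∈ F^{s×t}, and C ∈ F^{t×u}, so that every product below is defined, and write (X)_{ij} for the i, j entry of a matrix X.

```latex
\begin{align*}
(A(BC))_{ij}
  &= \sum_{k=1}^{s} a_{ik}\,(BC)_{kj}
   = \sum_{k=1}^{s} a_{ik} \sum_{\ell=1}^{t} b_{k\ell}\, c_{\ell j} \\
  &= \sum_{\ell=1}^{t} \Bigl( \sum_{k=1}^{s} a_{ik}\, b_{k\ell} \Bigr) c_{\ell j}
   = \sum_{\ell=1}^{t} (AB)_{i\ell}\, c_{\ell j}
   = ((AB)C)_{ij}.
\end{align*}
```

The only step that is not a direct use of the definition is the exchange of the two finite sums, which relies on the distributive, associative, and commutative laws in F.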
At the same time, you should be careful not to blithely perform operations on matrices as you would with numbers. The fact that matrix multiplication is not a commutative operation should make it clear that significantly more care is required with matrix arithmetic. As another example, given a positive integer n ∈ Z+ with n ≥ 2, the set F^{n×n} has what are called zero divisors. That is, there exist non-zero matrices A, B ∈ F^{n×n} such that AB = 0_{n×n}:
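\[
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}^{2}
=
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
=
\begin{bmatrix} 0 & 0 \\ 0 & 0 \end{bmatrix}.
\]
Moreover, note that there exist matrices A, B, C ∈ F^{n×n} such that AB = AC but B ≠ C:
\[
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
= 0_{2 \times 2} =
\begin{bmatrix} 0 & 1 \\ 0 & 0 \end{bmatrix}
\begin{bmatrix} 1 & 0 \\ 0 & 0 \end{bmatrix}.
\]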
As a result, we say that the set F^{n×n} fails to have the so-called cancellation property. This failure is a direct result of the fact that there are non-zero matrices in F^{n×n} that have no multiplicative inverse. We discuss matrix invertibility at length in the next section and define a special subset GL(n, F) ⊂ F^{n×n} upon which the cancellation property does hold.
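Both phenomena are easy to confirm numerically for n = 2. The sketch below uses one convenient choice of matrices, with a small matmul helper written for this illustration, and checks that AB = 0 with A and B non-zero, and that AB = AC although B ≠ C.

```python
def matmul(a, b):
    """Product of two matrices given as lists of rows (sizes assumed compatible)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

ZERO = [[0, 0], [0, 0]]

A = [[0, 1], [0, 0]]
B = [[0, 1], [0, 0]]
C = [[1, 0], [0, 0]]

# Zero divisors: neither factor is the zero matrix, yet the product is.
print(A != ZERO, B != ZERO, matmul(A, B) == ZERO)  # True True True

# No cancellation: AB == AC does not force B == C.
print(matmul(A, B) == matmul(A, C), B != C)        # True True
```

If A had a multiplicative inverse, multiplying AB = AC on the left by that inverse would give B = C, so this A cannot be invertible.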
Given a positive integer n ∈ Z+, the set of square matrices F^{n×n} admits a natural multiplication operation between any two elements as defined in the section above, and so we can also discuss invertibility of elements in F^{n×n} with respect to this multiplication operation. If you have taken a more introductory course in Linear Algebra, then you have most likely seen much of this material in one form or another. Criteria for determining when a matrix can be inverted and uses of invertibility typically dominate the bulk of such courses. We also define three important operations on square matrices called the transpose, conjugate transpose, and the trace. These will then be seen to interact with matrix multiplication and invertibility to form special classes of matrices that are extremely important to applications of Linear Algebra.
Given a positive integer n ∈ Z+, we say that a square matrix A ∈ F^{n×n} is invertible (a.k.a. nonsingular) if there exists a square matrix B ∈ F^{n×n} such that
AB = BA = I_n.
Moreover, we denote by GL(n, F) the set of all invertible n × n matrices over F. Note, in particular, that the zero matrix 0_{n×n} ∉ GL(n, F) so that GL(n, F) is not a subspace of F^{n×n}. One can prove that if the multiplicative inverse of a matrix A exists, then it is unique, and so we usually denote it by A^{−1}. This notation for matrix inverse is by analogy to the notation
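Definition 3.3. Let G be a set and ∗ be a binary operation on G. (In other words, ∗ is the name of a function ∗ : G × G → G, where the notation a ∗ b means the same thing as ∗(a, b).) Then G is said to form a group under ∗ if the following three conditions are satisfied:
(associativity) Given any three elements a, b, c ∈ G, (a ∗ b) ∗ c = a ∗ (b ∗ c).
(existence of an identity element) There is an element e ∈ G such that a ∗ e = e ∗ a = a for every element a ∈ G.
(existence of inverse elements) Given any element a ∈ G, there is an element b ∈ G such that a ∗ b = b ∗ a = e.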
You should recognize these as three of the four axioms that addition in a vector space must satisfy. In particular, a vector space is said to be an abelian group under the binary operation of vector addition, where abelian is defined as follows.
Definition 3.4. Let G be a group under binary operation ∗. Then G is called an abelian (a.k.a. commutative) group if, given any two elements a, b ∈ G, a ∗ b = b ∗ a.
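For a concrete instance of these definitions, consider invertible 2 × 2 matrices under multiplication. The sketch below is an illustration under stated assumptions: inverse_2x2 uses the standard adjugate formula for 2 × 2 matrices, which is not derived in these notes, and the matrices A and B are chosen only for convenience. It checks the defining relation for an inverse and shows that the two products AB and BA differ.

```python
from fractions import Fraction

def matmul(a, b):
    """Product of two matrices given as lists of rows (sizes assumed compatible)."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]

def inverse_2x2(a):
    """Inverse of [[p, q], [r, s]] via the adjugate formula, in exact arithmetic."""
    (p, q), (r, s) = a
    det = Fraction(p * s - q * r)
    if det == 0:
        raise ValueError("matrix is not invertible")
    return [[s / det, -q / det], [-r / det, p / det]]

I2 = [[1, 0], [0, 1]]
A = [[1, 1], [0, 1]]
B = [[1, 0], [1, 1]]

A_inv = inverse_2x2(A)
print(matmul(A, A_inv) == I2, matmul(A_inv, A) == I2)  # True True

print(matmul(A, B))  # [[2, 1], [1, 1]]
print(matmul(B, A))  # [[1, 1], [1, 2]]
```

Since AB ≠ BA here, multiplication of invertible 2 × 2 matrices satisfies the group conditions without being commutative.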
Note in particular that GL(n, F) ⊂ F^{n×n} and that GL(n, F) is a non-abelian group under matrix multiplication when n ≥ 2. At the same time, F^{n×n} is an abelian group under vector addition (since it is a vector space as shown in Theorem 2.3) but does not form a group under matrix multiplication. The set GL(n, F), similarly, does not form a group under vector addition since, e.g., the additive identity element 0_{n×n} ∉ GL(n, F). This illustrates the importance of emphasizing the operation under which we wish to consider whether or not a set forms a mathematical structure such as a group.
The notion of a group is fundamental to mathematical thought because it is often used as a building block for more complicated algebraic structures, as in a vector space being an abelian group together with a “compatibly defined” scalar multiplication. As another example, we note that the set of real numbers R is an abelian group under addition. Moreover, the set of non-zero real numbers R \ {0} can also be seen to form an abelian group under multiplication. Thus, since R carries these two “compatibly defined” abelian group structures, one under addition and one (on R \ {0}) under multiplication, we call R a field. The set of complex numbers C is also seen to form a field, and there are many other important examples of fields including the set Q of rational numbers. Note, though, that a set like Z is not a field since Z \ {0} does not form a group under multiplication. More precisely, Z fails to completely satisfy the following definition.
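Definition 3.5. Let G be a set with binary operations + and ∗ defined on G. Then we say that G forms a field under + and ∗ if the following three conditions are satisfied:
G forms an abelian group under +.
Denoting the identity element for + by 0, G \ {0} forms an abelian group under ∗.
(∗ distributes over +) Given any three elements a, b, c ∈ G, a ∗ (b + c) = a ∗ b + a ∗ c.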
Given positive integers m, n ∈ Z+ and any matrix A = (a_{ij}) ∈ F^{m×n}, we define the transpose A^T = ((a^T)_{ij}) ∈ F^{n×m} and the conjugate transpose A^* = ((a^*)_{ij}) ∈ F^{n×m} by
\[
(a^{T})_{ij} = a_{ji} \quad \text{and} \quad (a^{*})_{ij} = \overline{a_{ji}},
\]
where \overline{a_{ji}} denotes the complex conjugate of the scalar a_{ji} ∈ F. In particular, if A ∈ R^{m×n}, then note that A^T = A^*.
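Both operations amount to swapping the roles of rows and columns, with an extra conjugation in the second case. A minimal sketch follows, using Python's built-in complex numbers; the function names and the sample matrices Z and R are illustrative assumptions.

```python
def transpose(a):
    """A^T: the (i, j) entry of the result is a_ji."""
    return [list(column) for column in zip(*a)]

def conjugate_transpose(a):
    """A^*: the (i, j) entry of the result is the complex conjugate of a_ji."""
    return [[complex(entry).conjugate() for entry in column] for column in zip(*a)]

# Hypothetical 2 x 3 complex matrix.
Z = [[1 + 2j, 3j, 4],
     [5, 6 - 1j, 7 + 7j]]

print(transpose(Z))            # a 3 x 2 matrix with rows and columns swapped
print(conjugate_transpose(Z))  # same shape, each entry conjugated

# For a real matrix the two operations agree.
R = [[1, 2, 3],
     [4, 5, 6]]
print(transpose(R) == conjugate_transpose(R))  # True
```

Applying transpose twice returns the original matrix, which gives a quick sanity check on the index swap.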
Example 3.6. With notation as in Example 1.1 above,
One of the motivations for defining the operations of transpose and conjugate transpose is that they interact with the usual arithmetic operations on matrices in a natural manner. We summarize the most fundamental of these interactions in the following theorem.
Theorem 3.7. Given positive integers m, n ∈ Z+ and any matrices A, B ∈ F^{m×n},
(A^T)^{−1} = (A^{−1})^T and (A^*)^{−1} = (A^{−1})^*.
Another motivation for defining the transpose and conjugate transpose operations is that they allow us to define several very special classes of matrices.
Definition 3.8. Given a positive integer n ∈ Z+, we say that the square matrix A ∈ F^{n×n}
As discussed in the introductory document, What Is Linear Algebra?, Linear Algebra can be seen as a branch of mathematics aimed at solving systems of linear equations involving a finite number of unknowns. As we will see below, matrices constitute one of the most fundamental tools for solving such systems. In particular, any arbitrary number of equations in any number of unknowns — as long as both are finite — can be encoded as a single matrix equation, and it is through such an encoding that computers are used on literally a daily basis to solve countless real world problems. In order to encode a system of linear equations as a matrix equation, we will first exploit the definition of matrix multiplication. Then we will see that this encoding directly relates to the notion of linear map between vector spaces.
Let m, n ∈ Z+ be positive integers, and consider the system of m linear equations in n unknowns x_1, ..., x_n,
\[
\begin{aligned}
\sum_{k=1}^{n} a_{1,k}\, x_k &= b_1 \\
&\;\;\vdots \\
\sum_{k=1}^{n} a_{m,k}\, x_k &= b_m,
\end{aligned}
\]
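where each a_{i,j}, b_i ∈ F. We use A = (a_{ij}) to denote the m × n coefficient matrix associated to this linear system and x = (x_i) to denote the n × 1 column vector composed of the unknowns x_1, ..., x_n. In other words,
\[
A =
\begin{bmatrix}
a_{11} & a_{12} & \cdots & a_{1n} \\
a_{21} & a_{22} & \cdots & a_{2n} \\
\vdots & \vdots & \ddots & \vdots \\
a_{m1} & a_{m2} & \cdots & a_{mn}
\end{bmatrix}
\quad \text{and} \quad
x =
\begin{bmatrix}
x_1 \\ x_2 \\ \vdots \\ x_n
\end{bmatrix}.
\]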
Using this notation, it should be clear from the definition of matrix multiplication that we have the equality
\[
Ax =
\begin{bmatrix}
a_{11} x_1 + a_{12} x_2 + \cdots + a_{1n} x_n \\
a_{21} x_1 + a_{22} x_2 + \cdots + a_{2n} x_n \\
\vdots \\
a_{m1} x_1 + a_{m2} x_2 + \cdots + a_{mn} x_n
\end{bmatrix}
=
\begin{bmatrix}
b_1 \\ \vdots \\ b_m
\end{bmatrix}
= b,
\]
where we have used b to denote the m × 1 column vector formed from the right-hand side of the linear system.
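The encoding can be tried out on a small case. In the sketch below, matvec computes the product Ax entry by entry, so each component is exactly the left-hand side of one equation; the system shown, with m = 2 equations in n = 3 unknowns, is a hypothetical example and does not come from the notes.

```python
def matvec(a, x):
    """Product of an m x n matrix a (list of rows) with a vector x of length n."""
    if any(len(row) != len(x) for row in a):
        raise ValueError("each row of a must have as many entries as x")
    return [sum(a_ik * x_k for a_ik, x_k in zip(row, x)) for row in a]

# Hypothetical system:
#     x1 + 2*x2 + 3*x3 = 14
#   4*x1 + 5*x2 + 6*x3 = 32
A = [[1, 2, 3],
     [4, 5, 6]]
b = [14, 32]

candidate = [1, 2, 3]
print(matvec(A, candidate))       # [14, 32]
print(matvec(A, candidate) == b)  # True: the candidate solves Ax = b
```

Checking whether a proposed vector solves the system reduces to one matrix-vector product followed by a comparison with b.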
As this course progresses, you will see that encoding a linear system as a matrix equation is more than just a notational trick. Perhaps most fundamentally, this matrix point of view provides unparalleled insight into when a solution to a linear system is the only possible solution. (In general, the more that can be said with absolute certainty when solving a problem, the better.) We illustrate this in the following example.
Example 4.1. Suppose that we have the system of linear equations
\[
\begin{aligned}
2x_1 + x_2 &= 0 \\
x_1 - x_2 &= 1,
\end{aligned}
\]
where x_1 and x_2 are unknown real numbers. To solve this system, you have most likely been taught to first solve for one of the unknowns in one of the equations and then to substitute the result into the other equation. Here, for example, one might solve to obtain
x_1 = 1 + x_2
from the second equation. Then, substituting this in place of x_1 in the first equation, one obtains
2(1 + x_2) + x_2 = 0.
From this, we find that x_2 = −2/3. Then, by further substitution,
x_1 = 1 + (−2/3) = 1/3.
Finally, if we wish to verify that this is the only solution to the given linear system, then we might appeal to a graph. In other words, since each of the equations corresponds to a line in the Euclidean plane R^2, we can see that the solution (x_1, x_2) = (1/3, −2/3) corresponds exactly to the single point of intersection between these two lines:
[Figure: the lines y = x − 1 and y = −2x plotted in the (x, y)-plane, where x and y play the roles of x_1 and x_2.]
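The substitution steps of Example 4.1 can be replayed in exact rational arithmetic. The sketch below follows the same order of operations and then checks both equations. The final determinant test is an addition that is not part of the example: it relies on the standard fact that a 2 × 2 system has exactly one solution when the determinant of its coefficient matrix is non-zero.

```python
from fractions import Fraction

# System of Example 4.1:
#   2*x1 + x2 = 0
#     x1 - x2 = 1

# Second equation gives x1 = 1 + x2; substituting into the first gives
# 2*(1 + x2) + x2 = 0, i.e. 3*x2 = -2.
x2 = Fraction(-2, 3)
x1 = 1 + x2

print(x1, x2)             # 1/3 -2/3
print(2 * x1 + x2 == 0)   # True: first equation holds
print(x1 - x2 == 1)       # True: second equation holds

# Coefficient matrix [[2, 1], [1, -1]]; its determinant is non-zero,
# which is the algebraic counterpart of the two lines crossing once.
det = 2 * (-1) - 1 * 1
print(det)                # -3
```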
The above analysis, while tedious, is nonetheless straightforward. However, similar calculations can quickly become unwieldy when attempted on three or more equations, and