























Linear Algebra topics for Advanced Engineering Mathematics
Typology: Study notes
























Revision, vectors and matrices, vector spaces, subspaces, linear independence and dependence, bases and dimension, rank of a matrix, linear transformations and their matrix representations, rank and nullity, change of basis
There is no specific essential reading for this chapter. It is essential that you do some reading, but the topics discussed in this chapter are adequately covered in so many texts on ‘linear algebra’ that it would be artificial and unnecessarily limiting to specify precise passages from precise texts. The list below gives examples of relevant reading. (For full publication details, see Chapter 1.)
Ostaszewski, A. Advanced Mathematical Methods. Chapter 1, sections 1.1–1.7; Chapter 3, sections 3.1–3.2 and 3.4.
Leon, S.J., Linear Algebra with Applications. Chapter 1, sections 1.1–1.3; Chapter 2, section 2.1; Chapter 3, sections 3.1–3.6; Chapter 4, sections 4.1–4.2.
Simon, C.P. and Blume, L., Mathematics for Economists. Chapter 7, sections 7.1–7.4; Chapter 9, section 9.1; Chapter 11, and Chapter 27.
Anthony, M. and Biggs, N. Mathematics for Economics and Finance. Chapters 15–17. (for revision)
In this chapter we first very briefly revise some basics about vectors and matrices, which will be familiar from the ‘Mathematics for economists’ subject. We then explore some new topics, in particular the important theoretical concept of a vector space.
Vectors and matrices
An $n$-vector $\mathbf{v}$ is a list of $n$ numbers, written either as a row vector
$$(v_1, v_2, \dots, v_n),$$
or as a column vector
$$\begin{pmatrix} v_1 \\ v_2 \\ \vdots \\ v_n \end{pmatrix}.$$
The numbers $v_1$, $v_2$, and so on are known as the components, entries or coordinates of $\mathbf{v}$. The zero vector is the vector with all of its entries equal to 0.
The set $\mathbb{R}^n$ denotes the set of all vectors of length $n$, and we usually think of these as column vectors.
We can define addition of two $n$-vectors by the rule
$$(w_1, w_2, \dots, w_n) + (v_1, v_2, \dots, v_n) = (w_1 + v_1, w_2 + v_2, \dots, w_n + v_n).$$
(The rule is described here for row vectors but the obvious counterpart holds for column vectors.) Also, we can multiply a vector by any single number $\alpha$ (usually called a scalar in this context) by the following rule:
$$\alpha(v_1, v_2, \dots, v_n) = (\alpha v_1, \alpha v_2, \dots, \alpha v_n).$$
For vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k$ and numbers $\alpha_1, \alpha_2, \dots, \alpha_k$, the vector $\alpha_1 \mathbf{v}_1 + \dots + \alpha_k \mathbf{v}_k$ is known as a linear combination of the vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$.
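The three rules above translate directly into code. The following is a minimal sketch in plain Python; the function names and the example vectors are illustrative assumptions, not part of the text.

```python
def add(w, v):
    """Componentwise sum of two vectors of equal length."""
    if len(w) != len(v):
        raise ValueError("vectors must have the same length")
    return [wi + vi for wi, vi in zip(w, v)]


def scale(alpha, v):
    """Multiply every component of v by the scalar alpha."""
    return [alpha * vi for vi in v]


def linear_combination(coeffs, vectors):
    """Return coeffs[0]*vectors[0] + ... + coeffs[k-1]*vectors[k-1]."""
    result = [0] * len(vectors[0])
    for alpha, v in zip(coeffs, vectors):
        result = add(result, scale(alpha, v))
    return result


# Hypothetical example vectors in R^3.
v1 = [1, 0, 2]
v2 = [0, 1, -1]
combo = linear_combination([2, 3], [v1, v2])  # the vector 2*v1 + 3*v2
print(combo)
```

Here `combo` is the linear combination $2\mathbf{v}_1 + 3\mathbf{v}_2$ of the two example vectors.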
A matrix is an array of numbers
$$\begin{pmatrix} a_{11} & a_{12} & \dots & a_{1n} \\ a_{21} & a_{22} & \dots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{m1} & a_{m2} & \dots & a_{mn} \end{pmatrix}.$$
We denote this array by the single letter $A$, or by $(a_{ij})$, and we say that $A$ has $m$ rows and $n$ columns, or that it is an $m \times n$ matrix. We also say that $A$ is a matrix of size $m \times n$. If $m = n$, the matrix is said to be square. The number $a_{ij}$ is known as the $(i, j)$th entry of $A$. The row vector $(a_{i1}, a_{i2}, \dots, a_{in})$ is row $i$ of $A$, or the $i$th row of $A$, and the column vector
$$\begin{pmatrix} a_{1j} \\ a_{2j} \\ \vdots \\ a_{mj} \end{pmatrix}$$
is column $j$ of $A$, or the $j$th column of $A$.
It is useful to think of row and column vectors as matrices. For example, we may think of the row vector $(1, 2, 4)$ as being equal to the $1 \times 3$ matrix $(1\ 2\ 4)$. (Indeed, the only visible difference is that the vector has commas and the matrix does not, and this is merely a notational difference.)
Activity 2.1 Calculate the determinant of
(You should find that the answer is $-1$.)
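Recall from ‘Mathematics for economists’ (for it is not going to be reiterated here in all its gory detail!) that there are three types of elementary row operation one can perform on a matrix. These are: (i) multiplying a row by a non-zero constant; (ii) adding a multiple of one row to another row; and (iii) interchanging two rows.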
In ‘Mathematics for economists’, row operations were used to solve systems of linear equations by reducing the augmented matrix of the system to echelon form. (Yes, I am afraid you do have to remember this!) A matrix is an echelon matrix if it is of the following form:
where the ∗ entries denote any numbers. The 1s which have been indicated are called leading ones.
Rank, range, null space, and linear equations
There are several ways of defining the rank of a matrix, and we shall meet some other (more sophisticated) ways later. For the moment, we use the following definition.
Definition 2.1 (Rank of a matrix) The rank, $\operatorname{rank}(A)$, of a matrix $A$ is the number of non-zero rows in an echelon matrix obtained from $A$ by elementary row operations.
By a non-zero row, we simply mean one that contains entries other than 0.
Example: Consider the matrix
Reducing this to echelon form using elementary row operations, we have:
This last matrix is in echelon form and has two non-zero rows, so the matrix A has rank 2.
Activity 2.2 Prove that matrix
has rank 2.
If a square matrix of size n × n has rank n then it is invertible.
Generally, if A is an m × n matrix, then the number of non-zero rows in a reduced, echelon, form of A can certainly be no more than the total number of rows, m. Furthermore, since the leading ones must be in different columns, the number of leading ones—and hence non-zero rows—in the echelon form, can be no more than the total number, n, of columns. Thus we have:
Theorem 2.1 For an m × n matrix A, rank(A) ≤ min{m, n}, where min{m, n} denotes the smaller of the two integers m and n.
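As an illustration of Definition 2.1 and Theorem 2.1, the sketch below reduces a matrix to echelon form using the three elementary row operations and counts the non-zero rows. The function name `rank` and the example matrix are assumptions made for illustration; exact fractions are used so that no rounding occurs.

```python
from fractions import Fraction


def rank(matrix):
    """Number of non-zero rows after reducing a copy of `matrix` to echelon form."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    m = len(rows)
    n = len(rows[0]) if rows else 0
    r = 0  # number of leading ones found so far
    for col in range(n):
        # Find a row at or below position r with a non-zero entry in this column.
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]   # interchange two rows
        lead = rows[r][col]
        rows[r] = [x / lead for x in rows[r]]         # scale to get a leading one
        for i in range(r + 1, m):                     # clear the entries below it
            factor = rows[i][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    return r


# Hypothetical 3 x 4 matrix whose second row is twice the first.
A = [[1, 2, 3, 4],
     [2, 4, 6, 8],
     [1, 0, 1, 0]]
print(rank(A), min(len(A), len(A[0])))
```

For this matrix the rank is strictly smaller than $\min\{m, n\} = 3$, which shows that the bound in Theorem 2.1 need not be attained.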
Recall that to solve a system of linear equations, one forms the augmented matrix and reduces it to echelon form by using elementary row operations.
Example: Consider the system of equations
$$\begin{aligned} x_1 + 2x_2 + x_3 &= 1 \\ 2x_1 + 2x_2 &= 2 \\ 3x_1 + 4x_2 + x_3 &= 2. \end{aligned}$$
Using row operations to reduce the augmented matrix to echelon form, we obtain
Using back-substitution to solve for $x_1$, $x_3$ and $x_6$ in terms of $x_2$, $x_4$ and $x_5$ we get
$$x_6 = 5, \quad x_3 = -14 - 2x_4, \quad x_1 = -28 - 3x_2 - 4x_4 - 2x_5.$$
The form of these equations tells us that we can assign any values to $x_2$, $x_4$ and $x_5$, and then the other variables will be determined. Explicitly, if we give $x_2, x_4, x_5$ the arbitrary values $s, t, u$, the solution is given by
$$x_1 = -28 - 3s - 4t - 2u, \quad x_2 = s, \quad x_3 = -14 - 2t, \quad x_4 = t, \quad x_5 = u, \quad x_6 = 5.$$
Observe that there are infinitely many solutions, because the so-called ‘free unknowns’ $x_2, x_4, x_5$ can take any values $s, t, u$.
Generally, we can describe what happens when the echelon form has $r < n$ non-zero rows, each of the form $(0\ 0\ \dots\ 0\ 1\ *\ *\ \dots\ *)$. If the leading 1 is in the $k$th column it is the coefficient of the unknown $x_k$. So if the rank is $r$ and the leading 1’s occur in columns $c_1, c_2, \dots, c_r$ then the general solution to the system can be expressed in a form where the unknowns $x_{c_1}, x_{c_2}, \dots, x_{c_r}$ are given in terms of the other $n - r$ unknowns, and those $n - r$ unknowns are free to take any values. In the preceding example, we have $n = 6$ and $r = 3$, and the 3 unknowns $x_1, x_3, x_6$ can be expressed in terms of the $6 - 3 = 3$ free unknowns $x_2, x_4, x_5$.
In the case $r = n$, where the number of non-zero rows $r$ in the echelon form is equal to the number of unknowns $n$, there is only one solution to the system. The echelon form then has no zero rows, the leading 1’s move one step to the right as we go down the rows, and the unique solution is obtained by back-substitution from the echelon form. In fact, this can be thought of as a special case of the more general one discussed above: since $r = n$ there are $n - r = 0$ free unknowns, and the solution is therefore unique.
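The bookkeeping of leading and free unknowns can be sketched in a few lines of Python. The augmented matrix `E` below is an assumption: it is a reduced echelon form chosen to be consistent with the solution stated in the example ($n = 6$, $r = 3$), not the matrix from the original working. The function name is likewise hypothetical.

```python
def leading_and_free(echelon, n_unknowns):
    """Return (leading, free): 1-based indices of the unknowns, read off an
    echelon-form augmented matrix whose last column is the right-hand side."""
    leading = []
    for row in echelon:
        for j in range(n_unknowns):
            if row[j] != 0:            # first non-zero entry is the leading one
                leading.append(j + 1)
                break
    free = [j for j in range(1, n_unknowns + 1) if j not in leading]
    return leading, free


# Assumed echelon form, consistent with x6 = 5, x3 = -14 - 2*x4,
# x1 = -28 - 3*x2 - 4*x4 - 2*x5.
E = [[1, 3, 0, 4, 2, 0, -28],
     [0, 0, 1, 2, 0, 0, -14],
     [0, 0, 0, 0, 0, 1, 5]]
leading, free = leading_and_free(E, 6)
print(leading, free)
```

The number of free unknowns is `len(free)`, which equals $n - r$.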
We can now summarise our conclusions concerning a general linear system.
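Consider the system solved above. We found that the general solution in terms of three free unknowns, or parameters, $s, t, u$ is
$$x_1 = -28 - 3s - 4t - 2u, \quad x_2 = s, \quad x_3 = -14 - 2t, \quad x_4 = t, \quad x_5 = u, \quad x_6 = 5.$$
If we write $\mathbf{x}$ as a column vector,
$$\mathbf{x} = \begin{pmatrix} x_1 \\ x_2 \\ x_3 \\ x_4 \\ x_5 \\ x_6 \end{pmatrix},$$
then
$$\mathbf{x} = \begin{pmatrix} -28 - 3s - 4t - 2u \\ s \\ -14 - 2t \\ t \\ u \\ 5 \end{pmatrix} = \begin{pmatrix} -28 \\ 0 \\ -14 \\ 0 \\ 0 \\ 5 \end{pmatrix} + \begin{pmatrix} -3s \\ s \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} -4t \\ 0 \\ -2t \\ t \\ 0 \\ 0 \end{pmatrix} + \begin{pmatrix} -2u \\ 0 \\ 0 \\ 0 \\ u \\ 0 \end{pmatrix}.$$
That is, the general solution is
$$\mathbf{x} = \mathbf{v} + s\mathbf{u}_1 + t\mathbf{u}_2 + u\mathbf{u}_3,$$
where
$$\mathbf{v} = \begin{pmatrix} -28 \\ 0 \\ -14 \\ 0 \\ 0 \\ 5 \end{pmatrix}, \quad \mathbf{u}_1 = \begin{pmatrix} -3 \\ 1 \\ 0 \\ 0 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{u}_2 = \begin{pmatrix} -4 \\ 0 \\ -2 \\ 1 \\ 0 \\ 0 \end{pmatrix}, \quad \mathbf{u}_3 = \begin{pmatrix} -2 \\ 0 \\ 0 \\ 0 \\ 1 \\ 0 \end{pmatrix}.$$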
Applying the same method to a general consistent system $A\mathbf{x} = \mathbf{b}$ of rank $r$ with $n$ unknowns, we can express the general solution in the form
$$\mathbf{x} = \mathbf{v} + s_1\mathbf{u}_1 + s_2\mathbf{u}_2 + \dots + s_{n-r}\mathbf{u}_{n-r}.$$
Note that, if we put all the $s_i$’s equal to 0, we get a solution $\mathbf{x} = \mathbf{v}$, which means that $A\mathbf{v} = \mathbf{b}$, so $\mathbf{v}$ is a particular solution of the system. Putting $s_1 = 1$ and the remaining $s_i$’s equal to zero, we get a solution $\mathbf{x} = \mathbf{v} + \mathbf{u}_1$, which means that $A(\mathbf{v} + \mathbf{u}_1) = \mathbf{b}$. Thus
$$\mathbf{b} = A(\mathbf{v} + \mathbf{u}_1) = A\mathbf{v} + A\mathbf{u}_1 = \mathbf{b} + A\mathbf{u}_1.$$
Comparing the first and last expressions, we see that $A\mathbf{u}_1$ is the zero vector $\mathbf{0}$. Clearly, the same equation holds for $\mathbf{u}_2, \dots, \mathbf{u}_{n-r}$. So we have proved the following.
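The general solution of $A\mathbf{x} = \mathbf{b}$ is the sum of: (i) a particular solution $\mathbf{v}$ of the system $A\mathbf{x} = \mathbf{b}$, and (ii) a linear combination $s_1\mathbf{u}_1 + s_2\mathbf{u}_2 + \dots + s_{n-r}\mathbf{u}_{n-r}$ of solutions $\mathbf{u}_1, \mathbf{u}_2, \dots, \mathbf{u}_{n-r}$ of the system $A\mathbf{x} = \mathbf{0}$.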
It’s clear from what we’ve just seen that the general solution to a consistent linear system involves solutions to the system $A\mathbf{x} = \mathbf{0}$. This set of solutions is given a special name: the null space or kernel of a matrix $A$. This null space, denoted $N(A)$, is the set of all solutions $\mathbf{x}$ to $A\mathbf{x} = \mathbf{0}$, where $\mathbf{0}$ is the all-zero vector. That is,
Definition 2.2 (Null space) For an $m \times n$ matrix $A$, the null space of $A$ is the subset
$$N(A) = \{\mathbf{x} \in \mathbb{R}^n : A\mathbf{x} = \mathbf{0}\}$$
of $\mathbb{R}^n$, where $\mathbf{0} = (0, 0, \dots, 0)^T$ is the all-0 vector of length $m$.
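The ‘particular solution plus null space vector’ structure can be checked numerically. The sketch below uses a small hypothetical system (two equations, three unknowns, rank 2, so $n - r = 1$ free unknown); the matrix, right-hand side and helper name `mat_vec` are assumptions chosen for illustration.

```python
def mat_vec(A, x):
    """Matrix-vector product Ax, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical system: x1 + 2*x2 + x3 = 1, 2*x1 + 4*x2 + 3*x3 = 4.
A = [[1, 2, 1],
     [2, 4, 3]]
b = [1, 4]

v = [-1, 0, 2]    # a particular solution: Av = b
u1 = [-2, 1, 0]   # a vector in the null space: A u1 = 0


def solution(s):
    """The general solution x = v + s*u1 for a given parameter value s."""
    return [vi + s * ui for vi, ui in zip(v, u1)]


print(mat_vec(A, v), mat_vec(A, u1))
print([mat_vec(A, solution(s)) == b for s in (-3, 0, 1, 10)])
```

Every choice of the parameter `s` gives a solution, because the null space part contributes nothing to the product with `A`.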
Activity 2.3 Convince yourself of this last statement, that
$$A\mathbf{x} = \alpha_1 \mathbf{c}_1 + \alpha_2 \mathbf{c}_2 + \dots + \alpha_n \mathbf{c}_n.$$
So, R(A), as the set of all such products, is the set of all linear combinations of the columns of A. For this reason R(A) is also called the column space of A. (More on this later in this chapter.)
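Example: Suppose that
$$A = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix}.$$
Then for $\mathbf{x} = (\alpha_1, \alpha_2)^T$,
$$A\mathbf{x} = \begin{pmatrix} 1 & 2 \\ -1 & 3 \\ 2 & 1 \end{pmatrix} \begin{pmatrix} \alpha_1 \\ \alpha_2 \end{pmatrix} = \begin{pmatrix} \alpha_1 + 2\alpha_2 \\ -\alpha_1 + 3\alpha_2 \\ 2\alpha_1 + \alpha_2 \end{pmatrix},$$
so
$$R(A) = \left\{ \begin{pmatrix} \alpha_1 + 2\alpha_2 \\ -\alpha_1 + 3\alpha_2 \\ 2\alpha_1 + \alpha_2 \end{pmatrix} : \alpha_1, \alpha_2 \in \mathbb{R} \right\}.$$
This may also be written as
$$R(A) = \{\alpha_1 \mathbf{c}_1 + \alpha_2 \mathbf{c}_2 : \alpha_1, \alpha_2 \in \mathbb{R}\},$$
where
$$\mathbf{c}_1 = \begin{pmatrix} 1 \\ -1 \\ 2 \end{pmatrix}, \quad \mathbf{c}_2 = \begin{pmatrix} 2 \\ 3 \\ 1 \end{pmatrix}$$
are the columns of $A$.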
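The identity ‘$A\mathbf{x}$ is a linear combination of the columns of $A$’ is easy to confirm for this example. In the sketch below the entries of `A` are read off from the expression for $A\mathbf{x}$ above, so that its columns are $(1, -1, 2)^T$ and $(2, 3, 1)^T$; the helper names and the test vector are assumptions.

```python
def mat_vec(A, x):
    """Matrix-vector product Ax, computed row by row."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def column(A, j):
    """Column j (0-based) of A as a list."""
    return [row[j] for row in A]


def column_combination(A, alphas):
    """alphas[0]*c_1 + alphas[1]*c_2 + ..., where c_j are the columns of A."""
    m = len(A)
    result = [0] * m
    for j, alpha in enumerate(alphas):
        c = column(A, j)
        result = [r + alpha * ci for r, ci in zip(result, c)]
    return result


A = [[1, 2],
     [-1, 3],
     [2, 1]]
x = [3, -2]   # hypothetical choice of (alpha_1, alpha_2)
print(mat_vec(A, x), column_combination(A, x))
```

Both computations give the same vector, which is therefore an element of $R(A)$.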
Vector spaces
We know that vectors of $\mathbb{R}^n$ can be added together and that they can be ‘scaled’ by real numbers. That is, for every $\mathbf{x}, \mathbf{y} \in \mathbb{R}^n$ and every $\alpha \in \mathbb{R}$, it makes sense to talk about $\mathbf{x} + \mathbf{y}$ and $\alpha\mathbf{x}$. Furthermore, these operations of addition and multiplication by a scalar (that is, multiplication by a real number) behave and interact ‘sensibly’, in that, for example,
$$\alpha(\mathbf{x} + \mathbf{y}) = \alpha\mathbf{x} + \alpha\mathbf{y}, \quad \alpha(\beta\mathbf{x}) = (\alpha\beta)\mathbf{x}, \quad \mathbf{x} + \mathbf{y} = \mathbf{y} + \mathbf{x},$$
and so on.
But it is not only vectors in $\mathbb{R}^n$ that can be added and multiplied by scalars. There are other sets of objects for which this is possible. Consider the set $V$ of all functions from $\mathbb{R}$ to $\mathbb{R}$. Then any two of these functions can be added: given $f, g \in V$ we simply define the function $f + g$ by
$$(f + g)(x) = f(x) + g(x).$$
Also, for any $\alpha \in \mathbb{R}$, the function $\alpha f$ is given by
$$(\alpha f)(x) = \alpha(f(x)).$$
These operations of addition and scalar multiplication are sometimes said to be pointwise addition and pointwise scalar multiplication. This might seem a bit abstract, but think about what the functions $x + x^2$ and $2x$ represent: the former is the function $x$ plus the function $x^2$, and the latter is the function $x$ multiplied by the scalar 2. So this is just a different way of looking at something you are already familiar with. It turns out that $V$ and its rules for addition and multiplication by a scalar satisfy the same key properties as does the set of vectors in $\mathbb{R}^n$ with its addition and scalar multiplication. We refer to a set with an addition and scalar multiplication which behave appropriately as a vector space. We now give the formal definition of a vector space.
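Pointwise addition and scalar multiplication of functions can be written down almost verbatim in Python, where functions are ordinary values. This is only a sketch; the names `add_functions` and `scale_function` are illustrative assumptions.

```python
def add_functions(f, g):
    """Return the function f + g, defined pointwise by (f + g)(x) = f(x) + g(x)."""
    return lambda x: f(x) + g(x)


def scale_function(alpha, f):
    """Return the function alpha*f, defined pointwise by (alpha f)(x) = alpha * f(x)."""
    return lambda x: alpha * f(x)


def identity(x):
    return x


def square(x):
    return x ** 2


h = add_functions(identity, square)   # the function x + x^2
k = scale_function(2, identity)       # the function 2x
print(h(3), k(3))
```

The objects `h` and `k` are themselves functions from numbers to numbers, which is exactly the sense in which the set $V$ is closed under these two operations.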
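Definition 2.4 (Vector space) A vector space $V$ is a set equipped with an addition operation and a scalar multiplication operation such that for all $\alpha, \beta \in \mathbb{R}$ and all $\mathbf{u}, \mathbf{v}, \mathbf{w} \in V$,
1. $\mathbf{u} + \mathbf{v} \in V$ (closure under addition)
2. $\mathbf{u} + \mathbf{v} = \mathbf{v} + \mathbf{u}$ (the commutative law for addition)
3. $\mathbf{u} + (\mathbf{v} + \mathbf{w}) = (\mathbf{u} + \mathbf{v}) + \mathbf{w}$ (the associative law for addition)
4. there is a single member $\mathbf{0}$ of $V$, called the zero vector, such that $\mathbf{v} + \mathbf{0} = \mathbf{v}$ for all $\mathbf{v} \in V$
5. for every $\mathbf{v} \in V$ there is an element of $V$, written $-\mathbf{v}$ and called the negative of $\mathbf{v}$, such that $\mathbf{v} + (-\mathbf{v}) = \mathbf{0}$
6. $\alpha\mathbf{v} \in V$ (closure under scalar multiplication)
7. $\alpha(\mathbf{u} + \mathbf{v}) = \alpha\mathbf{u} + \alpha\mathbf{v}$ (the distributive law)
8. $(\alpha + \beta)\mathbf{v} = \alpha\mathbf{v} + \beta\mathbf{v}$ (the distributive law)
9. $\alpha(\beta\mathbf{v}) = (\alpha\beta)\mathbf{v}$ (the associative law for scalar multiplication)
10. $1\mathbf{v} = \mathbf{v}$.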
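Other properties follow from those listed in the definition. For instance, we can see that $0\mathbf{x} = \mathbf{0}$ for all $\mathbf{x}$, as follows:
$$0\mathbf{x} = (0 + 0)\mathbf{x} = 0\mathbf{x} + 0\mathbf{x},$$
so, adding the negative $-0\mathbf{x}$ of $0\mathbf{x}$ to each side,
$$\mathbf{0} = 0\mathbf{x} + (-0\mathbf{x}) = (0\mathbf{x} + 0\mathbf{x}) + (-0\mathbf{x}) = 0\mathbf{x} + (0\mathbf{x} + (-0\mathbf{x})) = 0\mathbf{x} + \mathbf{0} = 0\mathbf{x}.$$
(A bit sneaky, but just remember the result: $0\mathbf{x} = \mathbf{0}$.)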
(Note that this definition says nothing at all about ‘multiplying’ together two vectors: the only operations with which the definition is concerned are addition and scalar multiplication.)
A vector space as we have defined it is often called a real vector space, to emphasise that the ‘scalars’ α, β and so on are real numbers rather than complex numbers. There is a notion of complex vector space, but this will not concern us in this subject.
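Theorem 2.3 Suppose $V$ is a vector space. Then a non-empty subset $W$ of $V$ is a subspace if and only if: (i) for all $\mathbf{u}, \mathbf{v} \in W$, $\mathbf{u} + \mathbf{v} \in W$ (that is, $W$ is closed under addition), and (ii) for all $\mathbf{v} \in W$ and $\alpha \in \mathbb{R}$, $\alpha\mathbf{v} \in W$ (that is, $W$ is closed under scalar multiplication).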
Null space
Suppose that $A$ is an $m \times n$ matrix. Then the null space $N(A)$, the set of solutions to the linear system $A\mathbf{x} = \mathbf{0}$, is a subspace of $\mathbb{R}^n$.
Theorem 2.4 For any $m \times n$ matrix $A$, $N(A)$ is a subspace of $\mathbb{R}^n$.
Proof To prove this we have to verify that $N(A) \neq \emptyset$, and that if $\mathbf{u}, \mathbf{v} \in N(A)$ and $\alpha \in \mathbb{R}$, then $\mathbf{u} + \mathbf{v} \in N(A)$ and $\alpha\mathbf{u} \in N(A)$. Since $A\mathbf{0} = \mathbf{0}$, $\mathbf{0} \in N(A)$ and hence $N(A) \neq \emptyset$. Suppose $\mathbf{u}, \mathbf{v} \in N(A)$. Then to show $\mathbf{u} + \mathbf{v} \in N(A)$ and $\alpha\mathbf{u} \in N(A)$, we must show that $\mathbf{u} + \mathbf{v}$ and $\alpha\mathbf{u}$ are solutions of $A\mathbf{x} = \mathbf{0}$. We have
$$A(\mathbf{u} + \mathbf{v}) = A\mathbf{u} + A\mathbf{v} = \mathbf{0} + \mathbf{0} = \mathbf{0}$$
and
$$A(\alpha\mathbf{u}) = \alpha(A\mathbf{u}) = \alpha\mathbf{0} = \mathbf{0},$$
so we have shown what we needed.
Note that the null space is the set of solutions to the homogeneous linear system. If we instead consider the set of solutions $S$ to a general system $A\mathbf{x} = \mathbf{b}$, $S$ is not a subspace of $\mathbb{R}^n$ if $\mathbf{b} \neq \mathbf{0}$ (that is, if the system is not homogeneous). This is because $\mathbf{0}$ does not belong to $S$. However, as we indicated above, there is a relationship between $S$ and $N(A)$: if $\mathbf{x}_0$ is any solution of $A\mathbf{x} = \mathbf{b}$ then $S = \{\mathbf{x}_0 + \mathbf{z} : \mathbf{z} \in N(A)\}$, which we may write as $\mathbf{x}_0 + N(A)$. Generally, if $W$ is a subspace of a vector space $V$ and $\mathbf{x} \in V$ then the set $\mathbf{x} + W$ defined by
$$\mathbf{x} + W = \{\mathbf{x} + \mathbf{w} : \mathbf{w} \in W\}$$
is called an affine subspace of $V$. An affine subspace is not generally a subspace (although every subspace is an affine subspace, as we can see by taking $\mathbf{x} = \mathbf{0}$).
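A small numerical check makes the difference between $S$ and $N(A)$ concrete. The system below is a hypothetical one with $\mathbf{b} \neq \mathbf{0}$; all names and values are assumptions for illustration. The sketch confirms that $\mathbf{0} \notin S$, that the difference of two solutions lies in $N(A)$, and that the sum of two solutions is not a solution.

```python
def mat_vec(A, x):
    """Matrix-vector product Ax, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical non-homogeneous system Ax = b.
A = [[1, 2, 1],
     [2, 4, 3]]
b = [1, 4]

x0 = [-1, 0, 2]   # one solution
x1 = [-5, 2, 2]   # another solution

zero_in_S = mat_vec(A, [0, 0, 0]) == b          # is the zero vector a solution?

difference = [p - q for p, q in zip(x1, x0)]
diff_in_null_space = mat_vec(A, difference) == [0, 0]

total = [p + q for p, q in zip(x1, x0)]
sum_in_S = mat_vec(A, total) == b               # is S closed under addition?

print(zero_in_S, diff_in_null_space, sum_in_S)
```

The middle check is the statement $S = \mathbf{x}_0 + N(A)$ read backwards: any solution differs from $\mathbf{x}_0$ by an element of the null space.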
Range
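Recall that the range of an $m \times n$ matrix is
$$R(A) = \{A\mathbf{x} : \mathbf{x} \in \mathbb{R}^n\}.$$
Theorem 2.5 For any $m \times n$ matrix $A$, $R(A)$ is a subspace of $\mathbb{R}^m$.
Proof We need to show that if $\mathbf{u}, \mathbf{v} \in R(A)$ then $\mathbf{u} + \mathbf{v} \in R(A)$ and, for any $\alpha \in \mathbb{R}$, $\alpha\mathbf{v} \in R(A)$. So suppose $\mathbf{u}, \mathbf{v} \in R(A)$. Then for some $\mathbf{y}_1, \mathbf{y}_2 \in \mathbb{R}^n$, $\mathbf{u} = A\mathbf{y}_1$, $\mathbf{v} = A\mathbf{y}_2$. We need to show that $\mathbf{u} + \mathbf{v} = A\mathbf{y}$ for some $\mathbf{y}$. Well,
$$\mathbf{u} + \mathbf{v} = A\mathbf{y}_1 + A\mathbf{y}_2 = A(\mathbf{y}_1 + \mathbf{y}_2),$$
so we may take $\mathbf{y} = \mathbf{y}_1 + \mathbf{y}_2$ to see that, indeed, $\mathbf{u} + \mathbf{v} \in R(A)$. Next,
$$\alpha\mathbf{v} = \alpha(A\mathbf{y}_2) = A(\alpha\mathbf{y}_2),$$
so $\alpha\mathbf{v} = A\mathbf{y}$ for some $\mathbf{y}$ (namely $\mathbf{y} = \alpha\mathbf{y}_2$) and hence $\alpha\mathbf{v} \in R(A)$.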
Linear independence is a central idea in the theory of vector spaces. We say that vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m$ in $\mathbb{R}^n$ are linearly dependent (LD) if there are numbers $\alpha_1, \alpha_2, \dots, \alpha_m$, not all zero, such that
$$\alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2 + \dots + \alpha_m\mathbf{x}_m = \mathbf{0},$$
the zero vector. The left-hand side is termed a non-trivial linear combination. This condition is entirely equivalent to saying that one of the vectors may be expressed as a linear combination of the others. The vectors are linearly independent (LI) if they are not linearly dependent; that is, if no non-trivial linear combination of them is the zero vector or, equivalently, whenever
$$\alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2 + \dots + \alpha_m\mathbf{x}_m = \mathbf{0},$$
then, necessarily, $\alpha_1 = \alpha_2 = \dots = \alpha_m = 0$. We have been talking about $\mathbb{R}^n$, but the same definitions can be used for any vector space. We state them formally now.
Definition 2.6 (Linear independence) Let $V$ be a vector space and $\mathbf{v}_1, \dots, \mathbf{v}_m \in V$. Then $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m$ form a linearly independent set or are linearly independent if and only if
$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \dots + \alpha_m\mathbf{v}_m = \mathbf{0} \implies \alpha_1 = \alpha_2 = \dots = \alpha_m = 0;$$
that is, if and only if no non-trivial linear combination of $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m$ equals the zero vector.
Definition 2.7 (Linear dependence) Let $V$ be a vector space and $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m \in V$. Then $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_m$ form a linearly dependent set or are linearly dependent if and only if there are real numbers $\alpha_1, \alpha_2, \dots, \alpha_m$, not all zero, such that
$$\alpha_1\mathbf{v}_1 + \alpha_2\mathbf{v}_2 + \dots + \alpha_m\mathbf{v}_m = \mathbf{0};$$
that is, if and only if some non-trivial linear combination of the vectors is the zero vector.
Example: In R^3 , the following vectors are linearly dependent:
v 1 =
(^) , v 2 =
(^) , v 3 =
This is because $2\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3 = \mathbf{0}$.
(Note that this can also be written as $\mathbf{v}_3 = 2\mathbf{v}_1 + \mathbf{v}_2$.)
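One practical test for linear dependence is to place the vectors as the rows of a matrix and compute its rank: the vectors are linearly independent exactly when the rank equals the number of vectors. The sketch below applies this to three hypothetical vectors in $\mathbb{R}^3$ chosen to satisfy the same relation $2\mathbf{v}_1 + \mathbf{v}_2 - \mathbf{v}_3 = \mathbf{0}$; they are assumptions and need not be the vectors of the example above. The function names are also illustrative.

```python
from fractions import Fraction


def rank(matrix):
    """Number of non-zero rows in an echelon form of `matrix` (exact arithmetic)."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    m, n = len(rows), len(rows[0])
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][col]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(r + 1, m):
            factor = rows[i][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    return r


def is_linearly_independent(vectors):
    """True when the rank of the matrix with these rows equals their number."""
    return rank(vectors) == len(vectors)


# Hypothetical vectors with 2*v1 + v2 - v3 = 0.
v1, v2, v3 = [1, 2, 3], [2, 1, 5], [4, 5, 11]
relation = [2 * a + b - c for a, b, c in zip(v1, v2, v3)]
print(relation, is_linearly_independent([v1, v2, v3]))
print(is_linearly_independent([v1, v2]))
```

Dropping `v3` leaves an independent pair, in line with the remark that a dependent set contains a vector expressible in terms of the others.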
The set of all linear combinations of a given set of vectors forms a vector space, and we give it a special name.
Definition 2.8 Suppose that $V$ is a vector space and that $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k \in V$. Let $S$ be the set of all linear combinations of the vectors $\mathbf{v}_1, \dots, \mathbf{v}_k$. That is,
$$S = \{\alpha_1\mathbf{v}_1 + \dots + \alpha_k\mathbf{v}_k : \alpha_1, \alpha_2, \dots, \alpha_k \in \mathbb{R}\}.$$
Then $S$ is a subspace of $V$, and is known as the subspace spanned by the set $X = \{\mathbf{v}_1, \dots, \mathbf{v}_k\}$ (or, the linear span or, simply, span, of $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k$). This subspace is denoted by
$$S = \operatorname{Lin}\{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_k\}$$
or $S = \operatorname{Lin}(X)$.
Different texts use different notations. For example, Simon and Blume use $L[\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n]$. Notation is important, but it is nothing to get anxious about: just always make it clear what you mean by your notation, and use words as well as symbols!
We have already observed that the range $R(A)$ of an $m \times n$ matrix $A$ is equal to the set of all linear combinations of its columns. In other words, $R(A)$ is the span of the columns of $A$ and is often called the column space. It is also possible to consider the row space $RS(A)$ of a matrix: this is the span of the rows of $A$. If $A$ is an $m \times n$ matrix the row space will be a subspace of $\mathbb{R}^n$.
The following result is very important in the theory of vector spaces.
Theorem 2.9 If $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ are linearly independent vectors in $\mathbb{R}^n$, then for any $\mathbf{x}$ in $\mathbb{R}^n$, $\mathbf{x}$ can be written as a linear combination of $\mathbf{x}_1, \dots, \mathbf{x}_n$. We say that $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_n$ span $\mathbb{R}^n$.
Proof Because $\mathbf{x}_1, \dots, \mathbf{x}_n$ are linearly independent, the $n \times n$ matrix
$$A = (\mathbf{x}_1\ \mathbf{x}_2\ \dots\ \mathbf{x}_n)$$
is such that $\operatorname{rank}(A) = n$. (See above.) In other words, $A$ reduces to an echelon matrix with exactly $n$ leading ones. Suppose now that $\mathbf{x}$ is any vector in $\mathbb{R}^n$ and consider the system $A\mathbf{z} = \mathbf{x}$. By the discussion above about the rank of linear systems, this system has a (unique) solution. But let’s spell it out. Because $A$ has rank $n$, it can be reduced (by row operations) to an echelon matrix with $n$ leading ones. The augmented matrix $(A \mid \mathbf{x})$ can therefore be reduced to a matrix $(E \mid \mathbf{s})$ where $E$ is an $n \times n$ echelon matrix with $n$ leading ones. Solving by back-substitution in the usual manner, we can find a solution to $A\mathbf{z} = \mathbf{x}$. This shows that any vector $\mathbf{x}$ can be expressed in the form
$$\mathbf{x} = A\mathbf{z} = (\mathbf{x}_1\ \mathbf{x}_2\ \dots\ \mathbf{x}_n) \begin{pmatrix} \alpha_1 \\ \alpha_2 \\ \vdots \\ \alpha_n \end{pmatrix},$$
where we have written $\mathbf{z}$ as $(\alpha_1, \alpha_2, \dots, \alpha_n)^T$. Expanding this matrix product, we have that any $\mathbf{x} \in \mathbb{R}^n$ can be expressed as a linear combination
$$\mathbf{x} = \alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2 + \dots + \alpha_n\mathbf{x}_n,$$
as required.
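The proof is constructive: solving $A\mathbf{z} = \mathbf{x}$ produces the coefficients. The sketch below does this with exact fractions. Note two assumptions: it uses full elimination to reduced echelon form rather than echelon form followed by back-substitution (the result is the same), and the function name `coefficients` and the example vectors are hypothetical.

```python
from fractions import Fraction


def coefficients(vectors, x):
    """Solve A z = x, where A has `vectors` as its columns.

    Assumes `vectors` are n linearly independent vectors in R^n; otherwise
    no pivot is found in some column and StopIteration is raised.
    """
    n = len(x)
    # Build the augmented matrix (A | x), row by row.
    M = [[Fraction(vectors[j][i]) for j in range(n)] + [Fraction(x[i])]
         for i in range(n)]
    for col in range(n):
        pivot = next(i for i in range(col, n) if M[i][col] != 0)
        M[col], M[pivot] = M[pivot], M[col]
        lead = M[col][col]
        M[col] = [e / lead for e in M[col]]
        for i in range(n):
            if i != col:
                factor = M[i][col]
                M[i] = [a - factor * b for a, b in zip(M[i], M[col])]
    return [M[i][n] for i in range(n)]


# Hypothetical linearly independent vectors in R^2 and a target vector.
x1, x2 = [1, 1], [1, -1]
x = [3, 1]
alphas = coefficients([x1, x2], x)
rebuilt = [alphas[0] * a + alphas[1] * b for a, b in zip(x1, x2)]
print(alphas, rebuilt)
```

The list `rebuilt` recomputes $\alpha_1\mathbf{x}_1 + \alpha_2\mathbf{x}_2$ and should reproduce the target vector.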
There is another important property of linearly independent sets of vectors.
Theorem 2.10 If $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m$ are linearly independent in $\mathbb{R}^n$ and
$$c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \dots + c_m\mathbf{x}_m = c'_1\mathbf{x}_1 + c'_2\mathbf{x}_2 + \dots + c'_m\mathbf{x}_m$$
then $c_1 = c'_1, c_2 = c'_2, \dots, c_m = c'_m$.
Activity 2.5 Prove this. Use the fact that
$$c_1\mathbf{x}_1 + c_2\mathbf{x}_2 + \dots + c_m\mathbf{x}_m = c'_1\mathbf{x}_1 + c'_2\mathbf{x}_2 + \dots + c'_m\mathbf{x}_m$$
if and only if
$$(c_1 - c'_1)\mathbf{x}_1 + (c_2 - c'_2)\mathbf{x}_2 + \dots + (c_m - c'_m)\mathbf{x}_m = \mathbf{0}.$$
It follows from these two results that if we have $n$ linearly independent vectors in $\mathbb{R}^n$, then any vector in $\mathbb{R}^n$ can be expressed in exactly one way as a linear combination of the $n$ vectors. We say that the $n$ vectors form a basis of $\mathbb{R}^n$. The formal definition of a (finite) basis for a vector space is as follows.
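Definition 2.9 ((Finite) Basis) Let $V$ be a vector space. Then the subset $B = \{\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_n\}$ of $V$ is said to be a basis for (or of) $V$ if: (i) $B$ is a linearly independent set of vectors, and (ii) $B$ spans $V$; that is, $V = \operatorname{Lin}(B)$.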
An alternative characterisation of a basis can be given: B is a basis of V if every vector in V can be expressed in exactly one way as a linear combination of the vectors in B.
Example: The vector space $\mathbb{R}^n$ has the basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ where $\mathbf{e}_i$ is (as earlier) the vector with every entry equal to 0 except for the $i$th entry, which is 1. It’s clear that the vectors are linearly independent, and there are $n$ of them, so we know straight away that they form a basis. In fact, it’s easy to see that they span the whole of $\mathbb{R}^n$, since for any $\mathbf{x} = (x_1, x_2, \dots, x_n)^T \in \mathbb{R}^n$,
$$\mathbf{x} = x_1\mathbf{e}_1 + x_2\mathbf{e}_2 + \dots + x_n\mathbf{e}_n.$$
The basis $\{\mathbf{e}_1, \mathbf{e}_2, \dots, \mathbf{e}_n\}$ is called the standard basis of $\mathbb{R}^n$.
Activity 2.6 Convince yourself that the vectors are linearly independent and that they span the whole of $\mathbb{R}^n$.
basis of $W$ then there are $s = \dim(V) - \dim(W)$ vectors $\mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_s \in V$ such that $\{\mathbf{w}_1, \mathbf{w}_2, \dots, \mathbf{w}_r, \mathbf{v}_1, \mathbf{v}_2, \dots, \mathbf{v}_s\}$ is a basis of $V$. (In the case $W = V$, the basis of $W$ is already a basis of $V$.) That is, we can obtain a basis of the whole space $V$ by adding vectors of $V$ to any basis of $W$.
Suppose we are given $m$ vectors $\mathbf{x}_1, \mathbf{x}_2, \dots, \mathbf{x}_m$ in $\mathbb{R}^n$, and we want to find a basis for the linear span $\operatorname{Lin}\{\mathbf{x}_1, \dots, \mathbf{x}_m\}$. The point is that the $m$ vectors themselves might not form a linearly independent set (and hence not a basis). A useful technique is to form a matrix with the $\mathbf{x}_i^T$ as rows, and to perform row operations until the resulting matrix is in echelon form. Then a basis of the linear span is given by the transposed non-zero rows of the echelon matrix (which, it should be noted, will not generally be among the initial given vectors). The reason this works is that: (i) row operations are such that at any stage in the resulting procedure, the row space of the matrix is equal to the row space of the original matrix, which is precisely the linear span of the original set of vectors (if we ignore the difference between row and column vectors), and (ii) the non-zero rows of an echelon matrix are linearly independent (which is clear, since each has a one in a position where the others all have zero).
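The technique just described can be sketched as follows: stack the vectors as rows, reduce to echelon form, and keep the non-zero rows. The function name `span_basis` and the three example vectors are assumptions for illustration; exact fractions avoid rounding issues.

```python
from fractions import Fraction


def span_basis(vectors):
    """Return the non-zero rows of an echelon form of the matrix whose rows
    are `vectors`; these rows form a basis of the linear span."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    m, n = len(rows), len(rows[0])
    r = 0
    for col in range(n):
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]   # interchange rows
        lead = rows[r][col]
        rows[r] = [x / lead for x in rows[r]]         # make a leading one
        for i in range(r + 1, m):                     # clear entries below it
            factor = rows[i][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
        if r == m:
            break
    return rows[:r]


# Hypothetical vectors in R^3; the second is twice the first.
x1, x2, x3 = [1, 0, 1], [2, 0, 2], [0, 1, 1]
basis = span_basis([x1, x2, x3])
print([[int(e) for e in row] for row in basis])
```

In this run a row interchange occurs (the zero row is moved to the bottom), so the non-zero rows of the echelon matrix correspond to the first and third of the original vectors, which is the situation discussed in the remark about interchanges.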
Example: We find a basis for the subspace of R^5 spanned by the vectors
x 1 =
, x 2 =
, x 3 =
, x 4 =
The matrix
$$\begin{pmatrix} \mathbf{x}_1^T \\ \mathbf{x}_2^T \\ \mathbf{x}_3^T \\ \mathbf{x}_4^T \end{pmatrix}$$
is
Reducing this to echelon form by elementary row operations,
The echelon matrix at the end of this tells us that a basis for $\operatorname{Lin}\{\mathbf{x}_1, \mathbf{x}_2, \mathbf{x}_3, \mathbf{x}_4\}$ is formed from the first two rows, transposed, of the echelon matrix:
If we really want a basis that consists of some of the original vectors, then all we need to do is take those vectors that ‘correspond’ to the final non-zero rows in the echelon matrix. By this, we mean the rows of the original matrix that have ended up as non-zero in the echelon matrix. For instance, in the example just given, the first and second rows of the original matrix correspond to the non-zero rows of the echelon matrix, so a basis of the span is $\{\mathbf{x}_1, \mathbf{x}_2\}$. On the other hand, if we interchange rows, the correspondence won’t be so obvious. If, for example, in reduction to echelon form, we end up with the top two rows of the echelon matrix being non-zero, but have at some stage performed a single ‘interchange’ operation, swapping rows 2 and 3 (without swapping any others), then it is the first and third rows of the original matrix that we should take as our basis.
As we have seen, the range and null space of an $m \times n$ matrix are subspaces of $\mathbb{R}^m$ and $\mathbb{R}^n$ (respectively). Their dimensions are so important that they are given special names.
Definition 2.10 (Rank and Nullity) The rank of a matrix A is
rank(A) = dim(R(A))
and the nullity is nullity(A) = dim(N (A)).
We have, of course, already used the word ‘rank’, so it had better be the case that the usage just given coincides with the earlier one. Fortunately it does. In fact, we have the following connection.
Theorem 2.14 Suppose that $A$ is an $m \times n$ matrix with columns $\mathbf{c}_1, \mathbf{c}_2, \dots, \mathbf{c}_n$, and that an echelon form obtained from $A$ has leading ones in columns $i_1, i_2, \dots, i_r$. Then a basis for $R(A)$ is $B = \{\mathbf{c}_{i_1}, \mathbf{c}_{i_2}, \dots, \mathbf{c}_{i_r}\}$.
Note that the basis is formed from columns of A, not columns of the echelon matrix: the basis consists of those columns corresponding to the leading ones in the echelon matrix.
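Theorem 2.14 suggests a simple procedure: record which columns acquire leading ones during the reduction, then take those columns from the original matrix. The sketch below follows that recipe; the function name `leading_columns` and the example matrix (whose third row is the sum of the first two) are assumptions for illustration.

```python
from fractions import Fraction


def leading_columns(matrix):
    """0-based indices of the columns that contain leading ones in an
    echelon form of `matrix`."""
    rows = [[Fraction(x) for x in row] for row in matrix]
    m, n = len(rows), len(rows[0])
    r = 0
    leading = []
    for col in range(n):
        pivot = next((i for i in range(r, m) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        lead = rows[r][col]
        rows[r] = [x / lead for x in rows[r]]
        for i in range(r + 1, m):
            factor = rows[i][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        leading.append(col)
        r += 1
        if r == m:
            break
    return leading


# Hypothetical 3 x 4 matrix; row 3 = row 1 + row 2.
A = [[1, 2, 1, 1],
     [2, 3, 0, 5],
     [3, 5, 1, 6]]
cols = leading_columns(A)
# The basis is taken from the columns of A itself, not of the echelon matrix.
basis = [[row[j] for row in A] for j in cols]
print(cols, basis)
```

The length of `basis` is the rank of the matrix, in agreement with Definition 2.10.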
Example: Suppose that (as in an earlier example in this chapter),
Earlier, we reduced this to echelon form using elementary row operations, obtaining the echelon matrix
The leading ones in this echelon matrix are in the first and second columns, so a basis for R(A) can be obtained by taking the first and second columns of A. (Note: