

























































































































































































In this chapter we will review relevant notions from linear algebra and multivariable calculus that will figure into our discussion of computational techniques. It is intended as a review of background material with a bias toward ideas and interpretations commonly encountered in practice; the chapter can safely be skipped or used as a reference by students with a stronger background in mathematics.
0.1 Preliminaries: Numbers and Sets
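Rather than considering algebraic (and at times philosophical) discussions like “What is a number?,” we will rely on intuition and mathematical common sense to define a few sets:
• The natural numbers N = {1, 2, 3, ...}
• The integers Z = {..., −2, −1, 0, 1, 2, ...}
• The rational numbers Q = {a/b : a, b ∈ Z, b ≠ 0}
• The real numbers R, encompassing Q as well as irrational numbers like π and √2
• The complex numbers C = {a + bi : a, b ∈ R}, where i satisfies i² = −1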
It is worth acknowledging that our definition of R is far from rigorous. The construction of the real numbers can be an important topic for practitioners of cryptography techniques that make use of alternative number systems, but these intricacies are irrelevant for the discussion at hand.
As with any other sets, N, Z, Q, R, and C can be manipulated using generic operations to generate new sets of numbers. In particular, recall that we can define the “Cartesian product” of two sets A and B as A × B = {(a, b) : a ∈ A and b ∈ B}.
We can take powers of sets by writing
$$A^n = \underbrace{A \times A \times \cdots \times A}_{n \text{ times}}.$$
¹ This is the first of many times that we will use the notation {A : B}; the braces should denote a set and the colon can be read as “such that.” For instance, the definition of Q can be read as “the set of fractions a/b such that a and b are integers.” As a second example, we could write N = {n ∈ Z : n > 0}.
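The product and power constructions above can be tried out on small finite sets. The following is a minimal sketch in Python; the helper names `cartesian_product` and `set_power` and the sample sets are our own choices for illustration, not notation from the text.

```python
from itertools import product

def cartesian_product(A, B):
    """Return A x B = {(a, b) : a in A and b in B} as a set of tuples."""
    return {(a, b) for a in A for b in B}

def set_power(A, n):
    """Return A^n, the set of all n-tuples with entries drawn from A."""
    return set(product(A, repeat=n))

# Hypothetical sample sets, chosen only for illustration.
A = {0, 1}
B = {"x", "y", "z"}

print(len(cartesian_product(A, B)))  # |A x B| equals |A| * |B|
print(sorted(set_power(A, 3)))       # all triples with entries in A
```

For finite sets, the size of A^n is |A|^n, which is one quick way to check such a construction.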
Suppose we start with vectors $\vec{v}_1, \ldots, \vec{v}_k \in V$ for a vector space $V$. By Definition 0.1, we have two ways to start with these vectors and construct new elements of $V$: addition and scalar multiplication. The idea of span is that it describes all of the vectors you can reach via these two operations:
Definition 0.2 (Span). The span of a set S ⊆ V of vectors is the set
$$\operatorname{span} S \equiv \{a_1 \vec{v}_1 + \cdots + a_k \vec{v}_k : k \geq 0,\ \vec{v}_i \in S \text{ for all } i, \text{ and } a_i \in \mathbb{R} \text{ for all } i\}.$$
Notice that span S is a subspace of V, that is, a subset of V that is in itself a vector space. We can provide a few examples:
Example 0.3 (Mixology). The typical “well” at a cocktail bar contains at least four ingredients at the bartender’s disposal: vodka, tequila, orange juice, and grenadine. Assuming we have this simple well, we can represent drinks as points in $\mathbb{R}^4$, with one slot for each ingredient. For instance, a typical “tequila sunrise” can be represented using the point (0, 1.5, 6, 0.75), representing amounts of vodka, tequila, orange juice, and grenadine (in ounces), respectively. The set of drinks that can be made with the typical well is contained in
span {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 1, 0), (0, 0, 0, 1)},
that is, all combinations of the four basic ingredients. A bartender looking to save time, however, might notice that many drinks have the same orange juice to grenadine ratio and mix the bottles. The new simplified well may be easier for pouring but can make fundamentally fewer drinks:
span {(1, 0, 0, 0), (0, 1, 0, 0), (0, 0, 6, 0.75)}
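To see concretely why the simplified well reaches fewer drinks, we can test whether a given point of R^4 lies in the second span. The sketch below is only an illustration: the function name `simplified_well_coefficients` and the second sample drink are hypothetical, and exact fractions are used so that the ratio test is not disturbed by rounding.

```python
from fractions import Fraction

def simplified_well_coefficients(drink):
    """Return (a, b, c) with
        drink = a*(1,0,0,0) + b*(0,1,0,0) + c*(0,0,6,0.75),
    or None if the drink is not in the span of the simplified well."""
    vodka, tequila, oj, grenadine = (Fraction(str(x)) for x in drink)
    c = oj / 6                          # premixed-bottle coefficient forced by the OJ slot
    if grenadine != c * Fraction(3, 4):
        return None                     # grenadine does not match the fixed ratio
    return (vodka, tequila, c)

tequila_sunrise = (0, 1.5, 6, 0.75)
no_grenadine = (1.5, 0, 6, 0)           # hypothetical drink: vodka and orange juice only

print(simplified_well_coefficients(tequila_sunrise))
print(simplified_well_coefficients(no_grenadine))
```

The tequila sunrise is reachable, while any drink whose orange juice to grenadine ratio differs from 6 : 0.75 is not.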
Example 0.4 (Polynomials). Define $p_k(x) \equiv x^k$. Then, it is easy to see that
$$\mathbb{R}[x] = \operatorname{span}\{p_k : k \geq 0\}.$$
Make sure you understand notation well enough to see why this is the case.
Adding another item to a set of vectors does not always increase the size of its span. For instance, in R^2 it is clearly the case that
span {(1, 0), (0, 1)} = span {(1, 0), (0, 1), (1, 1)}.
In this case, we say that the set {(1, 0), (0, 1), (1, 1)} is linearly dependent:
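Definition 0.3 (Linear dependence). We provide three equivalent definitions. A set S ⊆ V of vectors is linearly dependent if:
1. One of the elements of S can be written as a linear combination of the other elements, or S contains the zero vector.
2. There exists a nonempty linear combination of distinct elements $\vec{v}_k \in S$ with $\sum_k c_k \vec{v}_k = \vec{0}$, where $c_k \neq 0$ for all $k$.
3. There exists $\vec{v} \in S$ such that $\operatorname{span} S = \operatorname{span}(S \setminus \{\vec{v}\})$; that is, we can remove a vector from S without affecting its span.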
If S is not linearly dependent, then we say it is linearly independent. Providing proof or informal evidence that each definition is equivalent to its counterparts (in an “if and only if” fashion) is a worthwhile exercise for students less comfortable with notation and abstract mathematics.
The concept of linear dependence leads to an idea of “redundancy” in a set of vectors. In this sense, it is natural to ask how large a set we can choose before adding another vector cannot possibly increase the span. In particular, suppose we have a linearly independent set S ⊆ V, and now we choose an additional vector $\vec{v} \in V$. Adding $\vec{v}$ to S leads to one of two possible outcomes:
1. The span of $S \cup \{\vec{v}\}$ is larger than the span of S.
2. Adding $\vec{v}$ to S has no effect on the span.
Definition 0.4 (Dimension and basis). The dimension of V is the maximal size |S| of a linearly independent set S ⊂ V such that span S = V. Any set S satisfying this property is called a basis for V.
Example 0.5 ($\mathbb{R}^n$). The standard basis for $\mathbb{R}^n$ is the set of vectors of the form
$$\vec{e}_k \equiv (\underbrace{0, \ldots, 0}_{k-1 \text{ slots}}, 1, \underbrace{0, \ldots, 0}_{n-k \text{ slots}}).$$
That is, $\vec{e}_k$ has all zeros except for a single one in the k-th slot. It is clear that these vectors are linearly independent and form a basis; for example in $\mathbb{R}^3$ any vector $(a, b, c)$ can be written as $a\vec{e}_1 + b\vec{e}_2 + c\vec{e}_3$. Thus, the dimension of $\mathbb{R}^n$ is n, as we would expect.
Example 0.6 (Polynomials). It is clear that the set $\{1, x, x^2, x^3, \ldots\}$ is a linearly independent set of polynomials spanning $\mathbb{R}[x]$. Notice that this set is infinitely large, and thus the dimension of $\mathbb{R}[x]$ is ∞.
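For finite sets of vectors in R^n, linear independence can be checked mechanically: a set is independent exactly when the dimension of its span (its rank) equals the number of vectors. The sketch below uses Gaussian elimination over exact fractions, a technique the text has not introduced at this point; the helper names `rank` and `is_linearly_independent` are our own.

```python
from fractions import Fraction

def rank(vectors):
    """Dimension of the span of a list of equal-length vectors, computed by
    Gaussian elimination over exact fractions (no rounding involved)."""
    rows = [[Fraction(x) for x in v] for v in vectors]
    ncols = len(rows[0]) if rows else 0
    r = 0  # number of pivot rows found so far
    for col in range(ncols):
        pivot = next((i for i in range(r, len(rows)) if rows[i][col] != 0), None)
        if pivot is None:
            continue
        rows[r], rows[pivot] = rows[pivot], rows[r]
        for i in range(r + 1, len(rows)):
            factor = rows[i][col] / rows[r][col]
            rows[i] = [a - factor * b for a, b in zip(rows[i], rows[r])]
        r += 1
    return r

def is_linearly_independent(vectors):
    """A finite list of vectors is independent when its rank equals its length."""
    return rank(vectors) == len(vectors)

print(rank([(1, 0), (0, 1)]), rank([(1, 0), (0, 1), (1, 1)]))
print(is_linearly_independent([(1, 0), (0, 1), (1, 1)]))
print(is_linearly_independent([(1, 0, 0), (0, 1, 0), (0, 0, 1)]))
```

Adding (1, 1) to the standard basis of R^2 leaves the rank unchanged, matching the observation that the span does not grow.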
Of particular importance for our purposes is the vector space $\mathbb{R}^n$, the so-called n-dimensional Euclidean space. This is nothing more than the set of coordinate axes encountered in high school math classes:
$$(a_1, \ldots, a_n) \equiv \begin{pmatrix} a_1 \\ a_2 \\ \vdots \\ a_n \end{pmatrix}$$
Aside 0.1. There are many theoretical questions to ponder here, some of which we will address in future chapters when they are more motivated:
Intrigued students can consult texts on real and functional analysis.
0.3 Linearity
A function between vector spaces that preserves structure is known as a linear function:
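Definition 0.7 (Linearity). Suppose $V$ and $V'$ are vector spaces. Then, $L : V \to V'$ is linear if it satisfies the following two criteria for all $\vec{v}, \vec{v}_1, \vec{v}_2 \in V$ and $c \in \mathbb{R}$:
• L preserves sums: $L[\vec{v}_1 + \vec{v}_2] = L[\vec{v}_1] + L[\vec{v}_2]$
• L preserves scalar products: $L[c\vec{v}] = cL[\vec{v}]$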
It is easy to generate linear maps between vector spaces, as we can see in the following examples:
Example 0.8 (Linearity in $\mathbb{R}^n$). The following map $f : \mathbb{R}^2 \to \mathbb{R}^3$ is linear:
$$f(x, y) = (3x, 2x + y, -y)$$
We can check linearity as follows:
$$\begin{aligned}
f(x_1 + x_2, y_1 + y_2) &= (3(x_1 + x_2), 2(x_1 + x_2) + (y_1 + y_2), -(y_1 + y_2)) \\
&= (3x_1, 2x_1 + y_1, -y_1) + (3x_2, 2x_2 + y_2, -y_2) \\
&= f(x_1, y_1) + f(x_2, y_2) \\
f(cx, cy) &= (3cx, 2cx + cy, -cy) \\
&= c(3x, 2x + y, -y) \\
&= c f(x, y)
\end{aligned}$$
Contrastingly, $g(x, y) \equiv xy^2$ is not linear. For instance, $g(1, 1) = 1$ but $g(2, 2) = 8 \neq 2 \cdot g(1, 1)$, so g does not preserve scalar products.
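The two criteria for linearity can also be spot-checked numerically. The sketch below evaluates f and g at a few sample points; the sample values and the helpers `add` and `scale` are our own, and passing such a check at finitely many points is evidence rather than a proof of linearity.

```python
def f(x, y):
    """The linear map from Example 0.8."""
    return (3 * x, 2 * x + y, -y)

def g(x, y):
    """The nonlinear counterexample g(x, y) = x * y^2."""
    return x * y ** 2

def add(u, v):
    """Componentwise sum of two tuples."""
    return tuple(a + b for a, b in zip(u, v))

def scale(c, u):
    """Multiply every component of a tuple by the scalar c."""
    return tuple(c * a for a in u)

# Hypothetical sample inputs; integers keep the arithmetic exact.
x1, y1, x2, y2, c = 2, -1, 5, 3, 4

sum_preserved = f(x1 + x2, y1 + y2) == add(f(x1, y1), f(x2, y2))
scaling_preserved = f(c * x1, c * y1) == scale(c, f(x1, y1))
print(sum_preserved, scaling_preserved)

# g fails the scalar product test at the point used in the text.
print(g(2, 2), 2 * g(1, 1))
```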
Example 0.9 (Integration). The following “functional” L from R [x] to R is linear:
$$L[p(x)] \equiv \int_0^1 p(x)\,dx.$$
This somewhat more abstract example maps polynomials p(x) to real numbers L[p(x)]. For example, we can write
$$L[3x^2 + x - 1] = \int_0^1 (3x^2 + x - 1)\,dx = \left[x^3 + \tfrac{1}{2}x^2 - x\right]_0^1 = \frac{1}{2}.$$
Linearity comes from the following well-known facts from calculus:
$$\begin{aligned}
\int_0^1 c \cdot f(x)\,dx &= c \int_0^1 f(x)\,dx \\
\int_0^1 [f(x) + g(x)]\,dx &= \int_0^1 f(x)\,dx + \int_0^1 g(x)\,dx
\end{aligned}$$
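The integration functional is easy to experiment with if we store a polynomial as its list of coefficients. The following sketch assumes that representation (entry k multiplies x^k), which is our own convention rather than one from the text, and uses the fact that the integral of x^k over [0, 1] is 1/(k + 1).

```python
from fractions import Fraction

def integrate_01(coeffs):
    """L[p] = integral of p over [0, 1], where coeffs[k] multiplies x^k."""
    return sum(Fraction(c) / (k + 1) for k, c in enumerate(coeffs))

def poly_add(p, q):
    """Add two coefficient lists, padding the shorter one with zeros."""
    n = max(len(p), len(q))
    p = list(p) + [0] * (n - len(p))
    q = list(q) + [0] * (n - len(q))
    return [a + b for a, b in zip(p, q)]

def poly_scale(c, p):
    """Multiply every coefficient by the scalar c."""
    return [c * a for a in p]

p = [-1, 1, 3]      # 3x^2 + x - 1, the polynomial from the example
q = [2, 0, 0, 4]    # hypothetical second polynomial 4x^3 + 2

print(integrate_01(p))
print(integrate_01(poly_add(p, q)) == integrate_01(p) + integrate_01(q))
print(integrate_01(poly_scale(5, p)) == 5 * integrate_01(p))
```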
We can write a particularly nice form for linear maps on $\mathbb{R}^n$. Recall that the vector $\vec{a} = (a_1, \ldots, a_n)$ is equal to the sum $\sum_k a_k \vec{e}_k$, where $\vec{e}_k$ is the k-th standard basis vector. Then, if L is linear we know:
$$\begin{aligned}
L[\vec{a}] &= L\left[\sum_k a_k \vec{e}_k\right] && \text{for the standard basis } \vec{e}_k \\
&= \sum_k L[a_k \vec{e}_k] && \text{by sum preservation} \\
&= \sum_k a_k L[\vec{e}_k] && \text{by scalar product preservation}
\end{aligned}$$
This derivation shows the following important fact: L is completely determined by its action on the standard basis vectors $\vec{e}_k$. That is, for any vector $\vec{a} \in \mathbb{R}^n$, we can use the sum above to determine $L[\vec{a}]$ by linearly combining $L[\vec{e}_1], \ldots, L[\vec{e}_n]$.
Example 0.10 (Expanding a linear map). Recall the map in Example 0.8 given by $f(x, y) = (3x, 2x + y, -y)$. We have $f(\vec{e}_1) = f(1, 0) = (3, 2, 0)$ and $f(\vec{e}_2) = f(0, 1) = (0, 1, -1)$. Thus, the formula above shows:
$$f(x, y) = x f(\vec{e}_1) + y f(\vec{e}_2) = x \begin{pmatrix} 3 \\ 2 \\ 0 \end{pmatrix} + y \begin{pmatrix} 0 \\ 1 \\ -1 \end{pmatrix}$$
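The expansion in Example 0.10 can be reproduced in a few lines: evaluate the map on each standard basis vector, then combine the results with the coordinates of the input. The helpers `standard_basis` and `apply_from_columns` below are hypothetical names for this sketch.

```python
def f(x, y):
    """The linear map from Example 0.8."""
    return (3 * x, 2 * x + y, -y)

def standard_basis(n):
    """Return e_1, ..., e_n as tuples of length n."""
    return [tuple(1 if i == k else 0 for i in range(n)) for k in range(n)]

# Images of the standard basis vectors under f.
columns = [f(*e) for e in standard_basis(2)]

def apply_from_columns(columns, a):
    """Compute sum_k a_k * L[e_k] given the stored images L[e_k]."""
    m = len(columns[0])
    return tuple(sum(a_k * col[i] for a_k, col in zip(a, columns))
                 for i in range(m))

print(columns)
print(apply_from_columns(columns, (2, 5)), f(2, 5))
```

Storing the images of the basis vectors side by side is one natural way to arrive at a rectangular array of numbers describing the map.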
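The expansion of linear maps above suggests one of many contexts in which it is useful to store multiple vectors in the same structure. More generally, say we have n vectors $\vec{v}_1, \ldots, \vec{v}_n \in \mathbb{R}^m$. We can write each as a column vector:
$$\vec{v}_1 = \begin{pmatrix} v_{11} \\ v_{21} \\ \vdots \\ v_{m1} \end{pmatrix}, \quad
\vec{v}_2 = \begin{pmatrix} v_{12} \\ v_{22} \\ \vdots \\ v_{m2} \end{pmatrix}, \quad \cdots, \quad
\vec{v}_n = \begin{pmatrix} v_{1n} \\ v_{2n} \\ \vdots \\ v_{mn} \end{pmatrix}$$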
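Example 0.14 (Mixology). Continuing Example 0.3, suppose we make a tequila sunrise and a second concoction with equal parts of the two liquors in our simplified well. To find out how much of the basic ingredients are contained in each order, we could combine the recipes for each column-wise and use matrix multiplication:
$$\begin{pmatrix} 1 & 0 & 0 \\ 0 & 1 & 0 \\ 0 & 0 & 6 \\ 0 & 0 & 0.75 \end{pmatrix}
\begin{pmatrix} 0 & 0.75 \\ 1.5 & 0.75 \\ 1 & 2 \end{pmatrix}
= \begin{pmatrix} 0 & 0.75 \\ 1.5 & 0.75 \\ 6 & 12 \\ 0.75 & 1.5 \end{pmatrix}$$
In the first matrix, the columns are Well 1, Well 2, and Well 3, and the rows are Vodka, Tequila, OJ, and Grenadine. In the second matrix, the columns are Drink 1 and Drink 2, and the rows are the amounts poured from Well 1, Well 2, and Well 3. In the product, the columns are Drink 1 and Drink 2, and the rows are Vodka, Tequila, OJ, and Grenadine.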
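The product in Example 0.14 can be checked with a short triple sum. This is a minimal sketch using nested lists for matrices; the function name `matmul` and the variable names are our own, and a numerical library would normally be used instead.

```python
def matmul(A, B):
    """Matrix product of A (m x n) and B (n x p), both given as lists of rows."""
    n = len(B)
    assert all(len(row) == n for row in A), "inner dimensions must agree"
    return [[sum(A[i][k] * B[k][j] for k in range(n))
             for j in range(len(B[0]))]
            for i in range(len(A))]

# Rows: vodka, tequila, OJ, grenadine. Columns: Well 1, Well 2, Well 3.
well = [[1, 0, 0],
        [0, 1, 0],
        [0, 0, 6],
        [0, 0, 0.75]]

# Rows: Well 1, Well 2, Well 3. Columns: Drink 1, Drink 2.
drinks = [[0, 0.75],
          [1.5, 0.75],
          [1, 2]]

for name, row in zip(["Vodka", "Tequila", "OJ", "Grenadine"], matmul(well, drinks)):
    print(name, row)
```

Each column of the result lists the basic ingredients in one drink; the first column recovers the tequila sunrise (0, 1.5, 6, 0.75) from Example 0.3.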
In general, we will use capital letters to represent matrices, like $A \in \mathbb{R}^{m \times n}$. We will use the notation $A_{ij} \in \mathbb{R}$ to denote the element of A at row i and column j.
It comes as no surprise that we can write a scalar as a 1 × 1 vector $c \in \mathbb{R}^{1 \times 1}$. Similarly, as we already suggested in §0.2.3, if we write vectors in $\mathbb{R}^n$ in column form, they can be considered n × 1 matrices $\vec{v} \in \mathbb{R}^{n \times 1}$. Notice that matrix-vector products can be interpreted easily in this context; for example, if $A \in \mathbb{R}^{m \times n}$, $\vec{x} \in \mathbb{R}^n$, and $\vec{b} \in \mathbb{R}^m$, then we can write expressions like
$$\underbrace{A}_{m \times n}\,\underbrace{\vec{x}}_{n \times 1} = \underbrace{\vec{b}}_{m \times 1}.$$
We will introduce one additional operator on matrices that is useful in this context:
Definition 0.8 (Transpose). The transpose of a matrix $A \in \mathbb{R}^{m \times n}$ is a matrix $A^\top \in \mathbb{R}^{n \times m}$ with elements $(A^\top)_{ij} = A_{ji}$.
Example 0.15 (Transposition). The transpose of the matrix
is given by
$A^\top =$
Geometrically, we can think of transposition as flipping a matrix on its diagonal.
This unified treatment of scalars, vectors, and matrices combined with operations like transposition and multiplication can lead to slick derivations of well-known identities. For instance, we can compute the dot products of vectors $\vec{a}, \vec{b} \in \mathbb{R}^n$ by making the following series of steps:
$$\vec{a} \cdot \vec{b} = \sum_{k=1}^n a_k b_k
= \begin{pmatrix} a_1 & a_2 & \cdots & a_n \end{pmatrix}
\begin{pmatrix} b_1 \\ b_2 \\ \vdots \\ b_n \end{pmatrix}
= \vec{a}^\top \vec{b}$$
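The identification of the dot product with a 1 × n by n × 1 matrix product can be verified directly. In the sketch below, vectors are stored as n × 1 nested lists so that the same `matmul` routine handles every case; the helper names and sample vectors are our own.

```python
def transpose(A):
    """Return the transpose of A, so that transpose(A)[i][j] == A[j][i]."""
    return [list(col) for col in zip(*A)]

def matmul(A, B):
    """Matrix product of A (m x n) and B (n x p), both given as lists of rows."""
    return [[sum(A[i][k] * B[k][j] for k in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def dot(a, b):
    """Dot product as the sum of a_k * b_k."""
    return sum(x * y for x, y in zip(a, b))

# Hypothetical sample vectors.
a = [1, 2, 3]
b = [4, 5, 6]

col_a = [[x] for x in a]   # a as a 3 x 1 matrix
col_b = [[x] for x in b]   # b as a 3 x 1 matrix

print(dot(a, b))
print(matmul(transpose(col_a), col_b))   # a 1 x 1 matrix holding the same number
```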
Many important identities from linear algebra can be derived by chaining together these operations with a few rules:
$$(A^\top)^\top = A, \qquad (A + B)^\top = A^\top + B^\top, \qquad (AB)^\top = B^\top A^\top$$
Example 0.16 (Residual norm). Suppose we have a matrix A and two vectors $\vec{x}$ and $\vec{b}$. If we wish to know how well $A\vec{x}$ approximates $\vec{b}$, we might define a residual $\vec{r} \equiv \vec{b} - A\vec{x}$; this residual is zero exactly when $A\vec{x} = \vec{b}$. Otherwise, we might use the norm $\|\vec{r}\|_2$ as a proxy for the relationship between $A\vec{x}$ and $\vec{b}$. We can use the identities above to simplify:
$$\begin{aligned}
\|\vec{r}\|_2^2 &= \|\vec{b} - A\vec{x}\|_2^2 \\
&= (\vec{b} - A\vec{x}) \cdot (\vec{b} - A\vec{x}) && \text{as explained in §0.2.} \\
&= (\vec{b} - A\vec{x})^\top (\vec{b} - A\vec{x}) && \text{by our expression for the dot product above} \\
&= (\vec{b}^\top - \vec{x}^\top A^\top)(\vec{b} - A\vec{x}) && \text{by properties of transposition} \\
&= \vec{b}^\top \vec{b} - \vec{b}^\top A\vec{x} - \vec{x}^\top A^\top \vec{b} + \vec{x}^\top A^\top A\vec{x} && \text{after multiplication}
\end{aligned}$$
All four terms on the right hand side are scalars, or equivalently 1 × 1 matrices. Scalars thought of as matrices trivially enjoy one additional nice property $c^\top = c$, since there is nothing to transpose! Thus, we can write
$$\vec{x}^\top A^\top \vec{b} = (\vec{x}^\top A^\top \vec{b})^\top = \vec{b}^\top A\vec{x}$$
This allows us to simplify our expression even more:
$$\|\vec{r}\|_2^2 = \vec{b}^\top \vec{b} - 2\vec{b}^\top A\vec{x} + \vec{x}^\top A^\top A\vec{x} = \|A\vec{x}\|_2^2 - 2\vec{b}^\top A\vec{x} + \|\vec{b}\|_2^2$$
We could have derived this expression using dot product identities, but intermediate steps above will prove useful in our later discussion.
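As a sanity check on the final expression in Example 0.16, we can compare the squared residual norm computed directly with the expanded form on a small instance. The matrix and vectors below are hypothetical integer data chosen so that the arithmetic is exact; the function names are our own.

```python
def matvec(A, x):
    """Matrix-vector product A x, with A given as a list of rows."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def dot(u, v):
    """Dot product as the sum of u_k * v_k."""
    return sum(a * b for a, b in zip(u, v))

def residual_norm_sq_direct(A, x, b):
    """||b - A x||_2^2 computed from the residual itself."""
    r = [bi - yi for bi, yi in zip(b, matvec(A, x))]
    return dot(r, r)

def residual_norm_sq_expanded(A, x, b):
    """||A x||_2^2 - 2 b^T A x + ||b||_2^2, the expanded form from the example."""
    Ax = matvec(A, x)
    return dot(Ax, Ax) - 2 * dot(b, Ax) + dot(b, b)

# Hypothetical data: A is 3 x 2, x has 2 entries, b has 3 entries.
A = [[1, 2], [3, 4], [5, 6]]
x = [1, -1]
b = [1, 0, 2]

print(residual_norm_sq_direct(A, x, b))
print(residual_norm_sq_expanded(A, x, b))
```

Both functions return the same value on this instance, as the identity predicts.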