



















The two basic vector operations are addition and scaling. From this perspective, the nicest functions are those which “preserve” these operations:
Def: A linear transformation is a function T : R^n → R^m which satisfies:
(1) T(x + y) = T(x) + T(y) for all x, y ∈ R^n;
(2) T(cx) = cT(x) for all x ∈ R^n and c ∈ R.
Fact: If T : R^n → R^m is a linear transformation, then T(0) = 0. (Indeed, by property (2), T(0) = T(0 · 0) = 0 · T(0) = 0.)
We’ve already met examples of linear transformations. Namely: if A is any m × n matrix, then the function T : R^n → R^m given by matrix-vector multiplication, T(x) = Ax, is a linear transformation.
(Wait: I thought matrices were functions? Technically, no. Matrices are literally just arrays of numbers. However, matrices define functions by matrix-vector multiplication, and such functions are always linear transformations.)
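Here is a small plain-Python sketch of the two defining properties for T(x) = Ax. The matrix A, the vectors x and y, the scalar c, and the helper names `matvec`, `add`, `scale` are made-up for illustration. Checking a few vectors illustrates the definition; it does not prove linearity.

```python
def matvec(A, x):
    """Multiply an m x n matrix A (a list of m rows) by a vector x of length n."""
    if any(len(row) != len(x) for row in A):
        raise ValueError("A must have len(x) columns")
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


def add(u, v):
    return [ui + vi for ui, vi in zip(u, v)]


def scale(c, v):
    return [c * vi for vi in v]


# Hypothetical 3 x 2 matrix, so T(x) = Ax maps R^2 -> R^3.
A = [[1, 2],
     [0, -1],
     [3, 4]]
x, y, c = [1, 5], [-2, 7], 3

print(matvec(A, x))                                             # [11, -5, 23]
print(matvec(A, add(x, y)) == add(matvec(A, x), matvec(A, y)))  # True: property (1)
print(matvec(A, scale(c, x)) == scale(c, matvec(A, x)))         # True: property (2)
```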
Question: Are these all the linear transformations there are? That is, does every linear transformation come from matrix-vector multiplication? Yes:
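Prop 13.2: Let T : R^n → R^m be a linear transformation. Then the function T is just matrix-vector multiplication: T(x) = Ax for some matrix A. In fact, A is the m × n matrix whose columns are T(e_1), ..., T(e_n), where e_1, ..., e_n are the standard basis vectors of R^n:
$$A = \begin{bmatrix} T(e_1) & \cdots & T(e_n) \end{bmatrix}.$$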
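Prop 13.2 is also a recipe: feed T the standard basis vectors and use the outputs as columns. A minimal sketch, where `standard_matrix` and the map `T` are hypothetical examples (T is written as a formula, with no matrix in sight):

```python
def standard_matrix(T, n):
    """Matrix (list of rows) whose j-th column is T(e_j), for T : R^n -> R^m."""
    columns = []
    for j in range(n):
        e_j = [1 if i == j else 0 for i in range(n)]   # j-th standard basis vector
        columns.append(T(e_j))
    m = len(columns[0])
    # Turn the list of columns into a list of rows.
    return [[columns[j][i] for j in range(n)] for i in range(m)]


def T(v):
    """Hypothetical linear map R^2 -> R^3."""
    x, y = v
    return [x + 2 * y, 3 * x, -y]


A = standard_matrix(T, 2)
print(A)  # [[1, 2], [3, 0], [0, -1]]

# Check T(x) = Ax on one vector.
x = [4, -3]
Ax = [sum(a * xi for a, xi in zip(row, x)) for row in A]
print(Ax, T(x))  # [-2, 12, 3] [-2, 12, 3]
```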
Terminology: For linear transformations T : R^n → R^m, we use the word “kernel” to mean “nullspace.” We also say “image of T” to mean “range of T.” So, for a linear transformation T : R^n → R^m:
ker(T) = {x ∈ R^n | T(x) = 0} = T^{-1}({0}),
im(T) = {T(x) | x ∈ R^n} = T(R^n).
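For a concrete feel, take the hypothetical map T(x, y) = (x − y, 2x − 2y). Its kernel is the line spanned by (1, 1), and its image is the line y = 2x in R^2. The sketch below only spot-checks these claims on a few vectors; `in_kernel` is a helper name chosen here.

```python
# Hypothetical example: T(x, y) = (x - y, 2x - 2y), i.e. T(v) = Av with:
A = [[1, -1],
     [2, -2]]


def T(v):
    return [sum(a * vi for a, vi in zip(row, v)) for row in A]


def in_kernel(v):
    """v is in ker(T) exactly when T(v) is the zero vector."""
    return all(t == 0 for t in T(v))


print(in_kernel([1, 1]), in_kernel([3, 3]), in_kernel([1, 0]))  # True True False

# Every output (x - y, 2x - 2y) has second coordinate twice the first,
# so im(T) lies on the line y = 2x.
outputs = [T([x, y]) for x in range(-3, 4) for y in range(-3, 4)]
print(all(w[1] == 2 * w[0] for w in outputs))  # True
```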
Ways to Visualize functions f : R → R (e.g.: f(x) = x^2)
(1) Set-Theoretic Picture.
(2) Graph of f. (Thinking: y = f(x).) The graph of f : R → R is the subset of R^2 given by:
Graph(f) = {(x, y) ∈ R^2 | y = f(x)}.
(3) Level sets of f. (Thinking: f(x) = c.) The level sets of f : R → R are the subsets of R of the form
{x ∈ R | f(x) = c},
for constants c ∈ R.
Ways to Visualize functions f : R^2 → R (e.g.: f(x, y) = x^2 + y^2)
(1) Set-Theoretic Picture.
(2) Graph of f. (Thinking: z = f(x, y).) The graph of f : R^2 → R is the subset of R^3 given by:
Graph(f) = {(x, y, z) ∈ R^3 | z = f(x, y)}.
(3) Level sets of f. (Thinking: f(x, y) = c.) The level sets of f : R^2 → R are the subsets of R^2 of the form
{(x, y) ∈ R^2 | f(x, y) = c},
for constants c ∈ R.
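For the example f(x, y) = x^2 + y^2, the level set f = c is the circle of radius √c when c > 0, the single point (0, 0) when c = 0, and empty when c < 0. A small sketch that samples points on one such circle and confirms f takes the value c there; `level_set_points` and the sample count k are choices made for this illustration.

```python
import math


def f(x, y):
    return x**2 + y**2


def level_set_points(c, k=8):
    """k points on the level set x^2 + y^2 = c (assumes c > 0): the circle of radius sqrt(c)."""
    r = math.sqrt(c)
    return [(r * math.cos(2 * math.pi * j / k), r * math.sin(2 * math.pi * j / k))
            for j in range(k)]


pts = level_set_points(4.0)
print(all(math.isclose(f(x, y), 4.0) for x, y in pts))  # True
```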
Ways to Visualize functions f : R^3 → R (e.g.: f(x, y, z) = x^2 + y^2 + z^2)
(1) Set-Theoretic Picture.
(2) Graph of f. (Thinking: w = f(x, y, z).)
(3) Level sets of f. (Thinking: f(x, y, z) = c.) The level sets of f : R^3 → R are the subsets of R^3 of the form
{(x, y, z) ∈ R^3 | f(x, y, z) = c},
for constants c ∈ R.
(1) Diagonal Matrices: A diagonal matrix is a matrix of the form
$$D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}.$$
The linear transformation defined by D has the following effect: vectors are...
◦ Stretched/contracted (possibly reflected) in the x_1-direction by d_1,
◦ Stretched/contracted (possibly reflected) in the x_2-direction by d_2,
...
◦ Stretched/contracted (possibly reflected) in the x_n-direction by d_n.

◦ Stretching in the x_i-direction happens if |d_i| > 1.
◦ Contracting in the x_i-direction happens if |d_i| < 1.
◦ Reflecting happens if d_i is negative.
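Because all off-diagonal entries are zero, applying D just multiplies coordinate i by d_i. A minimal sketch, with hypothetical diagonal entries chosen to show one stretch, one contraction, and one reflection:

```python
def apply_diagonal(d, x):
    """Apply D = diag(d_1, ..., d_n) to x: coordinate i is multiplied by d_i."""
    return [di * xi for di, xi in zip(d, x)]


d = [2, 0.5, -1]   # hypothetical: stretch x_1, contract x_2, reflect x_3
x = [1, 4, 3]
print(apply_diagonal(d, x))  # [2, 2.0, -3]

# Same result as the full matrix-vector product with the n x n matrix D.
n = len(d)
D = [[d[i] if i == j else 0 for j in range(n)] for i in range(n)]
print([sum(D[i][j] * x[j] for j in range(n)) for i in range(n)] == apply_diagonal(d, x))  # True
```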
(2) Rotations in R^2
We write Rot_θ : R^2 → R^2 for the linear transformation which rotates vectors in R^2 counter-clockwise through the angle θ. Its matrix is:
$$\mathrm{Rot}_\theta = \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}.$$
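A quick numerical check of the rotation matrix: rotating e_1 by θ = π/2 should give e_2, and rotations should not change lengths. The helper names `rot` and `matvec` are illustrative; results are rounded because cos(π/2) is not exactly 0 in floating point.

```python
import math


def rot(theta):
    """Matrix (list of rows) of the counter-clockwise rotation by theta radians."""
    c, s = math.cos(theta), math.sin(theta)
    return [[c, -s],
            [s, c]]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


v = matvec(rot(math.pi / 2), [1, 0])   # rotate e_1 by 90 degrees
print([round(t, 12) for t in v])       # [0.0, 1.0], i.e. e_2

# Rotations preserve length: (3, 4) still has length 5 after rotating.
w = matvec(rot(0.7), [3, 4])
print(math.isclose(math.hypot(*w), 5.0))  # True
```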
Example: Let F : R^2 → R^3 be the function
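F(x, y) = (x + 2y, sin(x), e^y) = (F_1(x, y), F_2(x, y), F_3(x, y)).
Its derivative is a linear transformation DF(x, y) : R^2 → R^3. The matrix of the linear transformation DF(x, y) is:
$$DF(x, y) = \begin{bmatrix} \frac{\partial F_1}{\partial x} & \frac{\partial F_1}{\partial y} \\ \frac{\partial F_2}{\partial x} & \frac{\partial F_2}{\partial y} \\ \frac{\partial F_3}{\partial x} & \frac{\partial F_3}{\partial y} \end{bmatrix} = \begin{bmatrix} 1 & 2 \\ \cos(x) & 0 \\ 0 & e^y \end{bmatrix}.$$
Notice that (for example) DF(1, 1) is a linear transformation, as is DF(2, 3), etc. That is, each DF(x, y) is a linear transformation R^2 → R^3.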
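One way to double-check a hand-computed derivative matrix is to compare it with finite differences. The sketch below uses central differences with step h = 1e-6 (an arbitrary choice for this illustration); column j of the estimate approximates the partial derivatives with respect to the j-th variable.

```python
import math


def F(x, y):
    return [x + 2 * y, math.sin(x), math.exp(y)]


def DF(x, y):
    """Hand-computed matrix of DF(x, y): row i holds the partials of F_i."""
    return [[1.0, 2.0],
            [math.cos(x), 0.0],
            [0.0, math.exp(y)]]


def numerical_jacobian(F, x, y, h=1e-6):
    """Central-difference estimate of DF(x, y)."""
    fx = [(a - b) / (2 * h) for a, b in zip(F(x + h, y), F(x - h, y))]
    fy = [(a - b) / (2 * h) for a, b in zip(F(x, y + h), F(x, y - h))]
    return [[fx[i], fy[i]] for i in range(3)]


J_exact, J_approx = DF(1.0, 1.0), numerical_jacobian(F, 1.0, 1.0)
print(all(abs(a - b) < 1e-6
          for row_e, row_a in zip(J_exact, J_approx)
          for a, b in zip(row_e, row_a)))  # True
```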
Single Variable Setting
Review: In single-variable calc, we look at functions f : R → R. We write y = f(x), and at a point (a, f(a)) write:
∆y ≈ dy.
Here, ∆y = f(x) − f(a), while dy = f′(a)∆x = f′(a)(x − a). So:
f(x) − f(a) ≈ f′(a)(x − a).
Therefore: f(x) ≈ f(a) + f′(a)(x − a).
The right-hand side f(a) + f′(a)(x − a) can be interpreted as follows:
◦ It is the best linear approximation to f(x) at x = a.
◦ It is the 1st Taylor polynomial of f(x) at x = a.
◦ The line y = f(a) + f′(a)(x − a) is the tangent line at (a, f(a)).
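As a worked illustration (the function and point are hypothetical choices), take f(x) = √x near a = 4, where f(4) = 2 and f′(4) = 1/4. Then √4.1 ≈ 2 + (1/4)(0.1) = 2.025, close to the true value 2.0248... A minimal sketch:

```python
import math


def linearization(f, fprime, a):
    """Return L(x) = f(a) + f'(a)(x - a), the tangent-line approximation at x = a."""
    return lambda x: f(a) + fprime(a) * (x - a)


# Hypothetical example: f(x) = sqrt(x) near a = 4, where f'(x) = 1 / (2 sqrt(x)).
L = linearization(math.sqrt, lambda x: 1 / (2 * math.sqrt(x)), 4.0)
print(round(L(4.1), 4), round(math.sqrt(4.1), 4))  # 2.025 2.0248
```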
Multivariable Setting
Now consider functions f : R^n → R^m. At a point (a, f(a)), we have exactly the same thing:
f(x) − f(a) ≈ Df(a)(x − a).
That is:
f(x) ≈ f(a) + Df(a)(x − a).   (∗)
Note: The quantity Df(a) is a matrix, while (x − a) is a vector. That is, Df(a)(x − a) is matrix-vector multiplication.
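Example: Let f : R^2 → R. Let’s write x = (x_1, x_2) and a = (a_1, a_2). Then (∗) reads:
$$\begin{aligned} f(x_1, x_2) &\approx f(a_1, a_2) + \begin{bmatrix} \frac{\partial f}{\partial x_1}(a_1, a_2) & \frac{\partial f}{\partial x_2}(a_1, a_2) \end{bmatrix} \begin{bmatrix} x_1 - a_1 \\ x_2 - a_2 \end{bmatrix} \\ &= f(a_1, a_2) + \frac{\partial f}{\partial x_1}(a_1, a_2)(x_1 - a_1) + \frac{\partial f}{\partial x_2}(a_1, a_2)(x_2 - a_2). \end{aligned}$$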
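A numerical instance of the two-variable formula, using the hypothetical function f(x_1, x_2) = x_1^2 + x_2^2 at a = (1, 2), where f(a) = 5 and the partials are (2, 4). At x = (1.1, 1.9) the approximation gives 5 + 2(0.1) + 4(−0.1) = 4.8, versus the true value 4.82:

```python
def f(x1, x2):
    return x1**2 + x2**2          # hypothetical example function


def grad_f(x1, x2):
    return (2 * x1, 2 * x2)       # (∂f/∂x1, ∂f/∂x2), computed by hand


def linear_approx(a1, a2, x1, x2):
    """f(a) + ∂f/∂x1(a)(x1 - a1) + ∂f/∂x2(a)(x2 - a2)."""
    g1, g2 = grad_f(a1, a2)
    return f(a1, a2) + g1 * (x1 - a1) + g2 * (x2 - a2)


print(round(linear_approx(1, 2, 1.1, 1.9), 10))  # 4.8
print(round(f(1.1, 1.9), 10))                    # 4.82
```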
Recall: Let f : X → Y and g : Y → Z be functions. Their composition is the function g ◦ f : X → Z defined by
(g ◦ f)(x) = g(f(x)).
Observations:
(1) For this to make sense, we must have: co-domain(f) = domain(g).
(2) Composition is not generally commutative: that is, f ◦ g and g ◦ f are usually different.
(3) Composition is always associative: (h ◦ g) ◦ f = h ◦ (g ◦ f).
Fact: If T : R^k → R^n and S : R^n → R^m are both linear transformations, then S ◦ T is also a linear transformation.
Question: How can we describe the matrix of the linear transformation S ◦ T in terms of the matrices of S and T?
Fact: Let T : R^k → R^n and S : R^n → R^m be linear transformations with matrices B and A, respectively. Then the matrix of S ◦ T is the product AB.
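A numerical check of this Fact, with hypothetical matrices: B is 2 × 3 (so T : R^3 → R^2) and A is 2 × 2 (so S : R^2 → R^2). Applying T and then S to x gives the same vector as multiplying x by AB. The helper names `matmul` and `matvec` are chosen for this sketch.

```python
def matmul(A, B):
    """Product of an m x n matrix A and an n x k matrix B (lists of rows)."""
    n = len(B)
    if any(len(row) != n for row in A):
        raise ValueError("columns of A must equal rows of B")
    k = len(B[0])
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(k)]
            for i in range(len(A))]


def matvec(A, x):
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]


# Hypothetical matrices: T(x) = Bx maps R^3 -> R^2, S(y) = Ay maps R^2 -> R^2.
B = [[1, 0, 2],
     [0, 1, -1]]
A = [[2, 1],
     [1, 3]]
x = [1, 2, 3]
print(matvec(A, matvec(B, x)))   # S(T(x)): [13, 4]
print(matvec(matmul(A, B), x))   # (AB)x:   [13, 4]
```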
We can multiply an m × n matrix A by an n × k matrix B. The result, AB, will be an m × k matrix:
(m × n)(n × k) → (m × k).
Notice that n appears twice here to “cancel out.” That is, we need the number of columns of A to equal the number of rows of B; otherwise, the product AB makes no sense.
Example 1: Let A be a (3 × 2)-matrix, and let B be a (2 × 4)-matrix. The product AB is then a (3 × 4)-matrix.
Example 2: Let A be a (2 × 3)-matrix, and let B be a (4 × 2)-matrix. Then AB is not defined. (But the product BA is defined: it is a (4 × 3)-matrix.)
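The size rule (m × n)(n × k) → (m × k) fits in a few lines. A sketch (the function name `product_shape` is hypothetical) that reproduces Examples 1 and 2:

```python
def product_shape(shape_A, shape_B):
    """Shape of AB from the shapes of A and B, or None if AB is undefined."""
    (m, n), (n2, k) = shape_A, shape_B
    return (m, k) if n == n2 else None


print(product_shape((3, 2), (2, 4)))  # (3, 4): Example 1
print(product_shape((2, 3), (4, 2)))  # None: Example 2, AB is undefined
print(product_shape((4, 2), (2, 3)))  # (4, 3): but BA is defined
```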
Example 1A (Elliptic Paraboloid): Consider f : R^2 → R given by
f(x, y) = x^2 + y^2.
The level sets of f are curves in R^2. The level sets are {(x, y) | x^2 + y^2 = c}. The graph of f is a surface in R^3. The graph is {(x, y, z) | z = x^2 + y^2}.
Notice that f has a local minimum at (0, 0); the corresponding point (0, 0, 0) is the bottom of the graph.
Note that ∂f/∂x(0, 0) = ∂f/∂y(0, 0) = 0. Also, ∂²f/∂x²(0, 0) > 0 and ∂²f/∂y²(0, 0) > 0.
Example 1B (Elliptic Paraboloid): Consider f : R^2 → R given by
f(x, y) = −x^2 − y^2.
The level sets of f are curves in R^2. The level sets are {(x, y) | −x^2 − y^2 = c}. The graph of f is a surface in R^3. The graph is {(x, y, z) | z = −x^2 − y^2}.
Notice that f has a local maximum at (0, 0); the corresponding point (0, 0, 0) is the top of the graph. Note that ∂f/∂x(0, 0) = ∂f/∂y(0, 0) = 0. Also, ∂²f/∂x²(0, 0) < 0 and ∂²f/∂y²(0, 0) < 0.
Example 2 (Hyperbolic Paraboloid): Consider f : R^2 → R given by
f(x, y) = x^2 − y^2.
The level sets of f are curves in R^2. The level sets are {(x, y) | x^2 − y^2 = c}. The graph of f is a surface in R^3. The graph is {(x, y, z) | z = x^2 − y^2}.
Notice that (0, 0, 0) is a saddle point of the graph of f.
Note that ∂f/∂x(0, 0) = ∂f/∂y(0, 0) = 0. Also, ∂²f/∂x²(0, 0) > 0 while ∂²f/∂y²(0, 0) < 0.
General Remark: In each case, the level sets of f are obtained by slicing the graph of f by planes z = c. Try to visualize this in each case.
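The signs of the second partials quoted in Examples 1A, 1B, and 2 can be checked numerically. This sketch uses the central second-difference formula (f(x + h) − 2f(x) + f(x − h))/h^2 with h = 1e-3, an arbitrary step chosen for illustration; for these quadratics the estimates come out as exactly ±2.

```python
def second_partials(f, x, y, h=1e-3):
    """Central-difference estimates of (∂²f/∂x², ∂²f/∂y²) at (x, y)."""
    fxx = (f(x + h, y) - 2 * f(x, y) + f(x - h, y)) / h**2
    fyy = (f(x, y + h) - 2 * f(x, y) + f(x, y - h)) / h**2
    return fxx, fyy


examples = {
    "1A": lambda x, y: x**2 + y**2,
    "1B": lambda x, y: -x**2 - y**2,
    "2": lambda x, y: x**2 - y**2,
}
for name, f in examples.items():
    fxx, fyy = second_partials(f, 0.0, 0.0)
    print(name, round(fxx, 6), round(fyy, 6))
# 1A 2.0 2.0
# 1B -2.0 -2.0
# 2 2.0 -2.0
```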
Def: For a function f : R^n → R, its directional derivative in the direction v at the point x ∈ R^n is:
D_v f(x) = ∇f(x) · v.
Here, · is the dot product of vectors. Therefore,
D_v f(x) = ‖∇f(x)‖ ‖v‖ cos θ, where θ = ∠(∇f(x), v) is the angle between ∇f(x) and v.
Usually, we assume that v is a unit vector, meaning ‖v‖ = 1.
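Example: Let f : R^2 → R. Let v = (a, b). Then:
$$D_v f(x, y) = \nabla f(x, y) \cdot \begin{bmatrix} a \\ b \end{bmatrix} = \begin{bmatrix} \frac{\partial f}{\partial x} & \frac{\partial f}{\partial y} \end{bmatrix} \begin{bmatrix} a \\ b \end{bmatrix} = a\,\frac{\partial f}{\partial x} + b\,\frac{\partial f}{\partial y}.$$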
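In particular, we have two important special cases:
$$D_{e_1} f(x, y) = \nabla f(x, y) \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{\partial f}{\partial x}, \qquad D_{e_2} f(x, y) = \nabla f(x, y) \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \frac{\partial f}{\partial y}.$$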
Point: Partial derivatives are themselves examples of directional derivatives!
Namely, ∂f/∂x is the directional derivative of f in the e_1-direction, while ∂f/∂y is the directional derivative in the e_2-direction.
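The formula D_v f = a ∂f/∂x + b ∂f/∂y can be compared against the rate of change of f along a short step in direction v. In this sketch the function f(x, y) = x^2 y, the point (1, 2), the unit vector v = (3/5, 4/5), and the step h = 1e-6 are all hypothetical choices; here ∇f(1, 2) = (4, 1), so D_v f(1, 2) = 0.6 · 4 + 0.8 · 1 = 3.2.

```python
def f(x, y):
    return x**2 * y                # hypothetical example function


def grad_f(x, y):
    return (2 * x * y, x**2)       # (∂f/∂x, ∂f/∂y), computed by hand


def directional_derivative(x, y, v):
    """D_v f(x, y) = ∇f(x, y) · v = a ∂f/∂x + b ∂f/∂y for v = (a, b)."""
    gx, gy = grad_f(x, y)
    a, b = v
    return a * gx + b * gy


v = (3 / 5, 4 / 5)                             # a unit vector
exact = directional_derivative(1.0, 2.0, v)
h = 1e-6
approx = (f(1.0 + h * v[0], 2.0 + h * v[1]) - f(1.0, 2.0)) / h
print(round(exact, 10), abs(exact - approx) < 1e-4)  # 3.2 True

# Special cases: v = e_1 and v = e_2 recover the partial derivatives.
print(directional_derivative(1.0, 2.0, (1, 0)), directional_derivative(1.0, 2.0, (0, 1)))  # 4.0 1.0
```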
Question: In which direction v will the function f grow the most? That is, for which unit vector v is Dvf maximized?
Theorem 6.3:
(a) The directional derivative D_v f(a) is maximized when v points in the same direction as ∇f(a).
(b) The directional derivative D_v f(a) is minimized when v points in the direction opposite to ∇f(a).
In fact: the maximum and minimum values of D_v f(a) at the point a ∈ R^n are ‖∇f(a)‖ and −‖∇f(a)‖, respectively. (Assuming we only care about unit vectors v.) This follows from D_v f(a) = ‖∇f(a)‖ cos θ for unit v, since cos θ ranges between −1 (at θ = π) and 1 (at θ = 0).
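A brute-force illustration of Theorem 6.3: evaluate ∇f(a) · v over many unit vectors v = (cos t, sin t) and compare the extremes with ±‖∇f(a)‖. The gradient value (4, 6) and the grid of 3600 angles are hypothetical choices; the grid only approximates the true maximum, hence the tolerances.

```python
import math

grad = (4.0, 6.0)            # hypothetical value of ∇f(a)
norm = math.hypot(*grad)     # ‖∇f(a)‖ = sqrt(52)

# D_v f(a) = ∇f(a) · v for unit vectors v = (cos t, sin t) on a grid of angles.
N = 3600
samples = [(math.cos(2 * math.pi * j / N), math.sin(2 * math.pi * j / N)) for j in range(N)]
values = [grad[0] * cx + grad[1] * sy for cx, sy in samples]

best = samples[values.index(max(values))]
print(abs(max(values) - norm) < 1e-5, abs(min(values) + norm) < 1e-5)  # True True
# The maximizing direction is (nearly) ∇f(a) / ‖∇f(a)‖.
print(all(abs(b - g / norm) < 1e-2 for b, g in zip(best, grad)))       # True
```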
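Recall: For a function F : R^n → R, its gradient is the vector in R^n given by:
$$\nabla F = \begin{bmatrix} \frac{\partial F}{\partial x_1} \\ \frac{\partial F}{\partial x_2} \\ \vdots \\ \frac{\partial F}{\partial x_n} \end{bmatrix}.$$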
There are two ways to think about the gradient. They are interrelated.
Gradient: Normal to Level Sets
Theorem: Consider a level set F(x_1, ..., x_n) = c of a function F : R^n → R. If (a_1, ..., a_n) is a point on the level set and ∇F(a_1, ..., a_n) ≠ 0, then ∇F(a_1, ..., a_n) is normal to the level set.
Example: If we have a level curve F(x, y) = c in R^2, the gradient vector ∇F(x_0, y_0) is a normal vector to the level curve at the point (x_0, y_0).
Example: If we have a level surface F(x, y, z) = c in R^3, the gradient vector ∇F(x_0, y_0, z_0) is a normal vector to the level surface at the point (x_0, y_0, z_0).
Normal vectors help us find tangent planes to level sets. (see handout “Tangent Lines/Planes...”) But there’s another reason we like normal vectors.
Gradient: Direction of Steepest Ascent
Observation: The gradient ∇F, which is normal to the level set F(x_1, ..., x_n) = c in R^n, points in the direction of steepest ascent for the graph z = F(x_1, ..., x_n) in R^{n+1}.
Example (Elliptic Paraboloid): Let f : R^2 → R be f(x, y) = 2x^2 + 3y^2. The level sets of f are the ellipses 2x^2 + 3y^2 = c in R^2. The graph of f is the elliptic paraboloid z = 2x^2 + 3y^2 in R^3.
Since ∇f(x, y) = (4x, 6y), at the point (1, 1) ∈ R^2 the gradient vector ∇f(1, 1) = (4, 6) is normal to the level curve 2x^2 + 3y^2 = 5. So, if we were hiking on the surface z = 2x^2 + 3y^2 in R^3 and were at the point (1, 1, f(1, 1)) = (1, 1, 5), to ascend the surface the fastest, we would hike in the direction of (4, 6).
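A numerical check that ∇f(1, 1) = (4, 6) is perpendicular to the level curve 2x^2 + 3y^2 = 5 at (1, 1). The sketch parametrizes the ellipse as (√(5/2) cos t, √(5/3) sin t), a parametrization chosen here for illustration, and verifies that the gradient has zero dot product with the tangent vector at (1, 1).

```python
import math


def grad_f(x, y):
    return (4 * x, 6 * y)   # gradient of f(x, y) = 2x^2 + 3y^2


g = grad_f(1, 1)
print(g)  # (4, 6)

# Level curve 2x^2 + 3y^2 = 5 parametrized as (sqrt(5/2) cos t, sqrt(5/3) sin t);
# the point (1, 1) corresponds to cos t = sqrt(2/5).
t = math.acos(math.sqrt(2 / 5))
tangent = (-math.sqrt(5 / 2) * math.sin(t), math.sqrt(5 / 3) * math.cos(t))
print(abs(g[0] * tangent[0] + g[1] * tangent[1]) < 1e-12)  # True: ∇f is normal to the curve
```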
Warning: Note that ∇f is normal to the level sets of f. It is not a normal vector to the graph of f.
Question: Which linear transformations T : R^n → R^m are invertible? (Equivalently: which m × n matrices A are invertible?)
Fact: If T : R^n → R^m is invertible, then m = n. So: if an m × n matrix A is invertible, then m = n.
In other words, non-square matrices are never invertible. But square matrices may or may not be invertible. Which ones are invertible? Well:
Theorem: Let A be an n × n matrix. The following are equivalent:
(i) A is invertible;
(ii) N(A) = {0};
(iii) C(A) = R^n;
(iv) rref(A) = I_n;
(v) det(A) ≠ 0.
To Repeat: An n × n matrix A is invertible if and only if for every b ∈ R^n, the equation Ax = b has exactly one solution x ∈ R^n. In this case, the solution to the equation Ax = b is given by x = A^{-1}b.
Q: How can we find inverse matrices? This is accomplished via:
Prop 16.7: If A is an invertible matrix, then rref[A | I_n] = [I_n | A^{-1}].
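Prop 16.7 can be carried out mechanically by Gauss-Jordan elimination on [A | I_n]. The sketch below is one possible implementation, not the notes' own: it uses exact fractions to avoid rounding, picks the first nonzero entry in each column as pivot, and reports A as non-invertible if some column has no pivot (so rref(A) ≠ I_n). The name `inverse_via_rref` is hypothetical.

```python
from fractions import Fraction


def inverse_via_rref(A):
    """Row-reduce [A | I] to [I | A^-1]; raise ValueError if A is not invertible."""
    n = len(A)
    # Augmented matrix [A | I] with exact rational entries.
    M = [[Fraction(x) for x in row] + [Fraction(int(i == j)) for j in range(n)]
         for i, row in enumerate(A)]
    for col in range(n):
        # Find a row at or below `col` with a nonzero entry in this column; swap it up.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot entry is 1.
        p = M[col][col]
        M[col] = [x / p for x in M[col]]
        # Clear every other entry in this column.
        for r in range(n):
            if r != col and M[r][col] != 0:
                factor = M[r][col]
                M[r] = [a - factor * b for a, b in zip(M[r], M[col])]
    # The left half is now I_n, so the right half is A^-1.
    return [row[n:] for row in M]


A = [[2, 1], [5, 3]]
print([[str(t) for t in row] for row in inverse_via_rref(A)])  # [['3', '-1'], ['-5', '2']]
```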
Useful Formula: Let
$$A = \begin{bmatrix} a & b \\ c & d \end{bmatrix}$$
be a 2 × 2 matrix. If A is invertible (det(A) = ad − bc ≠ 0), then:
$$A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}.$$
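A direct transcription of the 2 × 2 formula, assuming integer entries so that `Fraction(1, det)` gives the exact value of 1/(ad − bc); the function name `inverse_2x2` and the sample matrix are illustrative. The last line checks A A^{-1} = I.

```python
from fractions import Fraction


def inverse_2x2(a, b, c, d):
    """Inverse of [[a, b], [c, d]] via (1 / (ad - bc)) [[d, -b], [-c, a]] (integer entries assumed)."""
    det = a * d - b * c
    if det == 0:
        raise ValueError("det(A) = 0, so A is not invertible")
    k = Fraction(1, det)          # exact 1 / (ad - bc)
    return [[k * d, -k * b],
            [-k * c, k * a]]


A_inv = inverse_2x2(1, 2, 3, 4)
print([[str(t) for t in row] for row in A_inv])  # [['-2', '1'], ['3/2', '-1/2']]

A = [[1, 2], [3, 4]]
print([[sum(A[i][t] * A_inv[t][j] for t in range(2)) for j in range(2)] for i in range(2)]
      == [[1, 0], [0, 1]])  # True
```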
Prop 16.8: Let f : X → Y and g : Y → Z be invertible functions. Then:
(a) f^{-1} is invertible and (f^{-1})^{-1} = f.
(b) g ◦ f is invertible and (g ◦ f)^{-1} = f^{-1} ◦ g^{-1}.
Corollary: Let A, B be invertible n × n matrices. Then:
(a) A^{-1} is invertible and (A^{-1})^{-1} = A.
(b) AB is invertible and (AB)^{-1} = B^{-1}A^{-1}.
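A numerical check of part (b) of the Corollary with two hypothetical invertible 2 × 2 matrices, using exact fractions. It also shows that the order matters: A^{-1}B^{-1} is generally not the inverse of AB. The helpers `inv2` and `mul2` are names chosen for this sketch.

```python
from fractions import Fraction


def inv2(M):
    """Inverse of a 2 x 2 matrix via the ad - bc formula (assumes det != 0)."""
    (a, b), (c, d) = M
    det = Fraction(a * d - b * c)
    return [[d / det, -b / det], [-c / det, a / det]]


def mul2(X, Y):
    return [[sum(X[i][t] * Y[t][j] for t in range(2)) for j in range(2)] for i in range(2)]


A = [[1, 2], [3, 4]]   # hypothetical, det(A) = -2
B = [[2, 1], [5, 3]]   # hypothetical, det(B) = 1
print(inv2(mul2(A, B)) == mul2(inv2(B), inv2(A)))  # True:  (AB)^-1 = B^-1 A^-1
print(inv2(mul2(A, B)) == mul2(inv2(A), inv2(B)))  # False: the order matters
```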
There are two reasons that determinants are important: (1) Algebra: Determinants tell us whether a matrix is invertible or not. (2) Geometry: Determinants are related to area and volume.
Prop 17.3: An n × n matrix A is invertible ⇐⇒ det(A) ≠ 0. Moreover: if A is invertible, then
det(A^{-1}) = 1 / det(A).
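Properties of Determinants (17.2, 17.4):
(1) (Multiplicativity) det(AB) = det(A) det(B).
(2) (Alternation) Exchanging two rows of a matrix reverses the sign of the determinant.
(3) (Multilinearity) First:
$$\det\begin{bmatrix} a_1 & a_2 & \cdots & a_n \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{bmatrix} + \det\begin{bmatrix} b_1 & b_2 & \cdots & b_n \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{bmatrix} = \det\begin{bmatrix} a_1 + b_1 & a_2 + b_2 & \cdots & a_n + b_n \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{bmatrix}$$
and similarly for the other rows. Second:
$$\det\begin{bmatrix} ka_{11} & ka_{12} & \cdots & ka_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix} = k\,\det\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}$$
and similarly for the other rows. Here, k ∈ R is any scalar.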
Warning! Multilinearity does not say that det(A + B) = det(A) + det(B). It also does not say det(kA) = k det(A). But for an n × n matrix A, det(kA) = k^n det(A) is true.
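These properties and the Warning can be spot-checked on small examples. The sketch computes determinants by cofactor expansion along the first row (fine for small matrices, too slow for large ones); the 3 × 3 matrices A and B and the scalar k = 2 are hypothetical, with det(A) = 6 and det(B) = 17.

```python
def det(M):
    """Determinant by cofactor expansion along the first row."""
    n = len(M)
    if n == 1:
        return M[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in M[1:]]   # delete row 0 and column j
        total += (-1) ** j * M[0][j] * det(minor)
    return total


def matmul(A, B):
    return [[sum(A[i][t] * B[t][j] for t in range(len(B))) for j in range(len(B[0]))]
            for i in range(len(A))]


A = [[2, 0, 1], [1, 3, 2], [1, 1, 2]]
B = [[1, 2, 0], [0, 1, 4], [2, 0, 1]]
k = 2
kA = [[k * x for x in row] for row in A]
ApB = [[a + b for a, b in zip(ra, rb)] for ra, rb in zip(A, B)]

print(det(matmul(A, B)) == det(A) * det(B))  # True:  multiplicativity
print(det(kA) == k**3 * det(A))              # True:  det(kA) = k^n det(A) with n = 3
print(det(kA) == k * det(A))                 # False: not k det(A)
print(det(ApB) == det(A) + det(B))           # False: no additivity in general
```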
Prop 17.5: Let A be any 2 × 2 matrix. Then the area of the parallelogram generated by the columns of A is |det(A)|.
Prop 17.6: Let T : R^2 → R^2 be a linear transformation with matrix A. Let R be a region in R^2. Then:
Area(T (R)) = |det(A)| · Area(R).
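Props 17.5 and 17.6 can be checked with the shoelace formula for polygon areas, since a linear map sends a polygon to the polygon spanned by the images of its vertices. In this sketch the matrix A (det(A) = 5), the triangle R (area 3), and the helper names `shoelace_area` and `apply_matrix` are hypothetical.

```python
def shoelace_area(pts):
    """Area of a simple polygon whose vertices are listed in order."""
    s = 0
    for (x1, y1), (x2, y2) in zip(pts, pts[1:] + pts[:1]):
        s += x1 * y2 - x2 * y1
    return abs(s) / 2


def apply_matrix(A, p):
    return (A[0][0] * p[0] + A[0][1] * p[1], A[1][0] * p[0] + A[1][1] * p[1])


A = [[3, 1], [1, 2]]                  # hypothetical matrix
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]   # = 5

# Prop 17.5: the unit square maps to the parallelogram on the columns of A.
square = [(0, 0), (1, 0), (1, 1), (0, 1)]
print(shoelace_area([apply_matrix(A, p) for p in square]))   # 5.0

# Prop 17.6: Area(T(R)) = |det(A)| * Area(R) for a triangle R.
R = [(0, 0), (2, 0), (1, 3)]
TR = [apply_matrix(A, p) for p in R]
print(shoelace_area(TR) == abs(det_A) * shoelace_area(R))   # True
```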