Linear Transformations and Multivariable Calculus: A Comprehensive Guide, Lecture notes of Calculus


Typology: Lecture notes

2022/2023

Uploaded on 03/01/2023

ekaram


Linear Transformations

The two basic vector operations are addition and scaling. From this perspective, the nicest functions are those which “preserve” these operations:

Def: A linear transformation is a function T : R^n → R^m which satisfies:
(1) T(x + y) = T(x) + T(y) for all x, y ∈ R^n
(2) T(cx) = cT(x) for all x ∈ R^n and c ∈ R.

Fact: If T : R^n → R^m is a linear transformation, then T(0) = 0. (Indeed, taking c = 0 in (2) gives T(0) = T(0 · x) = 0 · T(x) = 0.)

We’ve already met examples of linear transformations. Namely: if A is any m × n matrix, then the function T : R^n → R^m which is matrix-vector multiplication

T(x) = Ax

is a linear transformation.

(Wait: I thought matrices were functions? Technically, no. Matrices are literally just arrays of numbers. However, matrices define functions by matrix-vector multiplication, and such functions are always linear transformations.)

Question: Are these all the linear transformations there are? That is, does every linear transformation come from matrix-vector multiplication? Yes:

Prop 13.2: Let T : R^n → R^m be a linear transformation. Then the function T is just matrix-vector multiplication: T(x) = Ax for some matrix A. In fact, the m × n matrix A is the matrix whose columns are T(e_1), ..., T(e_n):

\[ A = \begin{bmatrix} T(e_1) & \cdots & T(e_n) \end{bmatrix}. \]
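As a rough illustration of Prop 13.2, the sketch below builds the matrix of a linear map column by column from the values T(e_j). The map `T` and the helper names `matrix_of` and `mat_vec` are made up for this example; they are not from the notes.

```python
def standard_basis(n):
    """Return e_1, ..., e_n as lists of length n."""
    return [[1 if i == j else 0 for i in range(n)] for j in range(n)]

def matrix_of(T, n):
    """Matrix (list of rows) whose j-th column is T(e_j)."""
    columns = [T(e) for e in standard_basis(n)]
    m = len(columns[0])
    return [[columns[j][i] for j in range(n)] for i in range(m)]

def mat_vec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

# Hypothetical linear transformation T : R^2 -> R^3 (an assumption for illustration).
def T(x):
    x1, x2 = x
    return [x1 + 2 * x2, 3 * x1, -x2]

A = matrix_of(T, 2)
print(A)                                   # columns are T(e_1) and T(e_2)
print(mat_vec(A, [5, 7]), T([5, 7]))       # the two results should agree
```

Since T here is linear, multiplying by `A` reproduces T on every input, not only on the basis vectors.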

Terminology: For linear transformations T : R^n → R^m, we use the word “kernel” to mean “nullspace.” We also say “image of T” to mean “range of T.” So, for a linear transformation T : R^n → R^m:

ker(T) = {x ∈ R^n | T(x) = 0} = T^{-1}({0})
im(T) = {T(x) | x ∈ R^n} = T(R^n).

Ways to Visualize functions f : R → R (e.g.: f(x) = x^2)

(1) Set-Theoretic Picture.

(2) Graph of f. (Thinking: y = f(x).) The graph of f : R → R is the subset of R^2 given by:

Graph(f) = {(x, y) ∈ R^2 | y = f(x)}.

(3) Level sets of f. (Thinking: f(x) = c.) The level sets of f : R → R are the subsets of R of the form

{x ∈ R | f(x) = c},

for constants c ∈ R.

Ways to Visualize functions f : R^2 → R (e.g.: f(x, y) = x^2 + y^2)

(1) Set-Theoretic Picture.

(2) Graph of f. (Thinking: z = f(x, y).) The graph of f : R^2 → R is the subset of R^3 given by:

Graph(f) = {(x, y, z) ∈ R^3 | z = f(x, y)}.

(3) Level sets of f. (Thinking: f(x, y) = c.) The level sets of f : R^2 → R are the subsets of R^2 of the form

{(x, y) ∈ R^2 | f(x, y) = c},

for constants c ∈ R.
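To make the graph and level-set pictures concrete, here is a small sketch for the sample function f(x, y) = x^2 + y^2. It assumes c > 0, so that the level set is the circle of radius sqrt(c). The helper name `level_set_points` is invented for this illustration.

```python
import math

def f(x, y):
    return x**2 + y**2

def level_set_points(c, num_points=8):
    """Sample points on {(x, y) | f(x, y) = c}, assuming c > 0 (circle of radius sqrt(c))."""
    r = math.sqrt(c)
    return [(r * math.cos(2 * math.pi * k / num_points),
             r * math.sin(2 * math.pi * k / num_points))
            for k in range(num_points)]

points = level_set_points(4.0)
# Each level-set point (x, y) lifts to the graph point (x, y, f(x, y)) = (x, y, c).
graph_points = [(x, y, f(x, y)) for (x, y) in points]
for p in graph_points:
    print(p)
```

Every lifted point has the same height z = c, which matches the idea that a level set is the slice of the graph by the plane z = c.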

Ways to Visualize functions f : R^3 → R (e.g.: f(x, y, z) = x^2 + y^2 + z^2)

(1) Set-Theoretic Picture.

(2) Graph of f. (Thinking: w = f(x, y, z).)

(3) Level sets of f. (Thinking: f(x, y, z) = c.) The level sets of f : R^3 → R are the subsets of R^3 of the form

{(x, y, z) ∈ R^3 | f(x, y, z) = c},

for constants c ∈ R.

Two Examples of Linear Transformations

(1) Diagonal Matrices: A diagonal matrix is a matrix of the form

\[ D = \begin{bmatrix} d_1 & 0 & \cdots & 0 \\ 0 & d_2 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & d_n \end{bmatrix}. \]

The linear transformation defined by D has the following effect: Vectors are...
◦ Stretched/contracted (possibly reflected) in the x_1-direction by d_1
◦ Stretched/contracted (possibly reflected) in the x_2-direction by d_2
...
◦ Stretched/contracted (possibly reflected) in the x_n-direction by d_n.

◦ Stretching in the x_i-direction happens if |d_i| > 1.
◦ Contracting in the x_i-direction happens if |d_i| < 1.
◦ Reflecting happens if d_i is negative.

(2) Rotations in R^2

We write Rot_θ : R^2 → R^2 for the linear transformation which rotates vectors in R^2 counter-clockwise through the angle θ. Its matrix is:

\[ \begin{bmatrix} \cos\theta & -\sin\theta \\ \sin\theta & \cos\theta \end{bmatrix}. \]
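The two examples above can be tried numerically. The following is a minimal sketch; the function names and the specific inputs (a quarter turn, and diagonal entries 2 and -0.5) are chosen only for illustration.

```python
import math

def mat_vec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

def rotation_matrix(theta):
    """Matrix of the counter-clockwise rotation of R^2 through the angle theta."""
    return [[math.cos(theta), -math.sin(theta)],
            [math.sin(theta),  math.cos(theta)]]

def diagonal_matrix(entries):
    """Diagonal matrix with the given diagonal entries d_1, ..., d_n."""
    n = len(entries)
    return [[entries[i] if i == j else 0 for j in range(n)] for i in range(n)]

# Rotating e_1 by a quarter turn should give (approximately) e_2.
rotated = mat_vec(rotation_matrix(math.pi / 2), [1, 0])

# d_1 = 2 stretches the x_1-direction; d_2 = -0.5 contracts and reflects the x_2-direction.
scaled = mat_vec(diagonal_matrix([2, -0.5]), [1, 1])

print(rotated, scaled)
```

The rotated vector is only approximately (0, 1) because cos(π/2) is not exactly zero in floating point.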

The Multivariable Derivative: An Example

Example: Let F : R^2 → R^3 be the function

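F(x, y) = (x + 2y, sin(x), e^y) = (F_1(x, y), F_2(x, y), F_3(x, y)).

Its derivative is a linear transformation DF(x, y) : R^2 → R^3. The matrix of the linear transformation DF(x, y) is:

\[
DF(x, y) =
\begin{bmatrix}
\dfrac{\partial F_1}{\partial x} & \dfrac{\partial F_1}{\partial y} \\[6pt]
\dfrac{\partial F_2}{\partial x} & \dfrac{\partial F_2}{\partial y} \\[6pt]
\dfrac{\partial F_3}{\partial x} & \dfrac{\partial F_3}{\partial y}
\end{bmatrix}
=
\begin{bmatrix}
1 & 2 \\
\cos(x) & 0 \\
0 & e^y
\end{bmatrix}.
\]

Notice that (for example) DF(1, 1) is a linear transformation, as is DF(2, 3), etc. That is, each DF(x, y) is a linear transformation R^2 → R^3.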
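One way to sanity-check a derivative matrix like this is to compare it with finite differences. The sketch below does that for the example F; the step size `h` and the helper name `numerical_jacobian` are assumptions made for this illustration.

```python
import math

def F(x, y):
    return [x + 2 * y, math.sin(x), math.exp(y)]

def DF(x, y):
    """Analytic derivative matrix: row i holds the partials of F_i."""
    return [[1.0, 2.0],
            [math.cos(x), 0.0],
            [0.0, math.exp(y)]]

def numerical_jacobian(F, x, y, h=1e-6):
    """Central-difference approximation of the 3 x 2 derivative matrix."""
    fx_plus, fx_minus = F(x + h, y), F(x - h, y)
    fy_plus, fy_minus = F(x, y + h), F(x, y - h)
    return [[(fx_plus[i] - fx_minus[i]) / (2 * h),
             (fy_plus[i] - fy_minus[i]) / (2 * h)]
            for i in range(3)]

exact = DF(1.0, 1.0)
approx = numerical_jacobian(F, 1.0, 1.0)
max_error = max(abs(exact[i][j] - approx[i][j]) for i in range(3) for j in range(2))
print(max_error)   # should be small
```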

Linear Approximation

Single Variable Setting

Review: In single-variable calc, we look at functions f : R → R. We write y = f(x), and at a point (a, f(a)) write:

∆y ≈ dy.

Here, ∆y = f(x) − f(a), while dy = f′(a)∆x = f′(a)(x − a). So:

f(x) − f(a) ≈ f′(a)(x − a).

Therefore: f(x) ≈ f(a) + f′(a)(x − a).

The right-hand side f(a) + f′(a)(x − a) can be interpreted as follows:
◦ It is the best linear approximation to f(x) at x = a.
◦ It is the 1st Taylor polynomial to f(x) at x = a.
◦ The line y = f(a) + f′(a)(x − a) is the tangent line at (a, f(a)).

Multivariable Setting

Now consider functions f : R^n → R^m. At a point (a, f(a)), we have exactly the same thing:

f(x) − f(a) ≈ Df(a)(x − a).

That is:

f(x) ≈ f(a) + Df(a)(x − a). (∗)

Note: The quantity Df(a) is a matrix, while (x − a) is a vector. That is, Df(a)(x − a) is matrix-vector multiplication.

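Example: Let f : R^2 → R. Let’s write x = (x_1, x_2) and a = (a_1, a_2). Then (∗) reads:

\[
\begin{aligned}
f(x_1, x_2) &\approx f(a_1, a_2) +
\begin{bmatrix} \dfrac{\partial f}{\partial x_1}(a_1, a_2) & \dfrac{\partial f}{\partial x_2}(a_1, a_2) \end{bmatrix}
\begin{bmatrix} x_1 - a_1 \\ x_2 - a_2 \end{bmatrix} \\
&= f(a_1, a_2) + \frac{\partial f}{\partial x_1}(a_1, a_2)(x_1 - a_1) + \frac{\partial f}{\partial x_2}(a_1, a_2)(x_2 - a_2).
\end{aligned}
\]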
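As a quick numerical illustration of (∗), the sketch below uses the sample function f(x_1, x_2) = x_1^2 + x_2^2 with base point a = (1, 1). Both the function and the base point are assumptions chosen for this example, not taken from the notes.

```python
def f(x1, x2):
    return x1**2 + x2**2

def grad_f(x1, x2):
    """Partial derivatives of f with respect to x1 and x2."""
    return (2 * x1, 2 * x2)

def linear_approx(a, x):
    """Right-hand side of (*): f(a) + Df(a)(x - a)."""
    g = grad_f(*a)
    return f(*a) + g[0] * (x[0] - a[0]) + g[1] * (x[1] - a[1])

a = (1.0, 1.0)
x = (1.1, 0.9)
approx = linear_approx(a, x)
exact = f(*x)
print(approx, exact, exact - approx)
```

For this particular f, the gap between the exact value and the approximation is the squared distance from x to a, so it shrinks quickly as x approaches a.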

Composition and Matrix Multiplication

Recall: Let f : X → Y and g : Y → Z be functions. Their composition is the function g ◦ f : X → Z defined by

(g ◦ f)(x) = g(f(x)).

Observations:
(1) For this to make sense, we must have: co-domain(f) = domain(g).
(2) Composition is not generally commutative: that is, f ◦ g and g ◦ f are usually different.
(3) Composition is always associative: (h ◦ g) ◦ f = h ◦ (g ◦ f).

Fact: If T : R^k → R^n and S : R^n → R^m are both linear transformations, then S ◦ T is also a linear transformation.

Question: How can we describe the matrix of the linear transformation S ◦ T in terms of the matrices of S and T?

Fact: Let T : R^k → R^n and S : R^n → R^m be linear transformations with matrices B and A, respectively. Then the matrix of S ◦ T is the product AB.

We can multiply an m × n matrix A by an n × k matrix B. The result, AB, will be an m × k matrix:

(m × n)(n × k) → (m × k).

Notice that n appears twice here to “cancel out.” That is, we need the number of columns of A to equal the number of rows of B; otherwise, the product AB makes no sense.

Example 1: Let A be a (3 × 2)-matrix, and let B be a (2 × 4)-matrix. The product AB is then a (3 × 4)-matrix.

Example 2: Let A be a (2 × 3)-matrix, and let B be a (4 × 2)-matrix. Then AB is not defined. (But the product BA is defined: it is a (4 × 3)-matrix.)
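The link between composition and matrix multiplication can be checked on a small case. In this sketch the matrices `A` (3 × 2) and `B` (2 × 4) are arbitrary sample values, and `mat_mul` and `mat_vec` are helper names introduced only for illustration.

```python
def mat_mul(A, B):
    """Product of an m x n matrix A and an n x k matrix B (lists of rows)."""
    if len(A[0]) != len(B):
        raise ValueError("number of columns of A must equal number of rows of B")
    return [[sum(A[i][t] * B[t][j] for t in range(len(B)))
             for j in range(len(B[0]))]
            for i in range(len(A))]

def mat_vec(A, x):
    """Matrix-vector product Ax."""
    return [sum(a * xi for a, xi in zip(row, x)) for row in A]

A = [[1, 0], [2, 1], [0, 3]]        # 3 x 2: matrix of S : R^2 -> R^3
B = [[1, 2, 0, 1], [0, 1, 1, 0]]    # 2 x 4: matrix of T : R^4 -> R^2
AB = mat_mul(A, B)                  # 3 x 4: matrix of S ◦ T : R^4 -> R^3

x = [1, 2, 3, 4]
print(mat_vec(AB, x))               # (S ◦ T)(x) computed with AB
print(mat_vec(A, mat_vec(B, x)))    # S(T(x)) computed in two steps
```

Calling `mat_mul(B, A)` with these shapes raises `ValueError`, since B has 4 columns while A has 3 rows.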

Two Model Examples

Example 1A (Elliptic Paraboloid): Consider f : R^2 → R given by

f(x, y) = x^2 + y^2.

The level sets of f are curves in R^2. The level sets are {(x, y) | x^2 + y^2 = c}. The graph of f is a surface in R^3. The graph is {(x, y, z) | z = x^2 + y^2}.

Notice that (0, 0, 0) is a local minimum of the graph of f.

Note that ∂f/∂x (0, 0) = ∂f/∂y (0, 0) = 0. Also, ∂^2f/∂x^2 (0, 0) > 0 and ∂^2f/∂y^2 (0, 0) > 0.

Example 1B (Elliptic Paraboloid): Consider f : R^2 → R given by

f(x, y) = −x^2 − y^2.

The level sets of f are curves in R^2. The level sets are {(x, y) | −x^2 − y^2 = c}. The graph of f is a surface in R^3. The graph is {(x, y, z) | z = −x^2 − y^2}.

Notice that (0, 0, 0) is a local maximum of the graph of f.

Note that ∂f/∂x (0, 0) = ∂f/∂y (0, 0) = 0. Also, ∂^2f/∂x^2 (0, 0) < 0 and ∂^2f/∂y^2 (0, 0) < 0.

Example 2 (Hyperbolic Paraboloid): Consider f : R^2 → R given by

f(x, y) = x^2 − y^2.

The level sets of f are curves in R^2. The level sets are {(x, y) | x^2 − y^2 = c}. The graph of f is a surface in R^3. The graph is {(x, y, z) | z = x^2 − y^2}.

Notice that (0, 0, 0) is a saddle point of the graph of f.

Note that ∂f/∂x (0, 0) = ∂f/∂y (0, 0) = 0. Also, ∂^2f/∂x^2 (0, 0) > 0 while ∂^2f/∂y^2 (0, 0) < 0.

General Remark: In each case, the level sets of f are obtained by slicing the graph of f by planes z = c. Try to visualize this in each case.
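The sign patterns of the second partial derivatives in the three model examples can be confirmed numerically. The sketch below estimates them at the origin with central differences; the step size `h` and the helper name `second_partials` are assumptions made for this illustration.

```python
def second_partials(f, h=1e-3):
    """Central-difference estimates of the second partials in x and in y at (0, 0)."""
    fxx = (f(h, 0.0) - 2 * f(0.0, 0.0) + f(-h, 0.0)) / h**2
    fyy = (f(0.0, h) - 2 * f(0.0, 0.0) + f(0.0, -h)) / h**2
    return fxx, fyy

examples = {
    "1A": lambda x, y: x**2 + y**2,     # local minimum at the origin
    "1B": lambda x, y: -x**2 - y**2,    # local maximum at the origin
    "2":  lambda x, y: x**2 - y**2,     # saddle point at the origin
}

results = {name: second_partials(f) for name, f in examples.items()}
for name, (fxx, fyy) in results.items():
    print(name, fxx, fyy)
```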

Directional Derivatives

Def: For a function f : R^n → R, its directional derivative in the direction v at the point x ∈ R^n is:

D_v f(x) = ∇f(x) · v.

Here, · is the dot product of vectors. Therefore,

D_v f(x) = ‖∇f(x)‖ ‖v‖ cos θ, where θ = ∠(∇f(x), v) is the angle between ∇f(x) and v.

Usually, we assume that v is a unit vector, meaning ‖v‖ = 1.

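Example: Let f : R^2 → R. Let

\[ v = \begin{bmatrix} a \\ b \end{bmatrix}. \]

Then:

\[
D_v f(x, y) = \nabla f(x, y) \cdot \begin{bmatrix} a \\ b \end{bmatrix}
= \begin{bmatrix} \dfrac{\partial f}{\partial x} \\[6pt] \dfrac{\partial f}{\partial y} \end{bmatrix} \cdot \begin{bmatrix} a \\ b \end{bmatrix}
= a \frac{\partial f}{\partial x} + b \frac{\partial f}{\partial y}.
\]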

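In particular, we have two important special cases:

\[ D_{e_1} f(x, y) = \nabla f(x, y) \cdot \begin{bmatrix} 1 \\ 0 \end{bmatrix} = \frac{\partial f}{\partial x} \]

\[ D_{e_2} f(x, y) = \nabla f(x, y) \cdot \begin{bmatrix} 0 \\ 1 \end{bmatrix} = \frac{\partial f}{\partial y} \]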

Point: Partial derivatives are themselves examples of directional derivatives!

Namely, ∂f/∂x is the directional derivative of f in the e_1-direction, while ∂f/∂y is the directional derivative in the e_2-direction.

Question: In which direction v will the function f grow the most? That is, for which unit vector v is D_v f maximized?

Theorem 6.3:
(a) The directional derivative D_v f(a) is maximized when v points in the same direction as ∇f(a).
(b) The directional derivative D_v f(a) is minimized when v points in the opposite direction to ∇f(a).

In fact: The maximum and minimum values of D_v f(a) at the point a ∈ R^n are ‖∇f(a)‖ and −‖∇f(a)‖. (Assuming we only care about unit vectors v.)
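Theorem 6.3 can be explored numerically by sampling many unit vectors and comparing the resulting directional derivatives with ‖∇f(a)‖. The sketch below assumes the sample function f(x, y) = x^2 + y^2 and the point a = (3, 4); both are chosen only for illustration.

```python
import math

def grad_f(x, y):
    """Gradient of the sample function f(x, y) = x^2 + y^2."""
    return (2 * x, 2 * y)

def directional_derivative(grad, v):
    """D_v f = (gradient) . v"""
    return grad[0] * v[0] + grad[1] * v[1]

g = grad_f(3.0, 4.0)                 # gradient at a = (3, 4)
norm_g = math.hypot(g[0], g[1])      # length of the gradient

# Sample unit vectors at one-degree steps around the circle.
unit_vectors = [(math.cos(math.radians(k)), math.sin(math.radians(k)))
                for k in range(360)]
values = [directional_derivative(g, v) for v in unit_vectors]

along_gradient = directional_derivative(g, (g[0] / norm_g, g[1] / norm_g))
against_gradient = directional_derivative(g, (-g[0] / norm_g, -g[1] / norm_g))
print(max(values), along_gradient, norm_g)
print(min(values), against_gradient, -norm_g)
```

No sampled direction should beat the unit vector pointing along the gradient, and none should fall below the one pointing against it.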

The Gradient: Two Interpretations

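Recall: For a function F : R^n → R, its gradient is the vector in R^n given by:

\[
\nabla F = \begin{bmatrix} \dfrac{\partial F}{\partial x_1} \\[6pt] \dfrac{\partial F}{\partial x_2} \\ \vdots \\ \dfrac{\partial F}{\partial x_n} \end{bmatrix}.
\]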

There are two ways to think about the gradient. They are interrelated.

Gradient: Normal to Level Sets

Theorem: Consider a level set F(x_1, ..., x_n) = c of a function F : R^n → R. If (a_1, ..., a_n) is a point on the level set, then ∇F(a_1, ..., a_n) is normal to the level set.

Example: If we have a level curve F(x, y) = c in R^2, the gradient vector ∇F(x_0, y_0) is a normal vector to the level curve at the point (x_0, y_0).

Example: If we have a level surface F(x, y, z) = c in R^3, the gradient vector ∇F(x_0, y_0, z_0) is a normal vector to the level surface at the point (x_0, y_0, z_0).

Normal vectors help us find tangent planes to level sets. (see handout “Tangent Lines/Planes...”) But there’s another reason we like normal vectors.

Gradient: Direction of Steepest Ascent

Observation: The normal vector ∇F to a level set F(x_1, ..., x_n) = c in R^n points in the direction of steepest ascent for the graph z = F(x_1, ..., x_n) in R^{n+1}.

Example (Elliptic Paraboloid): Let f : R^2 → R be f(x, y) = 2x^2 + 3y^2. The level sets of f are the ellipses 2x^2 + 3y^2 = c in R^2. The graph of f is the elliptic paraboloid z = 2x^2 + 3y^2 in R^3.

Since ∇f(x, y) = (4x, 6y), at the point (1, 1) ∈ R^2 the gradient vector

\[ \nabla f(1, 1) = \begin{bmatrix} 4 \\ 6 \end{bmatrix} \]

is normal to the level curve 2x^2 + 3y^2 = 5. So, if we were hiking on the surface z = 2x^2 + 3y^2 in R^3 and were at the point (1, 1, f(1, 1)) = (1, 1, 5), to ascend the surface the fastest, we would hike in the direction of

\[ \begin{bmatrix} 4 \\ 6 \end{bmatrix}. \]

Warning: Note that ∇f is normal to the level sets of f. It is not a normal vector to the graph of f.
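The claim that the gradient is normal to the level curve can be checked for the ellipse 2x^2 + 3y^2 = 5 at the point (1, 1). The sketch below assumes the standard parametrization x = a cos t, y = b sin t with a = sqrt(5/2) and b = sqrt(5/3), which is an illustrative choice rather than something from the notes.

```python
import math

def grad_f(x, y):
    """Gradient of f(x, y) = 2x^2 + 3y^2."""
    return (4 * x, 6 * y)

# Semi-axes of the ellipse 2x^2 + 3y^2 = 5.
a, b = math.sqrt(5 / 2), math.sqrt(5 / 3)

# Parameter value t0 at which (a cos t0, b sin t0) = (1, 1).
t0 = math.atan2(math.sqrt(3 / 5), math.sqrt(2 / 5))

point = (a * math.cos(t0), b * math.sin(t0))
tangent = (-a * math.sin(t0), b * math.cos(t0))   # velocity of the parametrized curve

g = grad_f(*point)
dot = g[0] * tangent[0] + g[1] * tangent[1]
print(point, g, dot)   # the dot product should be (numerically) zero
```

A zero dot product with the tangent direction is exactly what "normal to the level curve" means at that point.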

Inverses of Linear Transformations

Question: Which linear transformations T : R^n → R^m are invertible? (Equiv: Which m × n matrices A are invertible?)

Fact: If T : R^n → R^m is invertible, then m = n. So: If an m × n matrix A is invertible, then m = n.

In other words, non-square matrices are never invertible. But square matrices may or may not be invertible. Which ones are invertible? Well:

Theorem: Let A be an n × n matrix. The following are equivalent:
(i) A is invertible
(ii) N(A) = {0}
(iii) C(A) = R^n
(iv) rref(A) = I_n
(v) det(A) ≠ 0.

To Repeat: An n × n matrix A is invertible if and only if for every b ∈ R^n, the equation Ax = b has exactly one solution x ∈ R^n. In this case, the solution to the equation Ax = b is given by x = A^{-1}b.

Q: How can we find inverse matrices? This is accomplished via:

Prop 16.7: If A is an invertible matrix, then rref[A | I_n] = [I_n | A^{-1}].
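Prop 16.7 suggests a procedure: row reduce the augmented matrix [A | I_n] and read off the right-hand block. The sketch below is one possible implementation using exact rational arithmetic; the function name `inverse_via_rref` and the sample matrix are assumptions for illustration.

```python
from fractions import Fraction

def inverse_via_rref(A):
    """Row reduce [A | I_n] to [I_n | A^(-1)]; raise ValueError if A is not invertible."""
    n = len(A)
    # Build the augmented matrix [A | I_n] with exact rational entries.
    M = [[Fraction(A[i][j]) for j in range(n)] +
         [Fraction(int(i == j)) for j in range(n)]
         for i in range(n)]
    for col in range(n):
        # Find a row at or below the diagonal with a nonzero entry in this column.
        pivot = next((r for r in range(col, n) if M[r][col] != 0), None)
        if pivot is None:
            raise ValueError("matrix is not invertible")
        M[col], M[pivot] = M[pivot], M[col]
        # Scale the pivot row so the pivot becomes 1.
        p = M[col][col]
        M[col] = [entry / p for entry in M[col]]
        # Clear the rest of the column.
        for r in range(n):
            if r != col:
                factor = M[r][col]
                M[r] = [x - factor * y for x, y in zip(M[r], M[col])]
    return [row[n:] for row in M]

A = [[2, 1], [5, 3]]
A_inv = inverse_via_rref(A)
print(A_inv)
```

If no pivot can be found in some column, then rref(A) is not I_n, so by the theorem above A is not invertible and the function raises an error.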

Useful Formula: Let

\[ A = \begin{bmatrix} a & b \\ c & d \end{bmatrix} \]

be a 2 × 2 matrix. If A is invertible (det(A) = ad − bc ≠ 0), then:

\[ A^{-1} = \frac{1}{ad - bc} \begin{bmatrix} d & -b \\ -c & a \end{bmatrix}. \]

Prop 16.8: Let f : X → Y and g : Y → Z be invertible functions. Then:
(a) f^{-1} is invertible and (f^{-1})^{-1} = f.
(b) g ◦ f is invertible and (g ◦ f)^{-1} = f^{-1} ◦ g^{-1}.

Corollary: Let A, B be invertible n × n matrices. Then:
(a) A^{-1} is invertible and (A^{-1})^{-1} = A.
(b) AB is invertible and (AB)^{-1} = B^{-1}A^{-1}.
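The 2 × 2 inverse formula and part (b) of the Corollary can be checked together on a small case. In this sketch the integer matrices `A` and `B` are arbitrary sample values, and the helper names are invented for illustration.

```python
from fractions import Fraction

def inverse_2x2(A):
    """Inverse of a 2 x 2 integer matrix via the formula (1/(ad - bc)) [[d, -b], [-c, a]]."""
    (a, b), (c, d) = A
    det = a * d - b * c
    if det == 0:
        raise ValueError("det(A) = 0, so A is not invertible")
    return [[Fraction(d, det), Fraction(-b, det)],
            [Fraction(-c, det), Fraction(a, det)]]

def mat_mul_2x2(A, B):
    """Product of two 2 x 2 matrices."""
    return [[sum(A[i][t] * B[t][j] for t in range(2)) for j in range(2)]
            for i in range(2)]

A = [[1, 2], [3, 4]]
B = [[0, 1], [1, 1]]

lhs = inverse_2x2(mat_mul_2x2(A, B))                 # (AB)^(-1)
rhs = mat_mul_2x2(inverse_2x2(B), inverse_2x2(A))    # B^(-1) A^(-1)
print(lhs == rhs)
```

Note the reversed order on the right-hand side: undoing "first B, then A" means undoing A first.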

Determinants

There are two reasons that determinants are important:
(1) Algebra: Determinants tell us whether a matrix is invertible or not.
(2) Geometry: Determinants are related to area and volume.

Determinants: Algebra

Prop 17.3: An n × n matrix A is invertible ⇐⇒ det(A) ≠ 0. Moreover: if A is invertible, then

\[ \det(A^{-1}) = \frac{1}{\det(A)}. \]

Properties of Determinants (17.2, 17.4):
(1) (Multiplicativity) det(AB) = det(A) det(B).
(2) (Alternation) Exchanging two rows of a matrix reverses the sign of the determinant.
(3) (Multilinearity): First:

\[
\det\begin{bmatrix} a_1 & a_2 & \cdots & a_n \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{bmatrix}
+ \det\begin{bmatrix} b_1 & b_2 & \cdots & b_n \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{bmatrix}
= \det\begin{bmatrix} a_1 + b_1 & a_2 + b_2 & \cdots & a_n + b_n \\ c_{21} & c_{22} & \cdots & c_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ c_{n1} & c_{n2} & \cdots & c_{nn} \end{bmatrix}
\]

and similarly for the other rows; Second:

\[
\det\begin{bmatrix} k a_{11} & k a_{12} & \cdots & k a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
= k \det\begin{bmatrix} a_{11} & a_{12} & \cdots & a_{1n} \\ a_{21} & a_{22} & \cdots & a_{2n} \\ \vdots & \vdots & \ddots & \vdots \\ a_{n1} & a_{n2} & \cdots & a_{nn} \end{bmatrix}
\]

and similarly for the other rows. Here, k ∈ R is any scalar.

Warning! Multilinearity does not say that det(A + B) = det(A) + det(B). It also does not say det(kA) = k det(A). But: det(kA) = k^n det(A) is true.
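The properties and the warning above are easy to test on small integer matrices. The sketch below uses a cofactor-expansion determinant, which is fine for tiny matrices but slow in general; the sample matrices `A`, `B` and the scalar `k` are arbitrary choices for illustration.

```python
def det(A):
    """Determinant by cofactor expansion along the first row."""
    n = len(A)
    if n == 1:
        return A[0][0]
    total = 0
    for j in range(n):
        minor = [row[:j] + row[j + 1:] for row in A[1:]]
        total += (-1) ** j * A[0][j] * det(minor)
    return total

def mat_mul(A, B):
    """Product of two square matrices of the same size."""
    n = len(A)
    return [[sum(A[i][t] * B[t][j] for t in range(n)) for j in range(n)]
            for i in range(n)]

A = [[1, 2], [3, 4]]
B = [[2, 0], [1, 3]]
k, n = 3, 2

kA = [[k * entry for entry in row] for row in A]
A_plus_B = [[A[i][j] + B[i][j] for j in range(n)] for i in range(n)]
swapped = [A[1], A[0]]                        # exchange the two rows of A
first_row_scaled = [[k * entry for entry in A[0]], A[1]]

print(det(mat_mul(A, B)), det(A) * det(B))    # multiplicativity
print(det(swapped), -det(A))                  # alternation
print(det(first_row_scaled), k * det(A))      # scaling one row
print(det(kA), k**n * det(A))                 # scaling the whole matrix
print(det(A_plus_B), det(A) + det(B))         # these differ in general
```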

Determinants: Geometry

Prop 17.5: Let A be any 2 × 2 matrix. Then the area of the parallelogram generated by the columns of A is |det(A)|.

Prop 17.6: Let T : R^2 → R^2 be a linear transformation with matrix A. Let R be a region in R^2. Then:

Area(T (R)) = |det(A)| · Area(R).
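Props 17.5 and 17.6 can be illustrated by applying a matrix to the unit square and measuring the image with the shoelace formula. The matrix `A` and the helper names below are assumptions made for this sketch.

```python
def apply(A, p):
    """Apply the 2 x 2 matrix A to the point p."""
    return (A[0][0] * p[0] + A[0][1] * p[1],
            A[1][0] * p[0] + A[1][1] * p[1])

def shoelace_area(vertices):
    """Area of a simple polygon whose vertices are listed in order around the boundary."""
    n = len(vertices)
    s = sum(vertices[i][0] * vertices[(i + 1) % n][1]
            - vertices[(i + 1) % n][0] * vertices[i][1]
            for i in range(n))
    return abs(s) / 2

A = [[2, 1], [1, 3]]
det_A = A[0][0] * A[1][1] - A[0][1] * A[1][0]

unit_square = [(0, 0), (1, 0), (1, 1), (0, 1)]      # a region R with Area(R) = 1
image = [apply(A, p) for p in unit_square]          # parallelogram spanned by the columns of A

print(shoelace_area(image), abs(det_A) * shoelace_area(unit_square))
```

The image of the unit square is the parallelogram generated by the columns of A, so the two printed numbers should match.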