Vector, Matrix, and Tensor Derivatives, Lecture notes of Calculus

Massachusetts Institute of Technology (MIT), Calculus

These notes explain how to take derivatives of vectors, matrices, and higher order tensors. They emphasize simplifying the calculation by expanding notation into explicit sums and equations for each component, and by removing summation notation. They include an example of computing the derivative of a column vector with respect to another column vector. Useful for students studying calculus, linear algebra, and multivariable calculus.

Typology: Lecture notes

2021/2022

Uploaded on 05/11/2023

Vector, Matrix, and Tensor Derivatives

Erik Learned-Miller

The purpose of this document is to help you learn to take derivatives of vectors, matrices, and higher order tensors (arrays with three dimensions or more), and to help you take derivatives with respect to vectors, matrices, and higher order tensors.

1 Simplify, simplify, simplify

Much of the confusion in taking derivatives involving arrays stems from trying to do too many things at once. These “things” include taking derivatives of multiple components simultaneously, taking derivatives in the presence of summation notation, and applying the chain rule. By doing all of these things at the same time, we are more likely to make errors, at least until we have a lot of experience.

1.1 Expanding notation into explicit sums and equations for each component

In order to simplify a given calculation, it is often useful to write out the explicit formula for a single scalar element of the output in terms of nothing but scalar variables. Once you have an explicit formula for a single scalar element of the output in terms of other scalar values, you can use the calculus that you learned as a beginner, which is much easier than trying to do matrix math, summations, and derivatives all at the same time.

Example. Suppose we have a column vector $\vec{y}$ of length $C$ that is calculated by forming the product of a matrix $W$ that is $C$ rows by $D$ columns with a column vector $\vec{x}$ of length $D$:

$$\vec{y} = W\vec{x}. \tag{1}$$

Suppose we are interested in the derivative of $\vec{y}$ with respect to $\vec{x}$. A full characterization of this derivative requires the (partial) derivatives of each component of $\vec{y}$ with respect to each component of $\vec{x}$, which in this case will contain $C \times D$ values since there are $C$ components in $\vec{y}$ and $D$ components in $\vec{x}$.

Let’s start by computing one of these, say, the derivative of the 3rd component of $\vec{y}$ with respect to the 7th component of $\vec{x}$. That is, we want to compute

$$\frac{\partial \vec{y}_3}{\partial \vec{x}_7},$$


which is just the derivative of one scalar with respect to another. The first thing to do is to write down the formula for computing $\vec{y}_3$ so we can take its derivative. From the definition of matrix-vector multiplication, the value $\vec{y}_3$ is computed by taking the dot product between the 3rd row of $W$ and the vector $\vec{x}$:

$$\vec{y}_3 = \sum_{j=1}^{D} W_{3,j}\,\vec{x}_j. \tag{2}$$

At this point, we have reduced the original matrix equation (Equation 1) to a scalar equation. This makes it much easier to compute the desired derivatives.
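As a minimal sketch of this reduction, the Python below computes the third component both from the full product and from the scalar sum of Equation 2. The matrix and vector values and the helper names `matvec` and `y_component` are illustrative assumptions, not part of the original notes.

```python
# Illustrative values only: W has C = 3 rows and D = 7 columns.
W = [[float(10 * i + j) for j in range(1, 8)] for i in range(1, 4)]
x = [0.5, -1.0, 2.0, 0.0, 1.5, -2.5, 3.0]


def matvec(W, x):
    """Full product y = W x, computed one row of W at a time."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def y_component(W, x, i):
    """Scalar sum of Equation 2 for component i (1-based, as in the text)."""
    total = 0.0
    for j in range(1, len(x) + 1):
        total += W[i - 1][j - 1] * x[j - 1]
    return total


y = matvec(W, x)           # all C components at once
y3 = y_component(W, x, 3)  # only the 3rd component, as a scalar formula
print(y3, y[2])            # the two values should agree
```

The scalar version touches only the 3rd row of W, which is exactly the information needed to differentiate that one component.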

1.2 Removing summation notation

While it is certainly possible to compute derivatives directly from Equation 2, people frequently make errors when differentiating expressions that contain summation notation ($\sum$) or product notation ($\prod$). When you’re beginning, it is sometimes useful to write out a computation without any summation notation to make sure you’re doing everything right. Using “1” as the first index, we have:

$$\vec{y}_3 = W_{3,1}\vec{x}_1 + W_{3,2}\vec{x}_2 + \dots + W_{3,7}\vec{x}_7 + \dots + W_{3,D}\vec{x}_D.$$

Of course, I have explicitly included the term that involves $\vec{x}_7$, since that is what we are differentiating with respect to. At this point, we can see that the expression for $\vec{y}_3$ depends upon $\vec{x}_7$ only through a single term, $W_{3,7}\vec{x}_7$. Since none of the other terms in the summation include $\vec{x}_7$, their derivatives with respect to $\vec{x}_7$ are all 0. Thus, we have

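\begin{align}
\frac{\partial \vec{y}_3}{\partial \vec{x}_7}
  &= \frac{\partial}{\partial \vec{x}_7}\left[ W_{3,1}\vec{x}_1 + W_{3,2}\vec{x}_2 + \dots + W_{3,7}\vec{x}_7 + \dots + W_{3,D}\vec{x}_D \right] \tag{3} \\
  &= 0 + 0 + \dots + \frac{\partial}{\partial \vec{x}_7}\left[ W_{3,7}\vec{x}_7 \right] + \dots + 0 \tag{4} \\
  &= \frac{\partial}{\partial \vec{x}_7}\left[ W_{3,7}\vec{x}_7 \right] \tag{5} \\
  &= W_{3,7}. \tag{6}
\end{align}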

By focusing on one component of $\vec{y}$ and one component of $\vec{x}$, we have made the calculation about as simple as it can be. In the future, when you are confused, it can help to try to reduce a problem to this most basic setting to see where you are going wrong.
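One way to sanity-check a single-component result like this is a finite-difference estimate. The sketch below assumes illustrative values for W and x and a hypothetical helper `partial_fd`; because each component of the product is linear in x, the step size h does not need to be tiny.

```python
# Illustrative values only: W is 3 x 7 and x has 7 components.
W = [[float(10 * i + j) for j in range(1, 8)] for i in range(1, 4)]
x = [0.5, -1.0, 2.0, 0.0, 1.5, -2.5, 3.0]


def matvec(W, x):
    """Column-vector product y = W x."""
    return [sum(w * xj for w, xj in zip(row, x)) for row in W]


def partial_fd(W, x, i, j, h=1e-3):
    """Forward-difference estimate of d y_i / d x_j (1-based indices)."""
    x_step = list(x)
    x_step[j - 1] += h  # perturb only the j-th component of x
    return (matvec(W, x_step)[i - 1] - matvec(W, x)[i - 1]) / h


estimate = partial_fd(W, x, 3, 7)
exact = W[3 - 1][7 - 1]  # W_{3,7} in the 1-based notation of the text
print(estimate, exact)   # should agree up to floating-point rounding
```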

1.2.1 Completing the derivative: the Jacobian matrix

Recall that our original goal was to compute the derivatives of each component of $\vec{y}$ with respect to each component of $\vec{x}$, and we noted that there would be $C \times D$ of these. They [...]

2.1 Example 2

Let $\vec{y}$ be a row vector with $C$ components computed by taking the product of another row vector $\vec{x}$ with $D$ components and a matrix $W$ that is $D$ rows by $C$ columns.

$$\vec{y} = \vec{x}W.$$

Importantly, despite the fact that $\vec{y}$ and $\vec{x}$ have the same number of components as before, the shape of $W$ is the transpose of the shape that we used before for $W$. In particular, since we are now left-multiplying by $\vec{x}$, whereas before $\vec{x}$ was on the right, $W$ must be transposed for the matrix algebra to work.

In this case, you will see, by writing

$$\vec{y}_3 = \sum_{j=1}^{D} \vec{x}_j W_{j,3}$$

that

$$\frac{\partial \vec{y}_3}{\partial \vec{x}_7} = W_{7,3}.$$

Notice that the indexing into $W$ is the opposite from what it was in the first example. However, when we assemble the full Jacobian matrix, we can still see that in this case as well,

$$\frac{d\vec{y}}{d\vec{x}} = W. \tag{7}$$
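The swapped indexing in the row-vector case can be checked numerically in the same spirit. This is a sketch under assumed values; `vecmat` and `partial_fd` are hypothetical helper names introduced only for illustration.

```python
# Illustrative values only: x is a row vector with D = 7 components, W is 7 x 3.
x = [0.5, -1.0, 2.0, 0.0, 1.5, -2.5, 3.0]
W = [[float(10 * i + j) for j in range(1, 4)] for i in range(1, 8)]


def vecmat(x, W):
    """Row vector times matrix: y_j = sum_i x_i * W[i][j]."""
    return [sum(x[i] * W[i][j] for i in range(len(x)))
            for j in range(len(W[0]))]


def partial_fd(x, W, i, j, h=1e-3):
    """Forward-difference estimate of d y_i / d x_j (1-based indices)."""
    x_step = list(x)
    x_step[j - 1] += h
    return (vecmat(x_step, W)[i - 1] - vecmat(x, W)[i - 1]) / h


estimate = partial_fd(x, W, 3, 7)
exact = W[7 - 1][3 - 1]  # W_{7,3}: row index from x, column index from y
print(estimate, exact)
```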

3 Dealing with more than two dimensions

Let’s consider another closely related problem, that of computing

$$\frac{d\vec{y}}{dW}.$$

In this case, $\vec{y}$ varies along one coordinate while $W$ varies along two coordinates. Thus, the entire derivative is most naturally contained in a three-dimensional array. We avoid the term “three-dimensional matrix” since it is not clear how matrix multiplication and other matrix operations are defined on a three-dimensional array.

When dealing with three-dimensional arrays, it is perhaps more trouble than it’s worth to try to find a way to display them. Instead, we should simply define our results as formulas which can be used to compute the result on any element of the desired three-dimensional array.

Let’s again compute a scalar derivative between one component of $\vec{y}$, say $\vec{y}_3$, and one component of $W$, say $W_{7,8}$. Let’s start with the same basic setup in which we write down an equation for $\vec{y}_3$ in terms of other scalar components. Now we would like an equation that expresses $\vec{y}_3$ in terms of scalar values, and shows the role that $W_{7,8}$ plays in its computation.

However, what we see is that $W_{7,8}$ plays no role in the computation of $\vec{y}_3$, since

$$\vec{y}_3 = \vec{x}_1 W_{1,3} + \vec{x}_2 W_{2,3} + \dots + \vec{x}_D W_{D,3}. \tag{8}$$

In other words,

$$\frac{\partial \vec{y}_3}{\partial W_{7,8}} = 0.$$

However, the partials of $\vec{y}_3$ with respect to elements of the 3rd column of $W$ will generally be non-zero. For example, the derivative of $\vec{y}_3$ with respect to $W_{2,3}$ is given by

$$\frac{\partial \vec{y}_3}{\partial W_{2,3}} = \vec{x}_2, \tag{9}$$

as can be easily seen by examining Equation 8.

In general, when the index of the $\vec{y}$ component is equal to the second index of $W$, the derivative will be non-zero, but will be zero otherwise. We can write:

$$\frac{\partial \vec{y}_j}{\partial W_{i,j}} = \vec{x}_i,$$

but the other elements of the 3-d array will be 0. If we let $F$ represent the 3-d array representing the derivative of $\vec{y}$ with respect to $W$, where

$$F_{i,j,k} = \frac{\partial \vec{y}_i}{\partial W_{j,k}},$$

then

$$F_{i,j,i} = \vec{x}_j,$$

but all other entries of $F$ are zero. Finally, if we define a new two-dimensional array $G$ as

$$G_{i,j} = F_{i,j,i},$$

we can see that all of the information we need about F can be stored in G, and that the non-trivial portion of F is really two-dimensional, not three-dimensional. Representing the important part of derivative arrays in a compact way is critical to efficient implementations of neural networks.
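The structure of F and G can be illustrated with a small numerical sketch. The sizes, values, and helper names below are assumptions chosen for brevity, and the code uses 0-based indices, unlike the 1-based indices in the text. It estimates every entry of F by finite differences and then extracts G.

```python
# Illustrative values only: x has D = 3 components and W is 3 x 4, so C = 4.
x = [0.5, -1.0, 2.0]
W = [[float(10 * j + k) for k in range(1, 5)] for j in range(1, 4)]
D, C = len(W), len(W[0])


def vecmat(x, W):
    """Row vector times matrix: y_k = sum_j x_j * W[j][k]."""
    return [sum(x[j] * W[j][k] for j in range(len(x)))
            for k in range(len(W[0]))]


def partial_fd(x, W, i, j, k, h=1e-3):
    """Forward-difference estimate of d y_i / d W_{j,k} (0-based indices)."""
    W_step = [row[:] for row in W]  # copy so the original W is untouched
    W_step[j][k] += h
    return (vecmat(x, W_step)[i] - vecmat(x, W)[i]) / h


# F[i][j][k] approximates d y_i / d W_{j,k}; it holds C * D * C entries.
F = [[[partial_fd(x, W, i, j, k) for k in range(C)] for j in range(D)]
     for i in range(C)]

# G keeps only the entries with k == i, the only ones that can be non-zero.
G = [[F[i][j][i] for j in range(D)] for i in range(C)]
print(G[0])  # each row of G should be close to x
```

Storing G needs C * D numbers instead of the C * D * C entries of F, and every row of G is simply a copy of x.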

4 Multiple data points

It is a good exercise to repeat some of the previous examples, but using multiple examples of $\vec{x}$, stacked together to form a matrix $X$. Let’s assume that each individual $\vec{x}$ is a row vector of length $D$, and that $X$ is a two-dimensional array with $N$ rows and $D$ columns. $W$, as in our last example, will be a matrix with $D$ rows and $C$ columns. $Y$, given by

$$Y = XW,$$

will be a matrix with $N$ rows and $C$ columns.

[...]

Let us define the intermediate result

$$\vec{m} = W\vec{x}.$$

Then we have that

$$\vec{y} = V\vec{m}.$$

We can then write, using the chain rule, that

$$\frac{d\vec{y}}{d\vec{x}} = \frac{d\vec{y}}{d\vec{m}}\,\frac{d\vec{m}}{d\vec{x}}.$$

To make sure that we know exactly what this means, let’s take the old approach of analyzing one component at a time, starting with a single component of $\vec{y}$ and a single component of $\vec{x}$:

$$\frac{d\vec{y}_i}{d\vec{x}_j} = \frac{d\vec{y}_i}{d\vec{m}}\,\frac{d\vec{m}}{d\vec{x}_j}.$$

But how exactly should we interpret the product on the right? The idea with the chain rule is to multiply the change in $\vec{y}_i$ with respect to each scalar intermediate variable by the change in the scalar intermediate variable with respect to $\vec{x}_j$. In particular, if $\vec{m}$ has $M$ components, then we write

$$\frac{d\vec{y}_i}{d\vec{x}_j} = \sum_{k=1}^{M} \frac{d\vec{y}_i}{d\vec{m}_k}\,\frac{d\vec{m}_k}{d\vec{x}_j}.$$

Recall from our previous results about derivatives of a vector with respect to a vector that $\frac{d\vec{y}_i}{d\vec{m}_k}$ is just $V_{i,k}$ and that $\frac{d\vec{m}_k}{d\vec{x}_j}$ is just $W_{k,j}$. So we can write

$$\frac{d\vec{y}_i}{d\vec{x}_j} = \sum_{k=1}^{M} V_{i,k} W_{k,j},$$

which is just the component expression for $VW$, our original answer to the problem.

To summarize, we can use the chain rule in the setting of vector and matrix derivatives by

Clearly stating intermediate results and the variables used to represent them,
Expressing the chain rule for individual components of the final derivatives,
Summing appropriately over the intermediate results within the chain rule expression.
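The three steps above can be sketched in Python as follows. The matrices V and W, the vector x, and the function names are illustrative assumptions, and indices are 0-based. The code names the intermediate result, writes the chain rule for one component pair, sums over the intermediate components, and then compares the result against a finite-difference estimate.

```python
# Illustrative values only: V is C x M = 2 x 3 and W is M x D = 3 x 4.
V = [[1.0, -2.0, 0.5],
     [3.0, 0.0, -1.0]]
W = [[0.5, 1.0, -1.5, 2.0],
     [2.0, -0.5, 1.0, 0.0],
     [-1.0, 4.0, 0.5, 1.5]]
x = [1.0, -2.0, 0.5, 3.0]


def matvec(A, v):
    """Matrix times column vector."""
    return [sum(a * vk for a, vk in zip(row, v)) for row in A]


def forward(x):
    m = matvec(W, x)     # step 1: name the intermediate result m = W x
    return matvec(V, m)  # final result y = V m


def chain_rule(i, j):
    """Steps 2 and 3: d y_i / d x_j = sum over k of V[i][k] * W[k][j]."""
    return sum(V[i][k] * W[k][j] for k in range(len(W)))


def partial_fd(i, j, h=1e-3):
    """Forward-difference estimate of d y_i / d x_j, used as a check."""
    x_step = list(x)
    x_step[j] += h
    return (forward(x_step)[i] - forward(x)[i]) / h


jacobian = [[chain_rule(i, j) for j in range(len(x))] for i in range(len(V))]
print(jacobian[0][0], partial_fd(0, 0))  # should agree up to rounding
```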
