Matrix Cookbook: Linear Regression and Basis Functions, Lecture notes of Linear Algebra

Bath Spa University Linear Algebra

An in-depth exploration of matrix operations and their applications in linear regression. It covers topics such as matrix notation, transpose, product, inner product, matrix derivatives, and the use of basis functions for non-linear regression. The text also discusses the concepts of structure error and approximation error in the context of function approximation.

Typology: Lecture notes

2021/2022

Uploaded on 09/27/2022

laurinda 🇬🇧


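1 The Matrix Cookbook

Notation: A vector

$$a = \begin{bmatrix} a_1 \\ a_2 \\ \vdots \\ a_d \end{bmatrix}$$

always denotes a column vector. With $a^T = [a_1, a_2, \dots, a_d]$ we denote the corresponding row vector.

Transpose:

$$(A + B)^T = A^T + B^T, \quad (AB)^T = B^T A^T, \quad (A^{-1})^T = (A^T)^{-1} = A^{-T} \tag{1}$$

Product:

$$(AB)_{ij} = \sum_k A_{ik} B_{kj} \tag{2}$$

$$(AB)C = A(BC), \quad AB \neq BA \text{ (in general)} \tag{3}$$

Inner product of 2 vectors:

$$a^T b = \sum_k a_k b_k \tag{4}$$

1.1 Matrix derivatives

Gradient of a function $f(x): \mathbb{R}^d \to \mathbb{R}$: The gradient is defined as a row vector:

$$\frac{\partial f}{\partial x} = \left[ \frac{\partial f}{\partial x_1}, \frac{\partial f}{\partial x_2}, \dots, \frac{\partial f}{\partial x_d} \right] \tag{5}$$

Chain Rule

$$\frac{\partial Z}{\partial X} = \frac{\partial Z}{\partial Y} \frac{\partial Y}{\partial X} \tag{6}$$

Product Rule

$$\frac{\partial (YZ)}{\partial X} = \frac{\partial Y}{\partial X} Z + Y \frac{\partial Z}{\partial X} \tag{7}$$

Linear derivatives

$$\frac{\partial a^T x}{\partial x} = \frac{\partial x^T a}{\partial x} = a^T \tag{8}$$

$$\frac{\partial Ax}{\partial x} = A, \quad \frac{\partial x^T A}{\partial x} = A^T \tag{9}$$

Quadratic derivatives

$$\frac{\partial x^T A x}{\partial x} = x^T A + x^T A^T, \quad \frac{\partial x^T x}{\partial x} = 2 x^T \tag{10}$$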
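The linear and quadratic derivative rules (Eq. 8 to 10) can be sanity-checked numerically. The following is a minimal sketch, assuming NumPy and a hand-written central-difference helper (`numerical_jacobian` is a hypothetical name, not something from the notes); it lays derivatives out with one row per output, which matches the row-vector gradient convention of Eq. 5.

```python
import numpy as np

def numerical_jacobian(f, x, eps=1e-6):
    """Central-difference Jacobian of f at x: one row per output, one column
    per input, so a scalar-valued f yields a 1 x d row vector."""
    x = np.asarray(x, dtype=float)
    f0 = np.atleast_1d(f(x))
    jac = np.zeros((f0.size, x.size))
    for k in range(x.size):
        step = np.zeros_like(x)
        step[k] = eps
        jac[:, k] = (np.atleast_1d(f(x + step)) - np.atleast_1d(f(x - step))) / (2 * eps)
    return jac

# Arbitrary test values (assumptions for illustration only).
rng = np.random.default_rng(0)
d = 4
a = rng.normal(size=d)
A = rng.normal(size=(d, d))
x = rng.normal(size=d)

# Eq. 8: d(a^T x)/dx = a^T
err_linear = np.abs(numerical_jacobian(lambda v: a @ v, x) - a[None, :]).max()
# Eq. 9: d(Ax)/dx = A
err_matvec = np.abs(numerical_jacobian(lambda v: A @ v, x) - A).max()
# Eq. 9: d(x^T A)/dx = A^T
err_vecmat = np.abs(numerical_jacobian(lambda v: v @ A, x) - A.T).max()
# Eq. 10: d(x^T A x)/dx = x^T A + x^T A^T
err_quad = np.abs(
    numerical_jacobian(lambda v: v @ A @ v, x) - (x @ A + x @ A.T)[None, :]
).max()

print(err_linear, err_matvec, err_vecmat, err_quad)
```

All four printed deviations should be at the level of floating-point rounding, since central differences are exact for linear and quadratic functions apart from rounding error.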


Figure 1: Linear Regression: The training data points are given by the blue circles, the original function is plotted by the green line. Based on the knowledge from the data-points we want to find the original function.

2 Linear Regression

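We are given a dataset $D = \langle x_i, y_i \rangle_{i=1 \dots N}$ (for simplicity we assume that $y$ is a scalar and $x$ is a vector of dimensionality $d$). We want to find a linear function $f(x; w) = w_0 + \sum_{k=1}^{d} w_k x_k = \tilde{x}^T w$ with $\tilde{x}^T = [1, x^T]$ which minimizes the quadratic error function:

$$E = \frac{1}{N} \sum_{i=1}^{N} (\tilde{x}_i^T w - y_i)^2 \tag{11}$$

The setting is illustrated in Figure 1. The error function can be easily written in matrix form by noting that a sum over the squared error terms can be represented as the inner product of the error vector with itself:

$$z = \begin{bmatrix} \tilde{x}_1^T w - y_1 \\ \tilde{x}_2^T w - y_2 \\ \vdots \\ \tilde{x}_N^T w - y_N \end{bmatrix} = Xw - y \tag{12}$$

with

$$X = \begin{bmatrix} \tilde{x}_1^T \\ \tilde{x}_2^T \\ \vdots \\ \tilde{x}_N^T \end{bmatrix} \quad \text{and} \quad y = \begin{bmatrix} y_1 \\ y_2 \\ \vdots \\ y_N \end{bmatrix}$$

$$E = \frac{1}{N} z^T z = \frac{1}{N} (Xw - y)^T (Xw - y) \tag{13}$$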
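To make the equivalence of the sum form (Eq. 11) and the matrix form (Eq. 13) concrete, here is a small NumPy sketch. The data, the weights and the variable names are made up for illustration; only the construction of $X$ from the augmented inputs $[1, x_i^T]$ follows the text.

```python
import numpy as np

rng = np.random.default_rng(0)
N, d = 50, 3
x = rng.normal(size=(N, d))   # N inputs of dimensionality d (made-up data)
y = rng.normal(size=N)        # N scalar targets (made-up data)
w = rng.normal(size=d + 1)    # arbitrary weights [w_0, w_1, ..., w_d]

# Rows of X are the augmented inputs [1, x_i^T], so X has shape (N, d + 1).
X = np.hstack([np.ones((N, 1)), x])

# Eq. 11: mean of the squared per-sample errors.
E_sum = sum((X[i] @ w - y[i]) ** 2 for i in range(N)) / N

# Eq. 12 and 13: error vector z = Xw - y, then E = z^T z / N.
z = X @ w - y
E_matrix = (z @ z) / N

print(E_sum, E_matrix)
```

Both numbers agree up to rounding; the matrix form is the one that the derivative rules of Section 1.1 apply to directly.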

2.1 Least Squares Solution

We now differentiate E with respect to w and set the gradient to $0^T$. Apply Eq. 6, 10 and 9:
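A sketch of where this step leads, reconstructed from Eq. 6, 9, 10 and 13 rather than copied from the notes: with $z = Xw - y$, the chain rule gives $\partial E / \partial w = \frac{2}{N} (Xw - y)^T X$, and setting this to $0^T$ yields the normal equations $X^T X w = X^T y$, so $w = (X^T X)^{-1} X^T y$ whenever $X^T X$ is invertible. The NumPy snippet below checks this on made-up data; the ground-truth line and the noise level are assumptions chosen for illustration.

```python
import numpy as np

rng = np.random.default_rng(1)
N = 30
x = rng.uniform(-1.0, 1.0, size=(N, 1))
# Hypothetical ground truth: y = 0.5 + 2x plus Gaussian noise.
y = 0.5 + 2.0 * x[:, 0] + 0.1 * rng.normal(size=N)

# Design matrix with rows [1, x_i^T].
X = np.hstack([np.ones((N, 1)), x])

# Solve the normal equations X^T X w = X^T y (assumes X^T X is invertible).
w_normal = np.linalg.solve(X.T @ X, X.T @ y)

# Reference solution from NumPy's least-squares routine.
w_lstsq, *_ = np.linalg.lstsq(X, y, rcond=None)

# Gradient (2/N) (Xw - y)^T X evaluated at the solution; should be ~0^T.
grad = (2.0 / N) * (X @ w_normal - y) @ X

print(w_normal, w_lstsq, grad)
```

Solving the linear system with `np.linalg.solve` avoids forming the explicit inverse; for ill-conditioned $X$, `np.linalg.lstsq` is usually the more robust choice.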

Figure 3: RBF basis functions

3 Decomposition in Structure and Approximation Error

Definitions:

Expected Error: $E_D[E(D)]$. The expected error if we sample a data set of given size N and use this data set to fit our function. The expectation is taken over all possible data sets of size N. $E(D)$ denotes the error if we use training set D to fit the function (see below).

Structure Error: The error that comes solely from the structure of the function used to fit the data. It can be estimated by fitting the function on a very large data set, for which the approximation error vanishes.

Approximation Error: Given by the variance of the function estimates if we sample different training sets of size N. The more complex the functions we use, the higher the variance of our estimate.

Let $f(x; D)$ denote a function learned from a specific dataset D containing N examples, and let $y(x)$ denote the target function. We also denote the expected learned function by $E_D[f(x; D)]$, which is calculated by taking the expectation with respect to all possible datasets D of size N. The error $E(D)$ for the training set D is given by

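$$E(D) = \int \big( f(x; D) - y(x) \big)^2 \, p(x) \, dx \tag{17}$$

We now add and subtract $E_D[f(x; D)]$:

$$\begin{aligned}
E(D) &= \int \big( f(x; D) - E_D[f(x; D)] + E_D[f(x; D)] - y(x) \big)^2 \, p(x) \, dx \\
&= \int \Big( \big( f(x; D) - E_D[f(x; D)] \big)^2 + \big( E_D[f(x; D)] - y(x) \big)^2 \\
&\qquad + 2 \big( f(x; D) - E_D[f(x; D)] \big) \big( E_D[f(x; D)] - y(x) \big) \Big) \, p(x) \, dx
\end{aligned}$$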

If we now calculate the expected error with respect to all data sets, $E_D[E(D)]$, the last line of this equation vanishes: the factor $E_D[f(x; D)] - y(x)$ does not depend on D, and $E_D\big[ f(x; D) - E_D[f(x; D)] \big] = 0$. Therefore

$$E_D[E(D)] = \int E_D\Big[ \big( f(x; D) - E_D[f(x; D)] \big)^2 \Big] \, p(x) \, dx + \int \big( E_D[f(x; D)] - y(x) \big)^2 \, p(x) \, dx$$

The first term corresponds to the approximation error and the last term to the structure error. This is also known as the bias-variance decomposition, which underlies the bias-variance tradeoff.

Figure 4: Structure error (left) and Approximation error (right) for a 1-degree polynomial

Figure 5: Structure error (left) and Approximation error (right) for a 2-degree polynomial

$$\text{expected loss} = \text{variance} + \text{bias}^2 \tag{18}$$

When decreasing the bias (which is usually done by increasing the complexity of f), the variance of our function estimate will usually increase. Note that instead of using many datasets of size N for $E_D[f(x; D)]$, we can use a single huge dataset of size M N. This can easily be proved to be equivalent. Figures 4, 5 and 6 show the structure error (left) and data fits for different data sets of size 10 (right) for a 1-degree, 2-degree and 6-degree polynomial. The gray points illustrate the probability mass of all data sets (these points are also used to estimate the structure error); the blue points illustrate the small data sets used for the single fits. The approximation error is given by the deviation of the single fits from the optimal hypothesis.
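The decomposition can also be estimated by simulation. The sketch below is not the setup behind the figures: the target function $\sin(3x)$, the uniform input density on $[0, 1]$, the noise level and the number of sampled datasets are all assumptions made for illustration. It fits polynomials of degree 1, 2 and 6 to many datasets of size 10 and estimates the approximation error (variance), the structure error (bias squared) and the expected error on a dense grid.

```python
import numpy as np

def target(x):
    """Hypothetical target function y(x); the notes do not specify one."""
    return np.sin(3.0 * x)

def decompose(degree, n_points=10, n_datasets=500, noise_std=0.2, seed=0):
    """Estimate (expected error, approximation error, structure error)
    for least-squares polynomial fits of the given degree."""
    rng = np.random.default_rng(seed)
    # Dense grid on [0, 1] standing in for the integral over p(x) = Uniform(0, 1).
    grid = np.linspace(0.0, 1.0, 201)
    fits = np.empty((n_datasets, grid.size))
    for m in range(n_datasets):
        x = rng.uniform(0.0, 1.0, size=n_points)
        y = target(x) + noise_std * rng.normal(size=n_points)
        coeffs = np.polyfit(x, y, degree)   # least-squares polynomial fit
        fits[m] = np.polyval(coeffs, grid)
    mean_fit = fits.mean(axis=0)            # estimate of E_D[f(x; D)]
    approximation = ((fits - mean_fit) ** 2).mean()       # variance term
    structure = ((mean_fit - target(grid)) ** 2).mean()   # bias^2 term
    expected = ((fits - target(grid)) ** 2).mean()        # estimate of E_D[E(D)]
    return expected, approximation, structure

results = {deg: decompose(deg) for deg in (1, 2, 6)}
for deg, (expected, approximation, structure) in results.items():
    print(f"degree {deg}: expected={expected:.4f} "
          f"approximation={approximation:.4f} structure={structure:.4f}")
```

Because the mean fit is computed from the same sampled fits, the estimated expected error equals the sum of the two estimated terms up to rounding, mirroring Eq. 18. With only 10 points per dataset, the high-degree fits can vary strongly between datasets.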
