Linear Regression: Learning from Data using Cost Functions and Gradient Descent, Exams of Computer Science

University of Utah (The U), Computer Science

These notes cover linear regression, focusing on the use of cost functions and gradient descent to find the optimal parameters (θ0 and θ1) of a linear prediction model: the least-squares cost function, its optimization using gradient descent, and the derivation of the closed-form solution. They also introduce the maximum likelihood formulation and the concept of overfitting.

Typology: Exams

Pre 2010

Uploaded on 08/30/2009

koofers-user-r9d-1 🇺🇸

Machine Learning (CS 5350/CS 6350) 16 Jan 2006

Linear models for regression

Our old simple (perhaps unrealistic) regression example:

Square footage    Price
1200              $120
1340              $125
1390              $105
1400              $130
1420              $135
1500              $145
1550              $160
1700              $155
1900              $140
2150              $130
2300              $135

Linear prediction model:

$$\text{price} \approx \theta_0 + \theta_1 \times \text{square footage} \qquad (1)$$

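We can write this as:

$$
\begin{bmatrix} \theta_1 & \theta_0 \end{bmatrix}
\begin{bmatrix}
1200 & 1 \\
1340 & 1 \\
1390 & 1 \\
1400 & 1 \\
1420 & 1 \\
1500 & 1 \\
1550 & 1 \\
1700 & 1 \\
1900 & 1 \\
2150 & 1 \\
2300 & 1
\end{bmatrix}^\top
=
\begin{bmatrix}
120 \\ 125 \\ 105 \\ 130 \\ 135 \\ 145 \\ 160 \\ 155 \\ 140 \\ 130 \\ 135
\end{bmatrix}^\top
\qquad (2)
$$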

It is easy to check that no pair {θ0, θ1} satisfies this system exactly.
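To make the "easy to check" claim concrete, here is a minimal Python sketch (the helper name `line_through` is ours, not the notes'). Because the first two square footages differ, the first two equations of (2) determine θ0 and θ1 uniquely; the sketch solves for them with exact fractions and lists every equation that the resulting line violates.

```python
from fractions import Fraction

# Data from the table above (prices in the table's units).
SQFT = [1200, 1340, 1390, 1400, 1420, 1500, 1550, 1700, 1900, 2150, 2300]
PRICE = [120, 125, 105, 130, 135, 145, 160, 155, 140, 130, 135]


def line_through(p, q):
    """Return (theta0, theta1) for the unique line through points p and q."""
    (x1, y1), (x2, y2) = p, q
    theta1 = Fraction(y2 - y1, x2 - x1)
    theta0 = y1 - theta1 * x1
    return theta0, theta1


# The first two equations already fix theta0 and theta1 (their x values differ).
theta0, theta1 = line_through((SQFT[0], PRICE[0]), (SQFT[1], PRICE[1]))

# Exact arithmetic: an equation holds only if the prediction equals the price.
violated = [n for n, (x, y) in enumerate(zip(SQFT, PRICE)) if theta0 + theta1 * x != y]
print("theta0 =", theta0, " theta1 =", theta1)
print("violated equations (0-based):", violated)
```

Since the only candidate θ fails the third equation, a single violated equation is already enough to show the system has no solution.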

Define a least-squares cost function:

$$J(\theta) = \frac{1}{2}\sum_{n=1}^{N}\left[\theta^\top x_n - y_n\right]^2$$

Because each error is squared, this cost function penalizes outliers heavily.

Now we have turned the learning problem into an optimization problem: find θ to minimize J(θ).
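A direct transcription of J(θ) may help. This is a minimal Python sketch, assuming each input is written as $x_n = [\text{square footage}, 1]$ and θ = [θ1, θ0] to match (2); the function name `cost` is illustrative. The last line shows why squaring makes the cost sensitive to outliers.

```python
# Data from the table above.
SQFT = [1200, 1340, 1390, 1400, 1420, 1500, 1550, 1700, 1900, 2150, 2300]
PRICE = [120, 125, 105, 130, 135, 145, 160, 155, 140, 130, 135]

# Each row is x_n = [square footage, 1], so theta = [theta1, theta0].
X = [[float(s), 1.0] for s in SQFT]
Y = [float(p) for p in PRICE]


def cost(theta, X, Y):
    """Least-squares cost J(theta) = 1/2 * sum_n (theta^T x_n - y_n)^2."""
    total = 0.0
    for x_n, y_n in zip(X, Y):
        residual = sum(t * x for t, x in zip(theta, x_n)) - y_n
        total += residual ** 2
    return 0.5 * total


# One simple baseline: predict the mean price everywhere (theta1 = 0).
mean_price = sum(Y) / len(Y)
print(cost([0.0, mean_price], X, Y))

# Squaring lets one large error dominate: a single residual of 10
# costs as much as 100 residuals of 1.
print(0.5 * 10 ** 2, 0.5 * 100 * 1 ** 2)
```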

Gradient Descent

Iteratively update θ according to:


$$\theta^{(t+1)} = \theta^{(t)} - \alpha \frac{\partial}{\partial \theta} J(\theta)$$

For the least-squares cost function, the partial derivative is:

$$\frac{\partial}{\partial \theta} J(\theta) = \sum_{n=1}^{N}\left[\theta^\top x_n - y_n\right] x_n$$

The gradient is dominated by examples with a large error, since each term $x_n$ is weighted by its residual $\theta^\top x_n - y_n$.

α is the learning rate: too low a value gives slow convergence, and too high a value can prevent convergence altogether (the iterates overshoot and may diverge).
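The update rule can be sketched in plain Python. This is a minimal illustration rather than the notes' code: it assumes square footage is rescaled to thousands of square feet (our choice) so that one fixed learning rate suits both parameters, and `alpha = 0.02` and `steps = 5000` are hypothetical settings picked for this data. The `gradient` function is the per-example sum given above.

```python
# Housing data from the table above.
SQFT = [1200, 1340, 1390, 1400, 1420, 1500, 1550, 1700, 1900, 2150, 2300]
PRICE = [120, 125, 105, 130, 135, 145, 160, 155, 140, 130, 135]

# Assumption: x_n = [sqft / 1000, 1], so theta = [theta1, theta0] with theta1
# measured per 1000 square feet.
X = [[s / 1000.0, 1.0] for s in SQFT]
Y = [float(p) for p in PRICE]


def gradient(theta, X, Y):
    """dJ/dtheta = sum_n (theta^T x_n - y_n) x_n for the least-squares cost."""
    grad = [0.0] * len(theta)
    for x_n, y_n in zip(X, Y):
        err = sum(t * x for t, x in zip(theta, x_n)) - y_n
        for j, x_nj in enumerate(x_n):
            grad[j] += err * x_nj
    return grad


def gradient_descent(X, Y, alpha=0.02, steps=5000):
    """Repeat theta <- theta - alpha * dJ/dtheta, starting from theta = 0."""
    theta = [0.0] * len(X[0])
    for _ in range(steps):
        g = gradient(theta, X, Y)
        theta = [t - alpha * g_j for t, g_j in zip(theta, g)]
    return theta


theta1, theta0 = gradient_descent(X, Y)
print(f"theta1 = {theta1:.3f} per 1000 sq ft, theta0 = {theta0:.3f}")
```

With unscaled square footage (values near 2000), the same α makes the iterates blow up: the scale of the inputs limits how large α can be, which is the "too high" case described above.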

It turns out that we can actually obtain a solution in closed form. Let X be the data matrix (row n is $x_n^\top$), and let Y be a (column) vector containing the targets. Then Xθ − Y is a column vector whose nth element is $\theta^\top x_n - y_n$.

So:

$$J(\theta) = \frac{1}{2}[X\theta - Y]^\top [X\theta - Y]$$

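Then we can compute the gradient:

$$
\begin{aligned}
\nabla_\theta J(\theta) &= \nabla_\theta \frac{1}{2}[X\theta - Y]^\top [X\theta - Y] \\
&= \frac{1}{2}\nabla_\theta \left[\theta^\top X^\top X\theta - \theta^\top X^\top Y - Y^\top X\theta + Y^\top Y\right] \\
&= \frac{1}{2}\nabla_\theta \operatorname{tr}\left[\theta^\top X^\top X\theta - \theta^\top X^\top Y - Y^\top X\theta + Y^\top Y\right] \\
&= \frac{1}{2}\nabla_\theta \left[\operatorname{tr}\theta^\top X^\top X\theta - 2\operatorname{tr} Y^\top X\theta\right] \\
&= X^\top X\theta - X^\top Y
\end{aligned}
$$

The trace step uses the fact that each term is a scalar and so equals its own trace; the next step drops $Y^\top Y$, which does not depend on θ, and combines the two cross terms, which are equal scalars. Setting the gradient to zero gives the normal equations $X^\top X\theta = X^\top Y$, so (when $X^\top X$ is invertible)

$$\theta = [X^\top X]^{-1} X^\top Y$$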
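The closed form can be checked numerically on the housing data. The sketch below, in plain Python, forms $X^\top X$ and $X^\top Y$ and solves the normal equations $X^\top X\theta = X^\top Y$ by Gaussian elimination instead of forming the inverse explicitly, which is the usual numerically safer choice. Rescaling square footage to thousands is our assumption, and the helper name `solve` is illustrative.

```python
# Housing data from the table above.
SQFT = [1200, 1340, 1390, 1400, 1420, 1500, 1550, 1700, 1900, 2150, 2300]
PRICE = [120, 125, 105, 130, 135, 145, 160, 155, 140, 130, 135]

# Data matrix: row n is x_n^T = [sqft / 1000, 1] (the rescaling is our assumption).
X = [[s / 1000.0, 1.0] for s in SQFT]
Y = [float(p) for p in PRICE]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b_i)] for row, b_i in zip(A, b)]  # [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


# Normal equations: (X^T X) theta = X^T Y.
d = len(X[0])
XtX = [[sum(x_n[i] * x_n[j] for x_n in X) for j in range(d)] for i in range(d)]
XtY = [sum(x_n[i] * y_n for x_n, y_n in zip(X, Y)) for i in range(d)]
theta1, theta0 = solve(XtX, XtY)
print(f"theta1 = {theta1:.4f} per 1000 sq ft, theta0 = {theta0:.4f}")
```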

Maximum Likelihood

An alternative formulation: $y = \theta^\top x + \epsilon$, where $\epsilon \sim \mathrm{Nor}(0, \sigma^2)$. Then $y \sim \mathrm{Nor}(\theta^\top x, \sigma^2)$. Now find θ to maximize the likelihood of the training set.
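Under this Gaussian model, the negative log-likelihood of the training set equals $J(\theta)/\sigma^2 + \frac{N}{2}\log(2\pi\sigma^2)$, so maximizing the likelihood selects the same θ as minimizing the least-squares cost. A small Python check of that identity, assuming a hypothetical noise level σ = 10, a few arbitrary θ values, and square footage in thousands: the gap between the two quantities is the same for every θ.

```python
import math

# Housing data from the table above; x_n = [sqft / 1000, 1] is our assumption.
SQFT = [1200, 1340, 1390, 1400, 1420, 1500, 1550, 1700, 1900, 2150, 2300]
PRICE = [120, 125, 105, 130, 135, 145, 160, 155, 140, 130, 135]
X = [[s / 1000.0, 1.0] for s in SQFT]
Y = [float(p) for p in PRICE]


def predict(theta, x_n):
    """theta^T x_n."""
    return sum(t * x for t, x in zip(theta, x_n))


def cost(theta):
    """Least-squares cost J(theta) = 1/2 * sum_n (theta^T x_n - y_n)^2."""
    return 0.5 * sum((predict(theta, x_n) - y_n) ** 2 for x_n, y_n in zip(X, Y))


def neg_log_likelihood(theta, sigma):
    """-log prod_n Nor(y_n; theta^T x_n, sigma^2), summed term by term."""
    total = 0.0
    for x_n, y_n in zip(X, Y):
        mu = predict(theta, x_n)
        log_pdf = -0.5 * math.log(2 * math.pi * sigma ** 2) - (y_n - mu) ** 2 / (2 * sigma ** 2)
        total -= log_pdf
    return total


SIGMA = 10.0  # hypothetical noise standard deviation
gaps = []
for theta in ([11.0, 116.0], [0.0, 134.0], [20.0, 100.0]):
    gaps.append(neg_log_likelihood(theta, SIGMA) - cost(theta) / SIGMA ** 2)
print(gaps)  # each entry equals N/2 * log(2*pi*sigma^2), up to rounding
```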

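Adding $\frac{\lambda}{2}\theta^\top\theta$ to the least-squares cost gives the regularized cost $J(\theta) = \frac{1}{2}[X\theta - Y]^\top [X\theta - Y] + \frac{\lambda}{2}\theta^\top\theta$. This is an $\ell_2$ penalty; λ controls how complex the functions we allow can be.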

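The gradient is easy to compute:

$$\nabla_\theta J(\theta) = \nabla_\theta\left(\frac{1}{2}[X\theta - Y]^\top [X\theta - Y] + \frac{\lambda}{2}\theta^\top\theta\right) = X^\top X\theta - X^\top Y + \lambda\theta$$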

So we can solve for θ:

$$(X^\top X + \lambda I)\theta = X^\top Y \implies \theta = [X^\top X + \lambda I]^{-1} X^\top Y$$

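This is especially useful when $X^\top X$ is ill-conditioned: for λ > 0, adding λI raises every eigenvalue of the positive semidefinite matrix $X^\top X$ by λ, so $X^\top X + \lambda I$ is always invertible and better conditioned.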
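A plain-Python sketch of the regularized closed form, assuming the same rescaled inputs (square footage in thousands); it solves $(X^\top X + \lambda I)\theta = X^\top Y$ for a few hypothetical λ values. As in the formula above, λI penalizes θ0 along with θ1; many library implementations leave the intercept unpenalized, so their numbers would differ. As λ grows, the fitted θ shrinks toward zero.

```python
import math

# Housing data from the table above; x_n = [sqft / 1000, 1] is our assumption.
SQFT = [1200, 1340, 1390, 1400, 1420, 1500, 1550, 1700, 1900, 2150, 2300]
PRICE = [120, 125, 105, 130, 135, 145, 160, 155, 140, 130, 135]
X = [[s / 1000.0, 1.0] for s in SQFT]
Y = [float(p) for p in PRICE]


def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    n = len(A)
    M = [list(map(float, row)) + [float(b_i)] for row, b_i in zip(A, b)]  # [A | b]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            factor = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= factor * M[col][c]
    z = [0.0] * n
    for r in reversed(range(n)):  # back substitution
        z[r] = (M[r][n] - sum(M[r][c] * z[c] for c in range(r + 1, n))) / M[r][r]
    return z


def ridge(X, Y, lam):
    """theta = [X^T X + lam * I]^{-1} X^T Y, computed by solving the linear system."""
    d = len(X[0])
    A = [[sum(x_n[i] * x_n[j] for x_n in X) + (lam if i == j else 0.0) for j in range(d)]
         for i in range(d)]
    b = [sum(x_n[i] * y_n for x_n, y_n in zip(X, Y)) for i in range(d)]
    return solve(A, b)


LAMBDAS = [0.0, 1.0, 10.0, 100.0]  # hypothetical regularization strengths
fits = [ridge(X, Y, lam) for lam in LAMBDAS]
for lam, theta in zip(LAMBDAS, fits):
    print(f"lambda={lam:6.1f}  theta1={theta[0]:8.3f}  theta0={theta[1]:8.3f}  "
          f"|theta|={math.hypot(*theta):8.3f}")
```

With λ = 0 this reduces to the unregularized normal equations.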

We can also give this a probabilistic interpretation by putting a prior on θ: $\theta \sim \mathrm{Nor}(0, \lambda^{-1})$.
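To spell out this interpretation: assuming the Gaussian noise model above with variance σ², and reading the prior as $\theta \sim \mathrm{Nor}(0, \lambda^{-1} I)$ (independent components, which is our reading of the notation), the negative log-posterior works out, up to constants, as

```latex
\begin{aligned}
-\log p(\theta \mid X, Y)
  &= -\log p(Y \mid X, \theta) - \log p(\theta) + \text{const} \\
  &= \frac{1}{2\sigma^2}\sum_{n=1}^{N}\left[\theta^\top x_n - y_n\right]^2
     + \frac{\lambda}{2}\theta^\top\theta + \text{const} \\
  &= \frac{1}{\sigma^2}\left(\frac{1}{2}[X\theta - Y]^\top [X\theta - Y]
     + \frac{\sigma^2\lambda}{2}\theta^\top\theta\right) + \text{const}
\end{aligned}
```

so the most probable θ under the posterior minimizes the $\ell_2$-regularized cost with penalty weight σ²λ, which is exactly λ when σ² = 1.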

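In general, both too many and too few features are bad. Why? We want to minimize the expected cost (going back to the unregularized case). Suppose f = f(x) and t = f + ε, and write y for $\theta^\top x$; a subscript n marks the nth example. The expectations are over the noise and over the training set used to fit θ. Then: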

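$$
\begin{aligned}
E\left[(t_n - y_n)^2\right] &= E\left[(\epsilon_n + f_n - y_n)^2\right] \\
&= E[\epsilon^2] + E\left[(f_n - y_n)^2\right] \\
&= E[\epsilon^2] + E\left[(f_n - E[y_n])^2\right] + E\left[(E[y_n] - y_n)^2\right] + 2E\left[(f_n - E[y_n])(E[y_n] - y_n)\right] \\
&= E[\epsilon^2] + E\left[(f_n - E[y_n])^2\right] + E\left[(E[y_n] - y_n)^2\right] \\
&= V[\text{noise}] + \text{bias}^2 + V[y]
\end{aligned}
$$

The cross term $2E[\epsilon_n(f_n - y_n)]$ drops out in the second line because the noise on the target being predicted has zero mean and is independent of the prediction; the last cross term vanishes because $f_n - E[y_n]$ is a constant and $E[E[y_n] - y_n] = 0$.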
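The decomposition can be checked by simulation. The Python sketch below is illustrative only: the true function f(x) = x², the noise level σ = 0.3, the ten fixed training inputs, and the test input x = 1 are all hypothetical choices, and the model is the straight line θ0 + θ1x fitted by least squares. Each trial draws a fresh training set and a fresh target, so the averages estimate expectations over both; the estimated expected squared error should match V[noise] + bias² + V[y] up to Monte Carlo error.

```python
import random
import statistics

# All settings here are hypothetical choices for the illustration.
SIGMA = 0.3                          # noise standard deviation
XS = [i / 9 for i in range(10)]      # fixed training inputs in [0, 1]
X0 = 1.0                             # input at which we measure the error


def f(x):
    """Hypothetical true function; a straight line cannot represent it exactly."""
    return x * x


def fit_line(xs, ts):
    """Least-squares fit of t ~ theta0 + theta1 * x (closed form for one feature)."""
    mx, mt = statistics.fmean(xs), statistics.fmean(ts)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxt = sum((x - mx) * (t - mt) for x, t in zip(xs, ts))
    theta1 = sxt / sxx
    return mt - theta1 * mx, theta1


def simulate(trials=20000, seed=0):
    rng = random.Random(seed)
    preds, sq_errors = [], []
    for _ in range(trials):
        ts = [f(x) + rng.gauss(0.0, SIGMA) for x in XS]  # fresh training set
        theta0, theta1 = fit_line(XS, ts)
        y = theta0 + theta1 * X0                         # prediction y
        t = f(X0) + rng.gauss(0.0, SIGMA)                # fresh target t = f + noise
        preds.append(y)
        sq_errors.append((t - y) ** 2)
    mean_y = statistics.fmean(preds)
    expected_cost = statistics.fmean(sq_errors)         # estimate of E[(t - y)^2]
    bias2 = (f(X0) - mean_y) ** 2                       # (f - E[y])^2
    var_y = statistics.fmean((p - mean_y) ** 2 for p in preds)  # V[y]
    return expected_cost, SIGMA ** 2, bias2, var_y


cost, noise, bias2, var_y = simulate()
print(f"E[(t-y)^2] ~ {cost:.4f}")
print(f"V[noise] + bias^2 + V[y] ~ {noise:.4f} + {bias2:.4f} + {var_y:.4f} "
      f"= {noise + bias2 + var_y:.4f}")
```

Swapping in a more flexible model would typically lower the bias term and raise the variance term, which is the trade-off behind "too many features is bad, too few is bad".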
