Typology: Thesis

2021/2022

Uploaded on 02/24/2023

ENCS5341
Machine Learning and Data Science
Regression
Yazan Abu Farha - Birzeit University

Introduction

  • Regression is a supervised learning task in which the target variable to be predicted is continuous. Examples: predicting house prices based on the living area, or predicting a stock price based on the history of previous prices.
  • When there is a single input variable (x), the method is referred to as simple linear regression. E.g.: predicting blood pressure as a function of drug dose.
  • When there are multiple input variables, literature from statistics often refers to the method as multiple linear regression. E.g.: predicting crop yields as a function of fertilizer and water.
  • Linear regression is a model that assumes a linear relationship between the input variables (x) and the single output variable (y). More specifically, it assumes that y can be calculated from a linear combination of the input variables (x).

Linear regression example with more than one variable

  • Now assume that we have two features: the living area and the number of bedrooms.
  • In this case, our linear regression model has the form $y = f(\mathbf{x}) = w_0 + w_1 x_1 + w_2 x_2$, where $y$ is the predicted house price, $x_1$ is the first feature (living area), $x_2$ is the second feature (number of bedrooms), and $\mathbf{x} = (x_1, x_2)^T$ is the input feature vector.
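
As a minimal sketch of the two-feature model above, the snippet below evaluates $w_0 + w_1 x_1 + w_2 x_2$ for one house. The function name `predict_price` and all weight values are hypothetical and chosen only for illustration; they are not learned from data or taken from the slides.

```python
def predict_price(living_area, bedrooms, w0, w1, w2):
    """Evaluate y = w0 + w1 * x1 + w2 * x2 for a single house."""
    return w0 + w1 * living_area + w2 * bedrooms


# Hypothetical weights, for illustration only (not learned from data).
w0, w1, w2 = 50_000.0, 120.0, 8_000.0

# A house with living area 100 and 3 bedrooms.
price = predict_price(living_area=100.0, bedrooms=3, w0=w0, w1=w1, w2=w2)
print(price)
```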

Linear regression

  • For $x \in \mathbb{R}$, linear regression fits a line in a 2-dimensional space (simple linear regression).
  • For $\mathbf{x} \in \mathbb{R}^2$, linear regression fits a plane in a 3-dimensional space (multiple linear regression).

Prediction with linear regression model

  • Example: hours studying and grades. We want to learn w0 and w1 such that: Predicted final grade in class = w0 + w1 * (# hours you study/week)
  • Assume that after learning we have: Predicted final grade in class = 59.95 + 3.17 * (# hours you study/week)
  • We can now use this function to predict grades for new numbers of hours. Ex: for someone who studies for 12 hours, Final grade = 59.95 + (3.17 * 12) = 97.99

[Figure: final grade in course plotted against number of hours spent studying, with the fitted line Final grade in course = 59.95 + 3.17 * study and its R-square value.]
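
The prediction step of the grades example can be sketched in a few lines. The coefficients 59.95 and 3.17 come from the example; the names `W0`, `W1`, and `predict_grade` are assumptions made for this sketch.

```python
W0 = 59.95  # intercept from the example
W1 = 3.17   # slope from the example (grade points per weekly study hour)


def predict_grade(hours_per_week):
    """Predicted final grade = W0 + W1 * (# hours studied per week)."""
    return W0 + W1 * hours_per_week


grade_12h = predict_grade(12)
print(round(grade_12h, 2))
```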

Linear regression

  • In general, if we have $d$ features as input, $\mathbf{x} = (x_1, x_2, \dots, x_d)^T$, then the linear regression model has the form $y = f(\mathbf{x}) = w_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d$
  • To simplify notation, we can augment the input with an extra dimension $x_0$ that has the value 1: $\mathbf{x} = (x_1, \dots, x_d)^T \rightarrow \mathbf{x} = (1, x_1, \dots, x_d)^T$
  • We can now write the linear regression model as $y = f(\mathbf{x}) = w_0 x_0 + w_1 x_1 + w_2 x_2 + \dots + w_d x_d = \sum_{i=0}^{d} x_i w_i = \mathbf{x}^T \mathbf{w}$
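
The augmentation trick can be checked numerically. The sketch below prepends $x_0 = 1$ and compares the inner-product form with the explicit sum; the helper names `augment` and `predict` and the numbers are hypothetical.

```python
def augment(x):
    """Prepend the constant feature x0 = 1 to a feature vector."""
    return [1.0] + list(x)


def predict(x, w):
    """Compute x^T w = sum_i x_i * w_i on the augmented input."""
    return sum(xi * wi for xi, wi in zip(augment(x), w))


# Hypothetical weights (w0, w1, w2) and input (x1, x2), for illustration only.
w = [0.5, 2.0, -1.0]
x = [3.0, 4.0]

y_explicit = w[0] + w[1] * x[0] + w[2] * x[1]  # w0 + w1*x1 + w2*x2
y_vector = predict(x, w)                       # x^T w with x0 = 1
print(y_explicit, y_vector)
```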

Task Definition

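Problem: Given a sample $S = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)\} \subseteq \mathbb{R}^d \times \mathbb{R}$, find a vector $\mathbf{w} \in \mathbb{R}^d$ such that $f(\mathbf{x}) = \langle \mathbf{x}, \mathbf{w} \rangle$ best interpolates $S$.

"Best interpolates": for $(\mathbf{x}, y)$ we measure the discrepancy between $f(\mathbf{x})$ and $y$ by the square loss function $E(f(\mathbf{x}), y) = (f(\mathbf{x}) - y)^2$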
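
To make the task definition concrete, here is a small sketch of the square loss and its sum over a sample, assuming $f(\mathbf{x}) = \langle \mathbf{x}, \mathbf{w} \rangle$. The function names and the toy sample are hypothetical.

```python
def square_loss(prediction, target):
    """E(f(x), y) = (f(x) - y)^2."""
    return (prediction - target) ** 2


def sum_of_squared_errors(w, xs, ys):
    """Sum of square losses of f(x) = <x, w> over a sample."""
    total = 0.0
    for x, y in zip(xs, ys):
        prediction = sum(xi * wi for xi, wi in zip(x, w))
        total += square_loss(prediction, y)
    return total


# Hypothetical toy sample in R^2 x R, for illustration only.
xs = [(1.0, 2.0), (0.0, 1.0), (3.0, 1.0)]
ys = [5.0, 2.0, 6.0]
w = (1.0, 2.0)

print(sum_of_squared_errors(w, xs, ys))
```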

Linear regression solution - simple case

  • Let's first consider the solution for the simple linear regression case, i.e., the input is only one variable $x$.
  • Given a set of $n$ training examples $(x_1, y_1), \dots, (x_n, y_n)$, we want to learn $w_0$ and $w_1$ such that $f(x) = y = w_0 + w_1 x$
  • The solution is found by minimizing the sum of squared errors: $\operatorname{argmin}_{w_0, w_1} \sum_{i=1}^{n} (y_i - f(x_i))^2 = \operatorname{argmin}_{w_0, w_1} \sum_{i=1}^{n} (y_i - w_0 - w_1 x_i)^2$

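Linear regression solution - simple case

  • The solution is found by minimizing the sum of squared errors: $\operatorname{argmin}_{w_0, w_1} E$, where $E = \sum_{i=1}^{n} (y_i - w_0 - w_1 x_i)^2$
  • Find the derivative of the error function $E$ w.r.t. each parameter and set it to 0. For $w_1$:

$$\frac{\partial E}{\partial w_1} = \frac{\partial}{\partial w_1} \sum_{i=1}^{n} (y_i - w_0 - w_1 x_i)^2 = \sum_{i=1}^{n} \frac{\partial}{\partial w_1} (y_i - w_0 - w_1 x_i)^2 = \sum_{i=1}^{n} -2 x_i (y_i - w_0 - w_1 x_i) = -2 \sum_{i=1}^{n} (y_i x_i - w_0 x_i - w_1 x_i^2)$$

  • Setting $\frac{\partial E}{\partial w_0} = 0$ in the same way gives $w_0 = \frac{1}{n} \sum_{i=1}^{n} y_i - w_1 \frac{1}{n} \sum_{i=1}^{n} x_i$. Substituting this into $\frac{\partial E}{\partial w_1} = 0$:

$$0 = \sum_{i=1}^{n} y_i x_i - \frac{\sum_{i=1}^{n} y_i \sum_{i=1}^{n} x_i}{n} + w_1 \frac{\left(\sum_{i=1}^{n} x_i\right)^2}{n} - w_1 \sum_{i=1}^{n} x_i^2$$

$$\rightarrow \; w_1 = \frac{\sum_{i=1}^{n} x_i y_i - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} y_i}{n}}{\sum_{i=1}^{n} x_i^2 - \frac{\sum_{i=1}^{n} x_i \sum_{i=1}^{n} x_i}{n}}$$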
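
The closed-form solution for the simple case can be sketched as below. It uses the standard least-squares expressions for $w_1$ and $w_0$ obtained by setting both partial derivatives of the sum of squared errors to zero; the function name and the toy data (points lying exactly on $y = 1 + 2x$) are hypothetical.

```python
def fit_simple_linear_regression(xs, ys):
    """Closed-form least-squares fit of y = w0 + w1 * x."""
    n = len(xs)
    sum_x = sum(xs)
    sum_y = sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_xx = sum(x * x for x in xs)

    # w1 = (sum x_i y_i - sum x_i * sum y_i / n) / (sum x_i^2 - (sum x_i)^2 / n)
    w1 = (sum_xy - sum_x * sum_y / n) / (sum_xx - sum_x * sum_x / n)
    # w0 = mean(y) - w1 * mean(x)
    w0 = sum_y / n - w1 * sum_x / n
    return w0, w1


# Hypothetical noise-free data generated from y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.0, 5.0, 7.0, 9.0]

w0, w1 = fit_simple_linear_regression(xs, ys)
print(w0, w1)
```

The denominator is zero when all $x_i$ are identical, so this sketch assumes at least two distinct input values.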

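Recap - Task Definition

Problem: Given a sample $S = \{(\mathbf{x}_1, y_1), \dots, (\mathbf{x}_n, y_n)\} \subseteq \mathbb{R}^d \times \mathbb{R}$, find a vector $\mathbf{w} \in \mathbb{R}^d$ such that $f(\mathbf{x}) = \langle \mathbf{x}, \mathbf{w} \rangle$ best interpolates $S$.

"Best interpolates": for $(\mathbf{x}, y)$ we measure the discrepancy between $f(\mathbf{x})$ and $y$ by the square loss function $E(f(\mathbf{x}), y) = (f(\mathbf{x}) - y)^2$

Notion and notation:

  • $\mathbf{x} \in \mathbb{R}^d$ is regarded as a column vector, its transpose $\mathbf{x}^T$ as a row vector.
  • $X$ is an $n \times d$ data matrix (i.e. its $i$-th row is $\mathbf{x}_i^T$); $\mathbf{y} = (y_1, \dots, y_n)^T$
  • Inner product of $\mathbf{x}, \mathbf{z} \in \mathbb{R}^d$: $\langle \mathbf{x}, \mathbf{z} \rangle = \mathbf{x}^T \mathbf{z} = \sum_{i=1}^{d} x_i z_i$
  • Euclidean norm of a vector $\mathbf{x} \in \mathbb{R}^d$: $\|\mathbf{x}\| = \sqrt{\langle \mathbf{x}, \mathbf{x} \rangle}$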

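Linear Regression - The normal equations

  • Convex minimization problem: $\min_{\mathbf{w}} E[\mathbf{w}] = \min_{\mathbf{w}} \frac{1}{n} \|X\mathbf{w} - \mathbf{y}\|^2$
  • Calculate the gradient:

$$\nabla_{\mathbf{w}} E[\mathbf{w}] = \frac{\partial}{\partial \mathbf{w}} \frac{1}{n} \|X\mathbf{w} - \mathbf{y}\|^2 = \frac{\partial}{\partial \mathbf{w}} \frac{1}{n} \left( \mathbf{w}^T X^T X \mathbf{w} - 2 \mathbf{w}^T X^T \mathbf{y} + \mathbf{y}^T \mathbf{y} \right) = \frac{1}{n} \left( 2 X^T X \mathbf{w} - 2 X^T \mathbf{y} \right)$$

  • Set it to 0: $X^T X \mathbf{w} = X^T \mathbf{y}$
  • And solve the linear system of equations: $\mathbf{w} = (X^T X)^{-1} X^T \mathbf{y}$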
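
A minimal NumPy sketch of the normal equations follows. It solves $X^T X \mathbf{w} = X^T \mathbf{y}$ with `np.linalg.solve` rather than forming the inverse explicitly, which is a numerical design choice of this sketch and not something the slides prescribe. The function name and the toy data are hypothetical, and $X^T X$ is assumed to be invertible.

```python
import numpy as np


def fit_normal_equations(X, y):
    """Solve the normal equations X^T X w = X^T y for w."""
    # Solving the linear system avoids computing (X^T X)^{-1} explicitly.
    return np.linalg.solve(X.T @ X, X.T @ y)


# Hypothetical toy data: the first column is the constant feature x0 = 1,
# and the targets are generated without noise from w = (1, 2, -3).
X = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
])
true_w = np.array([1.0, 2.0, -3.0])
y = X @ true_w

w = fit_normal_equations(X, y)
print(w)
```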

Linear Regression and overfitting

  • Linear regression solution: $\mathbf{w} = (X^T X)^{-1} X^T \mathbf{y}$
  • Large values in $\mathbf{w}$ are a common symptom of overfitting.
  • Solution: use a regularizer to discourage the coefficients from taking large values
    • Penalize the sum of the squares of the coefficients, i.e. $\|\mathbf{w}\|^2$
    • Solve $\min_{\mathbf{w}} \|X\mathbf{w} - \mathbf{y}\|^2 + \lambda \|\mathbf{w}\|^2$ (a constant factor on the data term can be absorbed into $\lambda$)
    • Solution: $\mathbf{w} = (X^T X + \lambda I_d)^{-1} X^T \mathbf{y}$ ($\lambda$ is a hyper-parameter and $I_d$ is the $d \times d$ identity matrix)
    • This case is called Ridge Regression (ridge regression = Regularised Least Squares)
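
The ridge solution can be sketched in NumPy as below, assuming the objective $\|X\mathbf{w} - \mathbf{y}\|^2 + \lambda \|\mathbf{w}\|^2$. The function name, the toy data, and the value $\lambda = 10$ are hypothetical. As in the formula above, every coefficient is penalized, including the one that multiplies a constant feature.

```python
import numpy as np


def fit_ridge(X, y, lam):
    """Solve (X^T X + lam * I_d) w = X^T y for w."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)


# Hypothetical toy data: constant first column, noise-free targets.
X = np.array([
    [1.0, 0.0, 0.0],
    [1.0, 1.0, 0.0],
    [1.0, 0.0, 1.0],
    [1.0, 1.0, 1.0],
    [1.0, 2.0, 1.0],
])
y = X @ np.array([1.0, 2.0, -3.0])

w_unregularized = fit_ridge(X, y, lam=0.0)  # reduces to the normal equations
w_ridge = fit_ridge(X, y, lam=10.0)         # coefficients shrunk toward zero

print(np.linalg.norm(w_unregularized), np.linalg.norm(w_ridge))
```

With `lam=0.0` the function reproduces the unregularized solution, and increasing `lam` shrinks the norm of the coefficient vector.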

Probabilistic Interpretation of Linear Regression

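  • Our goal is to approximate $f$ by $f'$
  • The maximum likelihood estimate of $f'$ is

$$f'_{ML} = \operatorname{argmax}_{f'} \; p(S \mid f') = \operatorname{argmax}_{f'} \prod_{i=1}^{n} p(y_i \mid f') = \operatorname{argmax}_{f'} \prod_{i=1}^{n} \frac{1}{\sqrt{2\pi\sigma^2}} \, e^{-\frac{1}{2\sigma^2} (y_i - f'(x_i))^2}$$

Maximize the natural log of this instead ...

[Figure: the function $f$ with noise $\epsilon$; the data points deviate from $f$ by errors $e_1, \dots, e_5$.]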

Probabilistic Interpretation of Linear Regression

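$$f'_{ML} = \operatorname{argmax}_{f'} \sum_{i=1}^{n} \left[ \ln \frac{1}{\sqrt{2\pi\sigma^2}} - \frac{1}{2\sigma^2} (y_i - f'(x_i))^2 \right]$$

$$= \operatorname{argmax}_{f'} \sum_{i=1}^{n} -\frac{1}{2\sigma^2} (y_i - f'(x_i))^2$$

$$= \operatorname{argmax}_{f'} \sum_{i=1}^{n} -(y_i - f'(x_i))^2$$

$$f'_{ML} = \operatorname{argmin}_{f'} \sum_{i=1}^{n} (y_i - f'(x_i))^2$$

The maximum likelihood estimate $f'_{ML}$ minimizes the sum of squared errors.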
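
The equivalence between maximum likelihood and least squares can be checked numerically. The sketch below assumes Gaussian noise with a fixed standard deviation `sigma` and a linear candidate $f'(x) = w_0 + w_1 x$; the toy data, the candidate grid, and all function names are hypothetical. The candidate with the highest log-likelihood is also the one with the lowest sum of squared errors.

```python
import math


def gaussian_log_likelihood(w0, w1, xs, ys, sigma):
    """Sum over i of ln N(y_i; f'(x_i), sigma^2) with f'(x) = w0 + w1 * x."""
    total = 0.0
    for x, y in zip(xs, ys):
        residual = y - (w0 + w1 * x)
        total += -0.5 * math.log(2 * math.pi * sigma ** 2)
        total -= residual ** 2 / (2 * sigma ** 2)
    return total


def sum_of_squared_errors(w0, w1, xs, ys):
    """Sum over i of (y_i - f'(x_i))^2 with f'(x) = w0 + w1 * x."""
    return sum((y - (w0 + w1 * x)) ** 2 for x, y in zip(xs, ys))


# Hypothetical noisy data roughly following y = 1 + 2x.
xs = [1.0, 2.0, 3.0, 4.0]
ys = [3.1, 4.9, 7.2, 8.8]

# Candidate models: intercept fixed at 1.0, slopes 1.0, 1.1, ..., 3.0.
candidates = [(1.0, k / 10) for k in range(10, 31)]

best_by_likelihood = max(
    candidates, key=lambda c: gaussian_log_likelihood(c[0], c[1], xs, ys, sigma=1.0)
)
best_by_sse = min(
    candidates, key=lambda c: sum_of_squared_errors(c[0], c[1], xs, ys)
)
print(best_by_likelihood, best_by_sse)
```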