Machine learning - data mining, Data Mining notes

Use of data mining models for forecasting

Type: Notes

2017/2018

Uploaded on 03/08/2018

julio_cess-1
Regression for Data Mining
Mgt. 2206 – Introduction to Analytics
Matthew Liberatore
Thomas Coghlan

Learning Objectives

To understand the application of regression analysis in data mining

Linear/nonlinear

Logistic (Logit)

To understand the key statistical measures of fit

To learn how to run and interpret regression analyses using SAS Enterprise Miner software

Linear Regression Analysis

Analysis of the strength of the linear relationship between predictor (independent) variables and outcome (dependent/criterion) variables.

In two dimensions (one predictor, one outcome variable) data can be plotted on a scatter diagram.

$$E(y) = \beta_0 + \beta_1 x$$

where:

$E(y)$ = expected value of y (outcome)

$\beta_0$ = intercept term

$\beta_1$ = coefficient

$x$ = predictor variable

Estimation Process

Regression Model: $y = \beta_0 + \beta_1 x + \varepsilon$

Regression Equation: $E(y) = \beta_0 + \beta_1 x$

Unknown Parameters: $\beta_0, \beta_1$

Sample Data: pairs $(x_1, y_1), \ldots, (x_n, y_n)$

Estimated Regression Equation: $\hat{y} = b_0 + b_1 x$

Sample Statistics: $b_0, b_1$

$b_0$ and $b_1$ provide estimates of $\beta_0$ and $\beta_1$

Simple Linear Regression Equation: Negative Linear Relationship

[Figure: regression line with intercept $\beta_0$; slope $\beta_1$ is negative. Horizontal axis: x (predictor); vertical axis: E(y) (outcome).]

Simple Linear Regression Equation: No Relationship

[Figure: horizontal axis: x (predictor); vertical axis: E(y) (outcome).]

Simple Linear Regression Equation: Parabolic Relationship

[Figure: intercept $\beta_0$ marked. Horizontal axis: x (predictor); vertical axis: E(y) (outcome).]

Example

List the variables we have

Determine a dependent variable (DV) of interest

Is there a way to predict the DV?

Least Squares Method

Slope for the Estimated Regression Equation

$$b_1 = \frac{\sum (x_i - \bar{x})(y_i - \bar{y})}{\sum (x_i - \bar{x})^2}$$

Intercept for the Estimated Regression Equation

$$b_0 = \bar{y} - b_1 \bar{x}$$

where:

$x_i$ = value of independent variable for ith observation

$y_i$ = value of dependent variable for ith observation

$\bar{x}$ = mean value for independent variable

$\bar{y}$ = mean value for dependent variable

$n$ = total number of observations
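The slope and intercept formulas translate almost line for line into code. The following is a minimal sketch; the function name `least_squares` and the five-point data set are hypothetical, chosen so the arithmetic is easy to check by hand.

```python
def least_squares(xs, ys):
    """Return (b0, b1) for the estimated regression equation y_hat = b0 + b1*x."""
    n = len(xs)
    x_bar = sum(xs) / n  # mean of the independent variable
    y_bar = sum(ys) / n  # mean of the dependent variable

    # Numerator and denominator of the slope formula.
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    sxx = sum((x - x_bar) ** 2 for x in xs)

    b1 = sxy / sxx           # slope
    b0 = y_bar - b1 * x_bar  # intercept
    return b0, b1


# Hypothetical sample data (not from the course).
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

b0, b1 = least_squares(xs, ys)
print(f"y_hat = {b0:.2f} + {b1:.2f} x")
```

For this data the means are 3 and 4, the slope numerator is 6 and the denominator is 10, so b1 = 0.6 and b0 = 4 - 0.6 × 3 = 2.2.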

Example: KWatts vs. Temp

Temp    KWatts
59.2    9,
61.9    9,
55.1    10,
66.2    10,
52.1    10,
69.9    11,
46.8    12,
76.8    13,
79.7    15,
79.3    15,
80.2    17,
83.3    17,

Is the Relationship Linear?

[Scatter plot: KWatts vs. Temp. Horizontal axis: Temp, from 40 to 90; vertical axis: KWatts.]

Coefficient of Determination

How “strong” is the relationship between predictor and outcome? (The fraction of the observed variance of the outcome variable that is explained by the predictor variables.)

Relationship Among SST, SSR, SSE

$$\text{SST} = \text{SSR} + \text{SSE}$$

$$\sum (y_i - \bar{y})^2 = \sum (\hat{y}_i - \bar{y})^2 + \sum (y_i - \hat{y}_i)^2$$

where:

SST = total sum of squares

SSR = sum of squares due to regression

SSE = sum of squares due to error

Coefficient of Determination ($r^2$)

$$r^2 = \frac{\text{SSR}}{\text{SST}}$$

where:

SSR = sum of squares due to regression

SST = total sum of squares
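To make the sums of squares concrete, the sketch below fits a least-squares line to a hypothetical five-point data set (not from the course), computes SST, SSR, and SSE, and checks that they satisfy SST = SSR + SSE before forming r². All variable names are illustrative.

```python
# Hypothetical sample data.
xs = [1, 2, 3, 4, 5]
ys = [2, 4, 5, 4, 5]

n = len(xs)
x_bar = sum(xs) / n
y_bar = sum(ys) / n

# Least-squares slope and intercept.
b1 = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
    (x - x_bar) ** 2 for x in xs
)
b0 = y_bar - b1 * x_bar

# Fitted values from the estimated regression equation.
y_hat = [b0 + b1 * x for x in xs]

sst = sum((y - y_bar) ** 2 for y in ys)                # total sum of squares
ssr = sum((yh - y_bar) ** 2 for yh in y_hat)           # due to regression
sse = sum((y - yh) ** 2 for y, yh in zip(ys, y_hat))   # due to error

r2 = ssr / sst  # coefficient of determination

print(f"SST = {sst:.2f}, SSR = {ssr:.2f}, SSE = {sse:.2f}")
print(f"SSR + SSE = {ssr + sse:.2f}")
print(f"r^2 = {r2:.2f}")
```

Here SST = 6, SSR = 3.6, and SSE = 2.4, so r² = 0.6: the fitted line accounts for 60% of the variation in the outcome around its mean.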