Classification by Backpropagation


• Backpropagation: a neural network learning algorithm
• Started by psychologists and neurobiologists to develop and test computational analogues of neurons
• A neural network: a set of connected input/output units where each connection has a weight associated with it
• During the learning phase, the network learns by adjusting the weights so as to be able to predict the correct class label of the input tuples
• Also referred to as connectionist learning due to the connections between units


Neural Network as a Classifier

Weakness
– Long training time
– Requires a number of parameters that are typically best determined empirically, e.g., the network topology or "structure"
– Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of the "hidden units" in the network

Strength
– High tolerance to noisy data
– Ability to classify untrained patterns
– Well suited for continuous-valued inputs and outputs
– Successful on a wide array of real-world data
– Algorithms are inherently parallel
– Techniques have recently been developed for the extraction of rules from trained neural networks


A Neuron (= a perceptron)

• The n-dimensional input vector x is mapped into variable y by means of the scalar product and a nonlinear function mapping

[Figure: a perceptron. The inputs x0, x1, ..., xn (input vector x) are multiplied by the weights w0, w1, ..., wn (weight vector w); the weighted sum, offset by the bias \theta_k of the unit, is passed through the activation function f to give the output y.]

Example:

    y = \mathrm{sign}\Big( \sum_{i=0}^{n} w_i x_i - \theta_k \Big)

where \theta_k is the bias (threshold) of the unit.
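As a minimal sketch of this formula (assuming a hard sign activation; the weights, bias, and inputs below are made-up illustration values, not from the slides), a perceptron's output could be computed like this:

```python
# Minimal perceptron sketch: y = sign(sum_i w_i * x_i - theta)
def perceptron(x, w, theta):
    """Return +1 if the weighted sum exceeds the threshold theta, else -1."""
    s = sum(wi * xi for wi, xi in zip(w, x)) - theta
    return 1 if s >= 0 else -1

x = [1.0, 0.0, 1.0]    # input vector x (hypothetical)
w = [0.3, -0.2, 0.5]   # weight vector w (hypothetical)
theta = 0.4            # bias / threshold (hypothetical)

print(perceptron(x, w, theta))  # -> 1, since 0.3 + 0.0 + 0.5 - 0.4 = 0.4 >= 0
```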


A Multi-Layer Feed-Forward Neural Network

[Figure: a multi-layer feed-forward neural network. The input vector X feeds the input layer; weighted connections w_{ij} carry its outputs to a hidden layer, whose weighted outputs feed the output layer, which produces the output vector. The figure is annotated with the net input, output, error, and weight/bias update formulas spelled out below.]


Given a unit j in a hidden or output layer, the net input I_j to unit j is

    I_j = \sum_i w_{ij} O_i + \theta_j

where w_{ij} is the weight of the connection from unit i in the previous layer to unit j; O_i is the output of unit i from the previous layer; and \theta_j is the bias of the unit, which acts as a threshold in that it serves to vary the activity of the unit.

Given the net input I_j to unit j, the output O_j of unit j is computed as

    O_j = \frac{1}{1 + e^{-I_j}}

For a unit j in the output layer, the error Err_j is computed by

    Err_j = O_j (1 - O_j)(T_j - O_j)

where O_j is the actual output of unit j and T_j is the known target value of the given training tuple.

To compute the error of a hidden-layer unit j, the weighted sum of the errors of the units connected to unit j in the next layer is considered. The error of a hidden-layer unit j is

    Err_j = O_j (1 - O_j) \sum_k Err_k \, w_{jk}

where w_{jk} is the weight of the connection from unit j to a unit k in the next higher layer, and Err_k is the error of unit k.

Weights are updated as follows, where l is the learning rate:

    w_{ij} = w_{ij} + (l) \, Err_j \, O_i

Biases are updated as follows, where l is the learning rate:

    \theta_j = \theta_j + (l) \, Err_j
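These formulas translate almost directly into code. The sketch below is a plain-Python rendering of the per-unit computations (the function names and the parallel-list data layout are my own assumptions, not a reference implementation from the source):

```python
import math

def sigmoid(net):
    """O_j = 1 / (1 + e^{-I_j})"""
    return 1.0 / (1.0 + math.exp(-net))

def net_input(prev_outputs, in_weights, bias):
    """I_j = sum_i w_ij * O_i + theta_j, over the units i feeding unit j."""
    return sum(w * o for w, o in zip(in_weights, prev_outputs)) + bias

def output_error(o_j, t_j):
    """Err_j = O_j (1 - O_j)(T_j - O_j) for an output-layer unit."""
    return o_j * (1.0 - o_j) * (t_j - o_j)

def hidden_error(o_j, next_errors, out_weights):
    """Err_j = O_j (1 - O_j) * sum_k Err_k * w_jk for a hidden-layer unit."""
    return o_j * (1.0 - o_j) * sum(e * w for e, w in zip(next_errors, out_weights))

def updated_weight(w_ij, l, err_j, o_i):
    """w_ij <- w_ij + (l) * Err_j * O_i"""
    return w_ij + l * err_j * o_i

def updated_bias(theta_j, l, err_j):
    """theta_j <- theta_j + (l) * Err_j"""
    return theta_j + l * err_j
```

A full pass simply applies net_input and sigmoid layer by layer going forward, then output_error and hidden_error in reverse order before updating the weights and biases.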


How a Multi-Layer Neural Network Works

• The inputs to the network correspond to the attributes measured for each training tuple
• Inputs are fed simultaneously into the units making up the input layer
• They are then weighted and fed simultaneously to a hidden layer
• The number of hidden layers is arbitrary, although usually only one is used
• The weighted outputs of the last hidden layer are input to the units making up the output layer, which emits the network's prediction
• The network is feed-forward in that none of the weights cycles back to an input unit or to an output unit of a previous layer
• From a statistical point of view, networks perform nonlinear regression: given enough hidden units and enough training samples, they can closely approximate any function


Defining a Network Topology

• First decide the network topology: the number of units in the input layer, the number of hidden layers (if > 1), the number of units in each hidden layer, and the number of units in the output layer
• Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0] (see the sketch below)
• Discrete-valued attributes may be encoded with one input unit per domain value, each initialized to 0
• For classification with more than two classes, one output unit per class is used
• Once a network has been trained, if its accuracy is unacceptable, repeat the training process with a different network topology or a different set of initial weights
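A rough sketch of the preprocessing described above, assuming min-max scaling for numeric attributes and one-unit-per-domain-value encoding for discrete attributes (the function names are illustrative):

```python
def normalize(values):
    """Scale a list of numeric attribute values to the range [0.0, 1.0]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(value, domain):
    """One unit per domain value: 1 for the matching value, 0 elsewhere."""
    return [1 if value == d else 0 for d in domain]

print(normalize([10, 20, 40]))                    # [0.0, 0.333..., 1.0]
print(one_hot("blue", ["red", "green", "blue"]))  # [0, 0, 1]
```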


Backpropagation

• Iteratively process a set of training tuples and compare the network's prediction with the actual known target value
• For each training tuple, the weights are modified to minimize the mean squared error between the network's prediction and the actual target value
• Modifications are made in the “backwards” direction: from the output layer, through each hidden layer, down to the first hidden layer, hence “backpropagation”

Steps (see the sketch below)
– Initialize weights (to small random numbers) and biases in the network
– Propagate the inputs forward (by applying the activation function)
– Backpropagate the error (by updating weights and biases)
– Terminating condition (when the error is very small, etc.)
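A compact sketch of the training loop these steps describe, for a toy network with a single hidden layer and one sigmoid output unit (the structure, parameter choices, and the fixed epoch count standing in for a real terminating condition are all assumptions for illustration):

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(data, n_in, n_hidden, l=0.9, epochs=100):
    """data is a list of (input_list, target) pairs for a single output unit."""
    # Step 1: initialize weights and biases to small random numbers.
    w_ih = [[random.uniform(-0.5, 0.5) for _ in range(n_hidden)] for _ in range(n_in)]
    w_ho = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_h = [random.uniform(-0.5, 0.5) for _ in range(n_hidden)]
    b_o = random.uniform(-0.5, 0.5)

    for _ in range(epochs):  # fixed epoch count stands in for the terminating condition
        for x, t in data:
            # Step 2: propagate the inputs forward through hidden and output layers.
            h = [sigmoid(sum(x[i] * w_ih[i][j] for i in range(n_in)) + b_h[j])
                 for j in range(n_hidden)]
            o = sigmoid(sum(h[j] * w_ho[j] for j in range(n_hidden)) + b_o)

            # Step 3: backpropagate the error (output unit first, then hidden units).
            err_o = o * (1 - o) * (t - o)
            err_h = [h[j] * (1 - h[j]) * err_o * w_ho[j] for j in range(n_hidden)]

            # Update weights and biases with learning rate l (per-tuple updating).
            for j in range(n_hidden):
                w_ho[j] += l * err_o * h[j]
                b_h[j] += l * err_h[j]
                for i in range(n_in):
                    w_ih[i][j] += l * err_h[j] * x[i]
            b_o += l * err_o

    return w_ih, w_ho, b_h, b_o

# Example call on hypothetical toy data (XOR-like labels):
data = [([0, 0], 0), ([0, 1], 1), ([1, 0], 1), ([1, 1], 0)]
weights = train(data, n_in=2, n_hidden=3, epochs=2000)
```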


Backpropagation and Interpretability

Efficiency of backpropagation: each epoch (one iteration through the training set) takes O(|D| * w) time, with |D| tuples and w weights, but the number of epochs can be exponential in n, the number of inputs, in the worst case.

Rule extraction from networks: network pruning
– Simplify the network structure by removing weighted links that have the least effect on the trained network
– Then perform link, unit, or activation value clustering
– The sets of input and activation values are studied to derive rules describing the relationship between the input and hidden unit layers

Sensitivity analysis: assess the impact that a given input variable has on a network output; the knowledge gained from this analysis can be represented in rules (a rough sketch follows below).
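One way to read the sensitivity-analysis idea as code: perturb one input while holding the others fixed and watch how the output moves. The `predict` function below is a stand-in for any trained network's forward pass and is assumed, not defined in the source:

```python
def sensitivity(predict, x, i, delta=0.1):
    """Estimate how much the network output changes when input i is perturbed by delta."""
    x_plus = list(x)
    x_plus[i] += delta
    return (predict(x_plus) - predict(x)) / delta

# Stand-in "network" just to make the sketch runnable:
predict = lambda x: 0.7 * x[0] - 0.2 * x[1] + 0.1 * x[2]
print(sensitivity(predict, [1.0, 0.0, 1.0], i=0))  # ~0.7: input 0 has the largest impact
```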


Sample Calculation for Learning by the Backpropagation Algorithm

• The figure shows a multilayer feed-forward neural network. Let the learning rate be 0.9. The initial weight and bias values of the network are given in the table below, along with the first training tuple, X = (1, 0, 1), whose class label is 1.

[Figure: a multilayer feed-forward network with input units 1, 2, and 3 (inputs x1, x2, x3), hidden units 4 and 5, and output unit 6, connected by the weights w14, w15, w24, w25, w34, w35, w46, and w56.]

Initial input, weight, and bias values:

x1 = 1, x2 = 0, x3 = 1
w14 = 0.2, w15 = -0.3, w24 = 0.4, w25 = 0.1, w34 = -0.5, w35 = 0.2, w46 = -0.3, w56 = -0.2
θ4 = -0.4, θ5 = 0.2, θ6 = 0.1

The net input and output calculations:

Unit j | Net input I_j                                | Output O_j
4      | 0.2 + 0 - 0.5 - 0.4 = -0.7                   | 1 / (1 + e^{0.7}) = 0.332
5      | -0.3 + 0 + 0.2 + 0.2 = 0.1                   | 1 / (1 + e^{-0.1}) = 0.525
6      | (-0.3)(0.332) - (0.2)(0.525) + 0.1 = -0.105  | 1 / (1 + e^{0.105}) = 0.474

Calculation of the error at each node:

Unit j | Err_j
6      | (0.474)(1 - 0.474)(1 - 0.474) = 0.1311
5      | (0.525)(1 - 0.525)(0.1311)(-0.2) = -0.0065
4      | (0.332)(1 - 0.332)(0.1311)(-0.3) = -0.0087


Calculations for weight and bias updating:

Weight or bias | New value
w46            | -0.3 + (0.9)(0.1311)(0.332) = -0.261
w56            | -0.2 + (0.9)(0.1311)(0.525) = -0.138
w14            | 0.2 + (0.9)(-0.0087)(1) = 0.192
w15            | -0.3 + (0.9)(-0.0065)(1) = -0.306
w24            | 0.4 + (0.9)(-0.0087)(0) = 0.4
w25            | 0.1 + (0.9)(-0.0065)(0) = 0.1
w34            | -0.5 + (0.9)(-0.0087)(1) = -0.508
w35            | 0.2 + (0.9)(-0.0065)(1) = 0.194
θ6             | 0.1 + (0.9)(0.1311) = 0.218
θ5             | 0.2 + (0.9)(-0.0065) = 0.194
θ4             | -0.4 + (0.9)(-0.0087) = -0.408
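The table values above can be reproduced with a few lines of arithmetic. The short script below (the variable names are my own) recomputes the net inputs, outputs, errors, and a couple of the weight and bias updates for the sample network:

```python
import math

sigmoid = lambda net: 1.0 / (1.0 + math.exp(-net))

# Initial inputs, weights, and biases from the example.
x1, x2, x3 = 1, 0, 1
w14, w15, w24, w25 = 0.2, -0.3, 0.4, 0.1
w34, w35, w46, w56 = -0.5, 0.2, -0.3, -0.2
t4, t5, t6 = -0.4, 0.2, 0.1          # biases theta_4, theta_5, theta_6
T, l = 1, 0.9                        # target class label and learning rate

# Forward pass.
I4 = w14 * x1 + w24 * x2 + w34 * x3 + t4   # -0.7
I5 = w15 * x1 + w25 * x2 + w35 * x3 + t5   #  0.1
O4, O5 = sigmoid(I4), sigmoid(I5)          # 0.332, 0.525
I6 = w46 * O4 + w56 * O5 + t6              # -0.105
O6 = sigmoid(I6)                           # 0.474

# Backpropagate the errors.
Err6 = O6 * (1 - O6) * (T - O6)            #  0.1311
Err5 = O5 * (1 - O5) * Err6 * w56          # -0.0065
Err4 = O4 * (1 - O4) * Err6 * w46          # -0.0087

# One round of updates, e.g. w46 and theta_6.
w46_new = w46 + l * Err6 * O4              # -0.261
t6_new = t6 + l * Err6                     #  0.218
print(round(w46_new, 3), round(t6_new, 3))
```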
