
November 24, 2014 Data Mining: Concepts and Techniques 1

Classification by Backpropagation

• **Backpropagation:** a **neural network** learning algorithm
• Started by psychologists and neurobiologists to develop and test computational analogues of neurons
• A neural network: a set of connected input/output units where each connection has a **weight** associated with it
• During the learning phase, the network **learns by adjusting the weights** so as to be able to predict the correct class label of the input tuples
• Also referred to as **connectionist learning** due to the connections between units


Neural Network as a Classifier

• **Weakness**
  – Long training time
  – Requires a number of parameters typically best determined empirically, e.g., the network topology or "structure"
  – Poor interpretability: difficult to interpret the symbolic meaning behind the learned weights and of "hidden units" in the network
• **Strength**
  – High tolerance to noisy data
  – Ability to classify untrained patterns
  – Well-suited for continuous-valued inputs and outputs
  – Successful on a wide array of real-world data
  – Algorithms are inherently parallel
  – Techniques have recently been developed for the extraction of rules from trained neural networks


A Neuron (= a perceptron)

• The *n*-dimensional input vector **x** is mapped into variable y by
means of the scalar product and a nonlinear function mapping

[Figure: a perceptron. The **input vector x** = (x0, x1, …, xn) and the **weight vector w** = (w0, w1, …, wn) feed a **weighted sum**, which the **activation function** *f* maps to the **output y**.]

For example:

$y = \mathrm{sign}\left(\sum_{i=0}^{n} w_i x_i - \mu_k\right)$

where $\mu_k$ is the threshold (bias) of the unit.
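The weighted-sum-and-sign computation of a single perceptron can be sketched in a few lines of Python; the weights, inputs, and threshold below are illustrative values, not taken from the text:

```python
def perceptron(x, w, mu_k):
    # y = sign(sum_i w_i * x_i - mu_k): weighted sum of the inputs,
    # shifted by the threshold mu_k, passed through the sign function
    s = sum(wi * xi for wi, xi in zip(w, x)) - mu_k
    return 1 if s >= 0 else -1

y = perceptron([1.0, 0.0, 1.0], [0.3, -0.2, 0.5], 0.4)  # 0.4 >= 0, so y = 1
```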


A Multi-Layer Feed-Forward Neural Network

[Figure: a multi-layer feed-forward neural network. The **input vector X** feeds the units of the **input layer**; weighted connections $w_{ij}$ lead to the **hidden layer**, whose weighted outputs feed the **output layer**, which emits the **output vector**.]

The computations at each unit:

$I_j = \sum_i w_{ij} O_i + \theta_j$

$O_j = \frac{1}{1 + e^{-I_j}}$

$Err_j = O_j (1 - O_j)(T_j - O_j)$

$Err_j = O_j (1 - O_j) \sum_k Err_k\, w_{jk}$

$w_{ij} = w_{ij} + (l)\, Err_j\, O_i$

$\theta_j = \theta_j + (l)\, Err_j$


Given a unit j in a hidden or output layer, the net input $I_j$ to unit j is

$I_j = \sum_i w_{ij} O_i + \theta_j$

where $w_{ij}$ is the weight of the connection from unit **i** in the previous layer to unit **j**; $O_i$ is the output of unit **i** from the previous layer; and $\theta_j$ is the bias of the unit, which acts as a threshold in that it serves to vary the activity of the unit. Given the net input $I_j$ to unit j, the output $O_j$ of unit j is computed as

$O_j = \frac{1}{1 + e^{-I_j}}$

For a unit j in the output layer, the error $Err_j$ is computed by

$Err_j = O_j (1 - O_j)(T_j - O_j)$

where $O_j$ is the actual output of unit j, and $T_j$ is the known target value of the given training tuple.

To compute the error of a hidden-layer unit j, the weighted sum of the errors of the units connected to unit j in the next layer is considered. The error of a hidden-layer unit j is

$Err_j = O_j (1 - O_j) \sum_k Err_k\, w_{jk}$

where $w_{jk}$ is the weight of the connection from unit j to a unit k in the next higher layer, and $Err_k$ is the error of unit k.

Weights are updated as follows, where l is the learning rate:

$w_{ij} = w_{ij} + (l)\, Err_j\, O_i$

Biases are updated as follows, where l is the learning rate:

$\theta_j = \theta_j + (l)\, Err_j$
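The update equations above can be sketched directly in Python; this is a minimal per-layer illustration, not an optimized implementation. The demo at the end uses X = (1, 0, 1) and the initial weights and biases from the worked example later in these slides:

```python
import math

def forward(O_prev, w, theta):
    # I_j = sum_i w[i][j] * O_prev[i] + theta[j]; O_j = 1 / (1 + e^-I_j)
    out = []
    for j in range(len(theta)):
        I_j = sum(w[i][j] * O_prev[i] for i in range(len(O_prev))) + theta[j]
        out.append(1.0 / (1.0 + math.exp(-I_j)))
    return out

def output_error(O, T):
    # Err_j = O_j (1 - O_j)(T_j - O_j) for units in the output layer
    return [o * (1 - o) * (t - o) for o, t in zip(O, T)]

def hidden_error(O, w_next, err_next):
    # Err_j = O_j (1 - O_j) * sum_k Err_k * w[j][k] for hidden-layer units
    return [O[j] * (1 - O[j]) *
            sum(err_next[k] * w_next[j][k] for k in range(len(err_next)))
            for j in range(len(O))]

def update(w, theta, err, O_prev, l):
    # w[i][j] += l * Err_j * O_i and theta[j] += l * Err_j, in place
    for j in range(len(theta)):
        for i in range(len(O_prev)):
            w[i][j] += l * err[j] * O_prev[i]
        theta[j] += l * err[j]

# Demo: inputs X = (1, 0, 1) feeding hidden units 4 and 5
w_hidden = [[0.2, -0.3], [0.4, 0.1], [-0.5, 0.2]]     # rows: units 1-3; cols: units 4, 5
O_hidden = forward([1, 0, 1], w_hidden, [-0.4, 0.2])  # approx. [0.332, 0.525]
```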


How Does a Multi-Layer Neural Network Work?

• The **inputs** to the network correspond to the attributes measured for each training tuple
• Inputs are fed simultaneously into the units making up the **input layer**
• They are then weighted and fed simultaneously to a **hidden layer**
• The number of hidden layers is arbitrary, although usually only one
• The weighted outputs of the last hidden layer are input to units making up the **output layer**, which emits the network's prediction
• The network is **feed-forward** in that none of the weights cycles back to an input unit or to an output unit of a previous layer
• From a statistical point of view, networks perform **nonlinear regression**: given enough hidden units and enough training samples, they can closely approximate any function


Defining a Network Topology
• First decide the **network topology:** # of units in the *input layer*, # of *hidden layers* (if > 1), # of units in *each hidden layer*, and # of units in the *output layer*
• Normalize the input values for each attribute measured in the training tuples to [0.0, 1.0]
• One **input** unit per domain value, each initialized to 0
• **Output:** for classification with more than two classes, one output unit per class is used
• Once a network has been trained, if its accuracy is **unacceptable**, repeat the training process with a *different network topology or a different set of initial weights*
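The input and output preparation described above can be sketched as follows; the function names are ours, and min-max normalization is one common way to reach the [0.0, 1.0] range:

```python
def min_max_normalize(values):
    # scale each attribute value v to (v - min) / (max - min),
    # mapping the attribute's observed range onto [0.0, 1.0]
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def one_hot(label, classes):
    # one output unit per class: 1 for the tuple's class label, 0 otherwise
    return [1 if c == label else 0 for c in classes]
```

For example, `min_max_normalize([10, 20, 30])` gives `[0.0, 0.5, 1.0]`, and `one_hot("b", ["a", "b", "c"])` gives `[0, 1, 0]`.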


Backpropagation
• Iteratively process a set of training tuples & compare the network's prediction with the actual known target value
• For each training tuple, the weights are modified to minimize **the mean squared error** between the network's prediction and the actual target value
• Modifications are made in the "backwards" direction: from the output layer, through each hidden layer down to the first hidden layer, hence "**backpropagation**"
• **Steps**
  – Initialize weights (to small random #s) and biases in the network
  – Propagate the inputs forward (by applying the activation function)
  – Backpropagate the error (by updating weights and biases)
  – Terminating condition (when error is very small, etc.)
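The steps above can be sketched as a small pure-Python trainer for one hidden layer. This is a toy illustration only: the AND dataset, random seed, network size, and fixed epoch budget are our choices, not from the text.

```python
import math
import random

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def train(data, n_in, n_hidden, n_out, l=0.9, epochs=200):
    # Step 1: initialize weights and biases to small random numbers
    random.seed(42)
    r = lambda: random.uniform(-0.5, 0.5)
    w1 = [[r() for _ in range(n_hidden)] for _ in range(n_in)]
    b1 = [r() for _ in range(n_hidden)]
    w2 = [[r() for _ in range(n_out)] for _ in range(n_hidden)]
    b2 = [r() for _ in range(n_out)]
    mses = []
    for _ in range(epochs):
        sq_err = 0.0
        for x, t in data:
            # Step 2: propagate the inputs forward through hidden and output layers
            h = [sigmoid(sum(w1[i][j] * x[i] for i in range(n_in)) + b1[j])
                 for j in range(n_hidden)]
            o = [sigmoid(sum(w2[j][k] * h[j] for j in range(n_hidden)) + b2[k])
                 for k in range(n_out)]
            sq_err += sum((t[k] - o[k]) ** 2 for k in range(n_out))
            # Step 3: backpropagate the error, updating weights and biases
            err_o = [o[k] * (1 - o[k]) * (t[k] - o[k]) for k in range(n_out)]
            err_h = [h[j] * (1 - h[j]) * sum(err_o[k] * w2[j][k] for k in range(n_out))
                     for j in range(n_hidden)]
            for j in range(n_hidden):
                for k in range(n_out):
                    w2[j][k] += l * err_o[k] * h[j]
            for k in range(n_out):
                b2[k] += l * err_o[k]
            for i in range(n_in):
                for j in range(n_hidden):
                    w1[i][j] += l * err_h[j] * x[i]
            for j in range(n_hidden):
                b1[j] += l * err_h[j]
        mses.append(sq_err / len(data))
        # Step 4: terminating condition, here simply a fixed epoch budget
    return mses

and_data = [([0, 0], [0]), ([0, 1], [0]), ([1, 0], [0]), ([1, 1], [1])]
history = train(and_data, n_in=2, n_hidden=2, n_out=1)
```

Note that weights are updated after each tuple (case updating) rather than once per epoch; the mean squared error per epoch should shrink as training proceeds.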


Backpropagation and Interpretability

• **Efficiency of backpropagation:** each epoch (one iteration through the training set) takes O(|D| × *w*), with |D| tuples and *w* weights, but the # of epochs can be exponential in n, the number of inputs, in the worst case
• **Rule extraction from networks:** network pruning
  – Simplify the network structure by removing weighted links that have the least effect on the trained network
  – Then perform link, unit, or activation value clustering
  – The sets of input and activation values are studied to derive rules describing the relationship between the input and hidden unit layers
• **Sensitivity analysis:** assess the impact that a given input variable has on a network output. The knowledge gained from this analysis can be represented in rules
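Sensitivity analysis can be sketched by perturbing one input and measuring the change in the network's output; the one-unit "network" below is a hypothetical stand-in for whatever trained model is being analyzed:

```python
import math

def net_output(x, w, theta):
    # stand-in "network": a single sigmoid unit over the inputs
    return 1 / (1 + math.exp(-(sum(wi * xi for wi, xi in zip(w, x)) + theta)))

def sensitivity(x, w, theta, i, eps=0.01):
    # perturb input i by eps and measure the resulting change in the output;
    # a large magnitude means the output is sensitive to that input
    x2 = list(x)
    x2[i] += eps
    return (net_output(x2, w, theta) - net_output(x, w, theta)) / eps
```

An input whose sensitivity stays near zero across the data contributes little and is a candidate for pruning; a consistently large sensitivity can be summarized as a rule.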


Sample Calculation for learning by the backpropagation Algorithm

• The figure shows a multilayer feed-forward neural network. Let the learning rate be 0.9. The initial weight and bias values of the network are given in Table 1.1, along with the first training tuple, **X = (1, 0, 1)**, whose class label is 1.

[Figure: the example network. Units 1, 2, 3 form the input layer (inputs **x1**, **x2**, **x3**), units 4 and 5 the hidden layer, and unit 6 the output layer, with weights **w14**, **w24**, **w34**, **w15**, **w25**, **w35**, **w46**, **w56** on the connections.]


**Table 1.1: Initial input, weight, and bias values**

| x1 | x2 | x3 | w14 | w15 | w24 | w25 | w34 | w35 | w46 | w56 | θ4 | θ5 | θ6 |
|----|----|----|-----|-----|-----|-----|-----|-----|-----|-----|----|----|----|
| 1 | 0 | 1 | 0.2 | -0.3 | 0.4 | 0.1 | -0.5 | 0.2 | -0.3 | -0.2 | -0.4 | 0.2 | 0.1 |

**The net input and output calculations**

| Unit j | Net input Ij | Output Oj |
|--------|--------------|-----------|
| 4 | 0.2 + 0 - 0.5 - 0.4 = -0.7 | 1/(1 + e^0.7) = 0.332 |
| 5 | -0.3 + 0 + 0.2 + 0.2 = 0.1 | 1/(1 + e^-0.1) = 0.525 |
| 6 | (-0.3)(0.332) - (0.2)(0.525) + 0.1 = -0.105 | 1/(1 + e^0.105) = 0.474 |

**Calculations of the error at each node**

| Unit j | Errj |
|--------|------|
| 6 | (0.474)(1 - 0.474)(1 - 0.474) = 0.1311 |
| 5 | (0.525)(1 - 0.525)(0.1311)(-0.2) = -0.0065 |
| 4 | (0.332)(1 - 0.332)(0.1311)(-0.3) = -0.0087 |


Calculations for weight and bias updating

| Weight or bias | New value |
|----------------|-----------|
| w46 | -0.3 + (0.9)(0.1311)(0.332) = -0.261 |
| w56 | -0.2 + (0.9)(0.1311)(0.525) = -0.138 |
| w14 | 0.2 + (0.9)(-0.0087)(1) = 0.192 |
| w15 | -0.3 + (0.9)(-0.0065)(1) = -0.306 |
| w24 | 0.4 + (0.9)(-0.0087)(0) = 0.4 |
| w25 | 0.1 + (0.9)(-0.0065)(0) = 0.1 |
| w34 | -0.5 + (0.9)(-0.0087)(1) = -0.508 |
| w35 | 0.2 + (0.9)(-0.0065)(1) = 0.194 |
| θ6 | 0.1 + (0.9)(0.1311) = 0.218 |
| θ5 | 0.2 + (0.9)(-0.0065) = 0.194 |
| θ4 | -0.4 + (0.9)(-0.0087) = -0.408 |