KNN, ID Trees, and Neural Nets-Artificial Intelligence-Tutorial Handout, Exercises of Artificial Intelligence

Central University of Jammu and Kashmir Artificial Intelligence

Madam Amrita Ahuja distributed this handout in the Artificial Intelligence course at Central University of Jammu and Kashmir. The handout explains important concepts including: k-NN, ID trees, neural nets, decision boundaries, Euclidean distance, distance functions, and dimensions.

Typology: Exercises

2011/2012

Uploaded on 07/31/2012

shaina_44kin 🇮🇳


KNN-ID and Neural Nets

KNN, ID Trees, and Neural Nets

Intro to Learning Algorithms

KNN, decision trees, and neural nets are all supervised learning algorithms.

Their general goal is to make accurate predictions about unknown data after being trained on known data.

Data comes in the form of examples with the general form:

(x1, ..., xn, y)

x1, ..., xn are also known as features, inputs, or dimensions. y is the output or class label.

Both the xi and y can be discrete (taking on specific values), e.g. {0, 1}, or continuous (taking on a range of values), e.g. [0, 1].

In training we are given (x1, ..., xn, y) tuples. In testing (classification), we are given only (x1, ..., xn), and the goal is to predict y with high accuracy.

Training error is the classification error measured on the same data that was used for training.

Testing error is the classification error on data not seen in the training phase.

K Nearest Neighbors

1-NN

● Given an unknown point, pick the closest 1 neighbor by some distance measure.

● Class of unknown is the 1-nearest neighbor's label.

k-NN

● Given an unknown, pick the k closest neighbors by some distance function.

● Class of unknown is the mode (most common value) of the k nearest neighbors' labels.

● k is usually an odd number to facilitate tie breaking.
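
The k-NN rule above can be sketched in a few lines of Python. This is a minimal illustration, not the handout's own code: the function name `knn_classify`, the toy data set, and the choice of Euclidean distance are all assumptions made for the example.

```python
import math
from collections import Counter

def knn_classify(train, query, k=1):
    """Return the most common label among the k training points nearest to query.

    train is a list of (features, label) pairs; features are equal-length tuples.
    """
    # Sort the training examples by Euclidean distance to the query point.
    by_distance = sorted(train, key=lambda ex: math.dist(ex[0], query))
    # Collect the labels of the k closest examples and take their mode.
    labels = [label for _, label in by_distance[:k]]
    return Counter(labels).most_common(1)[0][0]

# Hypothetical toy data: two "+" points and three "-" points in 2-D.
points = [((1, 1), "+"), ((2, 1), "+"), ((5, 5), "-"), ((6, 5), "-"), ((5, 6), "-")]

print(knn_classify(points, (1.5, 1.2), k=1))  # prints +
print(knn_classify(points, (4, 4), k=3))      # prints -
```

With k = 1 this reduces to 1-NN. If two labels tie for the mode, `Counter.most_common` returns the one seen first, which is one reason to prefer an odd k for two-class problems.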

How to draw 1-NN decision boundaries

Decision boundaries are the lines on which a point is equally likely to be in any of the classes.

1. Examine the region where you think decision boundaries should occur.

2. Find oppositely labeled points (+/-)

3. Draw bisectors. (use pencil)

4. Extend and join all bisectors. Erase extraneously extended lines.

5. Remember to draw boundaries to the edge of the graph and indicate it with arrows! (Forgetting this is a very common mistake.)

6. Your 1-NN boundaries generally should have sharp edges and corners (otherwise, you are doing something wrong or drawing boundaries for a higher k-NN).

Distance Functions

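Distance functions determine which points are "nearest". Here are some standard distance functions, for two points x = (x_1, ..., x_n) and y = (y_1, ..., y_n):

Euclidean Distance

$$D(x, y) = \sqrt{\sum_{i=1}^{n} (x_i - y_i)^2}$$

Manhattan Distance (Block distance)

- Sum of distances in each dimension

$$D(x, y) = \sum_{i=1}^{n} |x_i - y_i|$$

Hamming Distance

- Sum of differences in each dimension

$$D(x, y) = \sum_{i=1}^{n} I(x_i, y_i)$$

I(x_i, y_i) = 0 if identical, 1 if different.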

Cosine Similarity

$$\text{sim}(x, y) = \frac{x \cdot y}{\|x\| \, \|y\|}$$

- Used in text classification: words are dimensions, documents are vectors of words, and vector component i is 1 if word i exists in the document (0 otherwise).
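
The four measures can be written directly from their definitions. The sketch below is illustrative only; the function names and the example vectors `u` and `v` are assumptions, and the inputs are taken to be equal-length tuples of numbers.

```python
import math

def euclidean(x, y):
    # Square root of the summed squared differences in each dimension.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def manhattan(x, y):
    # Sum of absolute differences in each dimension.
    return sum(abs(a - b) for a, b in zip(x, y))

def hamming(x, y):
    # Number of dimensions in which the two points differ.
    return sum(1 for a, b in zip(x, y) if a != b)

def cosine_similarity(x, y):
    # Dot product divided by the product of the vector lengths.
    dot = sum(a * b for a, b in zip(x, y))
    norm_x = math.sqrt(sum(a * a for a in x))
    norm_y = math.sqrt(sum(b * b for b in y))
    return dot / (norm_x * norm_y)

# Hypothetical word-presence vectors for two documents.
u, v = (1, 0, 1, 1), (0, 0, 1, 1)
print(euclidean((0, 0), (3, 4)))  # prints 5.0
print(manhattan((0, 0), (3, 4)))  # prints 7
print(hamming(u, v))              # prints 1
print(cosine_similarity(u, v))
```

Note that cosine similarity grows as points get more alike, so a k-NN implementation would pick the neighbors with the largest similarity rather than the smallest distance.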

(Optional) How to Weigh Dimensions Differently

In Euclidean distance all dimensions are treated the same. But in practice not all dimensions are equally important or useful!

For example, suppose we represent documents as vectors of words. Consider the task of classifying documents related to "Red Sox". If all words are weighted equally, then the word "the" weighs the same as the word "Sox". But almost every English document contains the word "the", while only sports-related documents contain the word "Sox". So we want k-NN distance metrics to weight meaningful words like "Sox" more than function words like "the".

For text classification, a weighting scheme used to make some dimensions (words) more important than others is known as TF-IDF.

Here:

tf (term frequency): words that occur frequently in a document should be weighted more.

idf (inverse document frequency): words that occur in all the documents (function words like "the", "of", etc.) should be weighted less.

Using this weighting scheme with a distance metric, k-NN would produce better (more relevant) classifications.
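
As a rough illustration of the idea, the sketch below uses one common form of the weight, tf × log(N / df), where N is the number of documents and df is the number of documents containing the word. That exact formula, the function name `tf_idf`, and the three toy documents are assumptions for this example; the handout's own formula may differ in detail.

```python
import math
from collections import Counter

def tf_idf(docs):
    """docs is a list of token lists. Returns one {word: weight} dict per document."""
    n_docs = len(docs)
    # Document frequency: in how many documents does each word appear?
    df = Counter(word for doc in docs for word in set(doc))
    weights = []
    for doc in docs:
        tf = Counter(doc)  # raw count of each word in this document
        weights.append({w: tf[w] * math.log(n_docs / df[w]) for w in tf})
    return weights

# Hypothetical mini-corpus.
docs = [
    "the red sox won the game".split(),
    "the senate passed the bill".split(),
    "the sox lost the series".split(),
]
w = tf_idf(docs)
print(w[0]["the"])  # prints 0.0, because "the" occurs in every document
print(w[0]["sox"])  # positive, because "sox" occurs in only two of three documents
```

Under this weighting "the" contributes nothing to a distance computation, while rarer words such as "sox" or "red" dominate it.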

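Another way to vary the importance of different dimensions is to use the Mahalanobis Distance:

$$D(x, y) = \sqrt{(x - y)^T S^{-1} (x - y)}$$

Here S is a covariance matrix. Because the distance uses the inverse of S, differences along dimensions that show more variance are weighted less.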
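
A small sketch of the Mahalanobis distance, restricted to two dimensions so that the matrix inverse can be written out by hand. The function name `mahalanobis_2d` and the example covariance matrix are assumptions for illustration; a real implementation would normally estimate S from the training data and use a linear algebra library.

```python
import math

def mahalanobis_2d(x, y, S):
    """Mahalanobis distance between 2-D points x and y for a 2x2 covariance matrix S."""
    (a, b), (c, d) = S
    det = a * d - b * c
    # Closed-form inverse of the 2x2 matrix [[a, b], [c, d]].
    inv = ((d / det, -b / det), (-c / det, a / det))
    dx, dy = x[0] - y[0], x[1] - y[1]
    # Quadratic form (x - y)^T S^-1 (x - y).
    q = dx * (inv[0][0] * dx + inv[0][1] * dy) + dy * (inv[1][0] * dx + inv[1][1] * dy)
    return math.sqrt(q)

# Hypothetical covariance: variance 4 along the first axis, 1 along the second.
S = ((4.0, 0.0), (0.0, 1.0))
print(mahalanobis_2d((0, 0), (2, 0), S))  # prints 1.0
print(mahalanobis_2d((0, 0), (0, 2), S))  # prints 2.0
```

The same Euclidean step of 2 units counts for less along the high-variance axis than along the low-variance one. With S equal to the identity matrix the result is the ordinary Euclidean distance.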

H

Table of H(fraction), for denominators /3 to /9 (left) and /10 to / (right):

numerator  denominator  fraction  H(fraction)    numerator  denominator  fraction  H(fraction)
1          3            0.33      0.             1          10           0.10      0.
2          3            0.67      0.92           2          10           0.20      0.
1          4            0.25      0.81           3          10           0.30      0.
2          4            0.50      1.00           4          10           0.40      0.
1          5            0.20      0.72           1          11           0.09      0.
2          5            0.40      0.97           2          11           0.18      0.
3          5            0.60      0.97           3          11           0.27      0.
1          6            0.17      0.65           4          11           0.36      0.
2          6            0.33      0.92           5          11           0.45      0.
3          6            0.50      1.00           1          12           0.08      0.
1          7            0.14      0.59           2          12           0.17      0.
2          7            0.29      0.86           3          12           0.25      0.
3          7            0.43      0.99           5          12           0.42      0.
1          8            0.13      0.54           1          13           0.08      0.
2          8            0.25      0.81           2          13           0.15      0.
3          8            0.38      0.95           3          13           0.23      0.
4          8            0.50      1.00           4          13           0.31      0.
1          9            0.11      0.50           5          13           0.38      0.
2          9            0.22      0.76           6          13           0.46      1.
3          9            0.33      0.
4          9            0.44      0.
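
The complete entries in this table (for example 0.92 for 2/3, 0.81 for 1/4, and 1.00 for 2/4) agree with the two-class entropy H(p) = -p log2(p) - (1 - p) log2(1 - p). Assuming that is what H denotes here, any entry can be recomputed with the short sketch below; the function name `disorder` is hypothetical.

```python
import math

def disorder(fraction):
    """Two-class entropy H(p) in bits, with H(0) = H(1) = 0 by convention."""
    p = fraction
    if p in (0, 1):
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

print(round(disorder(2 / 3), 2))  # prints 0.92
print(round(disorder(1 / 4), 2))  # prints 0.81
print(round(disorder(2 / 4), 2))  # prints 1.0
```

H is symmetric around 1/2, which is why 2/5 and 3/5 share the same value in the table.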

Neural Networks:

First, Read Professor Winston's Notes on NNs!

General Back Propagation Algorithm:

function train (examples)

1. Initialize weights
2. While true:
   a. For each example = (inputs, outputs) in examples:
        Run backward-propagation(inputs, outputs)
   b. If termination conditions are met, then quit

Possible Termination conditions

When the error rate is 0
When performance function P over the output and desired output is within an epsilon of 0.
When the weight change from the last iteration is within an epsilon of 0.

function backward-propagation (inputs, outputs)

1. Fix inputs (x_1 .. x_n) to input values, fix desired d to outputs
2. Forward Propagation step (working forwards from inputs to outputs)
   for l from layer 1 to L
     for each neuron n in layer l
       compute z_n = sum over inputs i of (w_in * i)
         [ i = inputs to the weights: x_i when at the first layer, o_i (output of previous layer) at other layers ]
       compute o_n = sigmoid(z_n) = 1 / (1 + e^(-z_n))
3. Backward Propagation step (working backwards from output layer to input layers)
   for l from layer L to 1
     for each neuron n in layer l
       compute δ_n, for the standard sigmoid unit and performance function P = -1/2 (d - o)^2:
         If n is in the Lth (last) layer:
           δ_n = o_n (1 - o_n) (d_n - o_n)
         Else:
           δ_n = o_n (1 - o_n) * sum over j of (w_nj * δ_j)
           Note: w_nj are links that come out of n into j nodes
       compute Δw_in = alpha * δ_n * i
       compute new weights w_in' = w_in + Δw_in

function test (inputs) => outputs
  Run the forward propagation step, return o_n in the last layer as outputs.
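
The pseudocode can be turned into a small runnable sketch. The version below is an illustration under several assumptions: weights are stored as nested lists (`weights[l][n]` holds the weights into neuron n of layer l), any threshold is supplied by the caller as an extra constant input of -1, and all δ values are computed before any weight is changed, so the lower layers use the old weights. The function names are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(weights, inputs):
    """Return the activations of every layer; acts[0] is the input vector."""
    acts = [list(inputs)]
    for layer in weights:
        prev = acts[-1]
        acts.append([sigmoid(sum(w * i for w, i in zip(ws, prev))) for ws in layer])
    return acts

def backward_propagation(weights, inputs, desired, alpha=1.0):
    """One weight update for P = -1/2 (d - o)^2. Modifies weights in place."""
    acts = forward(weights, inputs)
    deltas = [None] * len(weights)
    # Last layer: delta = o (1 - o) (d - o).
    deltas[-1] = [o * (1 - o) * (d - o) for o, d in zip(acts[-1], desired)]
    # Earlier layers: delta_n = o_n (1 - o_n) * sum_j w_nj * delta_j.
    for l in range(len(weights) - 2, -1, -1):
        nxt = deltas[l + 1]
        deltas[l] = [
            o * (1 - o) * sum(weights[l + 1][j][n] * nxt[j] for j in range(len(nxt)))
            for n, o in enumerate(acts[l + 1])
        ]
    # Weight update: w' = w + alpha * delta_n * input.
    for l, layer in enumerate(weights):
        for n, ws in enumerate(layer):
            for k in range(len(ws)):
                ws[k] += alpha * deltas[l][n] * acts[l][k]
    return acts[-1]

# Single sigmoid unit with weights (W_A, W_B, W_T) and inputs (A, B, T = -1).
net = [[[0.0, 0.0, 1.0]]]
out = backward_propagation(net, (0, 0, -1), [1])
print(round(out[0], 2), round(net[0][0][2], 2))  # prints 0.27 0.86
```

The `test` function from the pseudocode corresponds to `forward(weights, inputs)[-1]`.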

Table for the back-propagation simulation (entries a) to l) are worked out in the next section):

           A   B   T    W_A    W_B      W_T      z         o        d   d-o
forward    0   0   -1   0      0        1        a) -1     b) 0.27  1   (1 - 0.27) = 0.73
backward
forward    0   1   -1   c) 0   d) 0     e) 0.86  f) -0.86  g) 0.30  1
backward
forward    1   0   -1   h) 0   i) 0.15  j) 0.71  k) -0.71  l) 0.33  0

Simulating The Steps of Back Propagation

This is a detailed Step-by-step answer to the first 2 steps of the Fall 2009 Quiz 3 Neural Nets part B.

The given network has the following architecture (figure not reproduced): a single sigmoid unit with inputs A and B, a threshold input T = -1, and weights W_A, W_B, and W_T.

Your task is to fill in the non-shaded cells of the table above. Detailed calculations a) to l) for the fill-in boxes are included below.

In forward steps, your goal is to compute z, o, and d - o.

In backward steps, your goal is to compute δ and the weight updates ΔW, and to find the new Ws.

a) z = A*W_A + B*W_B + T*W_T = 0*0 + 0*0 + (-1)*1 = -1

b) o = sigmoid(z) = sigmoid(-1) = 1/(1+e^-(-1)) = 0.27

c) δ = (d - o)(o(1-o))   # Because it's the last (and only) layer.
     = (1 - 0.27) * (0.27*(1-0.27)) = 0.14
   ΔW_A = alpha * δ * A = 1 * 0.14 * 0 = 0
   W_A = 0 + ΔW_A = 0 + 0 = 0

d) ΔW_B = alpha * δ * B = 1 * 0.14 * 0 = 0
   W_B = 0 + ΔW_B = 0 + 0 = 0

e) ΔW_T = alpha * δ * T = 1 * 0.14 * -1 = -0.14
   W_T = 1 + ΔW_T = 1 - 0.14 = 0.86

f) z = A*W_A + B*W_B + T*W_T = 0*0 + 1*0 + (-1)*0.86 = -0.86

g) o = sigmoid(-0.86) = 0.30

h) δ = (d - o)(o(1-o)) = (1 - 0.3) * (0.3*(1-0.3)) = 0.15
   ΔW_A = alpha * δ * A = 1 * 0.15 * 0 = 0
   W_A = 0 + ΔW_A = 0

i) ΔW_B = alpha * δ * B = 1 * 0.15 * 1 = 0.15
   W_B = 0 + ΔW_B = 0.15

j) ΔW_T = alpha * δ * T = 1 * 0.15 * -1 = -0.15
   W_T = 0.86 + ΔW_T = 0.86 + -0.15 = 0.71

k) z = A*W_A + B*W_B + T*W_T = 1*0 + 0*0.15 + (-1)*0.71 = -0.71

l) o = sigmoid(-0.71) = 0.33
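
The hand calculation can be checked with a short script. This is a sketch with a hypothetical helper `step`, using alpha = 1 and the starting weights W_A = 0, W_B = 0, W_T = 1. It carries full floating-point precision, whereas the worked answer rounds intermediate values; the two agree to two decimal places.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def step(weights, inputs, d, alpha=1.0):
    """One forward and one backward pass for a single sigmoid unit.

    Returns (z, o, new_weights).
    """
    z = sum(w * x for w, x in zip(weights, inputs))
    o = sigmoid(z)
    delta = (d - o) * o * (1 - o)  # last (and only) layer
    new_weights = [w + alpha * delta * x for w, x in zip(weights, inputs)]
    return z, o, new_weights

w0 = [0.0, 0.0, 1.0]                   # W_A, W_B, W_T
z1, o1, w1 = step(w0, (0, 0, -1), d=1)  # answers a) to e)
z2, o2, w2 = step(w1, (0, 1, -1), d=1)  # answers f) to j)
z3, o3, _ = step(w2, (1, 0, -1), d=0)   # answers k) and l)

print(round(z1, 2), round(o1, 2), [round(v, 2) for v in w1])  # -1.0 0.27 [0.0, 0.0, 0.86]
print(round(z2, 2), round(o2, 2), [round(v, 2) for v in w2])  # -0.86 0.3 [0.0, 0.15, 0.71]
print(round(z3, 2), round(o3, 2))                             # -0.71 0.33
```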

Deriving the Back-Propagation Weight Update Equations

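The general equation for weight updates in back propagation (for a weight with input i going into a node n):

$$w_{i \to n}' = w_{i \to n} + \alpha \frac{\partial P}{\partial w_{i \to n}}$$

$\alpha$ is the learning rate (also denoted by r). The lower the learning rate, the longer it takes to converge. But if the learning rate is too high, we may never find the maximum (we keep oscillating!).

To find the partial, we employ the chain rule:

$$\frac{\partial P}{\partial w_{i \to n}} = \frac{\partial P}{\partial o_n} \cdot \frac{\partial o_n}{\partial z_n} \cdot \frac{\partial z_n}{\partial w_{i \to n}}$$

Here:

$$z_n = \sum_{i} w_{i \to n} \, i \quad\Rightarrow\quad \frac{\partial z_n}{\partial w_{i \to n}} = i$$

where i is an input into neuron n.

Hence the third term is always the input.

$$\delta_n = \frac{\partial P}{\partial o_n} \cdot \frac{\partial o_n}{\partial z_n}$$

is used as the shorthand for the first 2 partials, so the weight update is $w_{i \to n}' = w_{i \to n} + \alpha \, \delta_n \, i$.

For the very last layer:

For the standard performance function (for gradient ascent, i.e. maximization):

$$P = -\frac{1}{2}(d - o_n)^2 \quad\Rightarrow\quad \frac{\partial P}{\partial o_n} = d - o_n$$

Partial on the sigmoid function:

$$o_n = \frac{1}{1 + e^{-z_n}} \quad\Rightarrow\quad \frac{\partial o_n}{\partial z_n} = o_n (1 - o_n)$$

Hence for the last layer:

$$\delta_n = (d - o_n) \, o_n (1 - o_n)$$

For when the layer is not the last layer, suppose m is the layer above n:

$$\delta_n = o_n (1 - o_n) \sum_{j \in m} w_{n \to j} \, \delta_j$$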
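
One way to gain confidence in the last-layer result is to compare it with a numerical derivative. The sketch below assumes a single sigmoid unit and the performance function P = -1/2 (d - o)^2, and checks that the partial of P with respect to each weight equals δ times the corresponding input. The function names and the example numbers are hypothetical.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def performance(weights, inputs, d):
    # P = -1/2 (d - o)^2 for a single sigmoid unit.
    o = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    return -0.5 * (d - o) ** 2

def analytic_gradient(weights, inputs, d):
    # dP/dw_i = delta * input_i, with delta = (d - o) o (1 - o).
    o = sigmoid(sum(w * x for w, x in zip(weights, inputs)))
    delta = (d - o) * o * (1 - o)
    return [delta * x for x in inputs]

def numeric_gradient(weights, inputs, d, eps=1e-6):
    # Central finite difference on each weight in turn.
    grads = []
    for k in range(len(weights)):
        up, down = list(weights), list(weights)
        up[k] += eps
        down[k] -= eps
        grads.append((performance(up, inputs, d) - performance(down, inputs, d)) / (2 * eps))
    return grads

weights, inputs, d = [0.3, -0.2, 1.0], (0.5, 1.0, -1), 1
for a, n in zip(analytic_gradient(weights, inputs, d), numeric_gradient(weights, inputs, d)):
    print(abs(a - n) < 1e-6)  # prints True for every weight
```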

How to solve Network Weights - An XOR example:

Solve for the 9 unknown weights for the network that will perfectly classify the concept in the picture.

Step 1. First, think of the input-level units (units A and B) as defining regions (that divide +s from -s) in the X, Y graph. These regions should be depicted as linear boundary lines with arrows pointing towards the +ve data points. Next, think of the hidden-level neural units (unit C) as some logical operator (a linearly separable operator) that combines the regions defined by the input-level units.

So in this case, units A and B represent the diagonal boundaries (with arrows) on the graph (defining two distinct ways of separating the space). Unit C represents a logical AND that intersects the two regions to create the bounded region in the middle.

Step 2. Write the line equations for the regions you defined in the graph.

A) The boundary equation for the region defined by line A:

Y < -1 X + 3/2

B) The boundary equation for the region defined by line B:

Y > -1 X + 1/2

Step 3. Rewrite the line equations into the form aX + bY > c, where a, b, and c are integers.

A) Y < -1 X + 3/2
   X + Y < 3/2
   -2X + -2Y > -3

B) Y > -1 X + 1/2
   X + Y > 1/2
   2X + 2Y > 1

Step 4. Note that the sum-of-weights-and-inputs (z) for each neural unit can also be written in this form.

For Unit A: z = W_XA X + W_YA Y + W_A(-1) > 0
            W_XA X + W_YA Y > W_A

For Unit B: z = W_XB X + W_YB Y + W_B(-1) > 0
            W_XB X + W_YB Y > W_B

Truth table for unit C, with z = W_AC A + W_BC B + W_C(-1):

A   B   desired output   Equations                Simplified
0   0   0                -W_C < 0                 W_C > 0
0   1   0                W_BC - W_C < 0           W_BC < W_C
1   0   0                W_AC - W_C < 0           W_AC < W_C
1   1   1                W_AC + W_BC - W_C > 0    W_AC + W_BC > W_C

Why W_XA X + W_YA Y + W_A(-1) > 0 rather than < 0?

When z = W_XA X + W_YA Y + W_A(-1) > 0, sigmoid(z) approaches 1 (+ve points).
When z = W_XA X + W_YA Y + W_A(-1) < 0, sigmoid(z) approaches 0 (-ve points).

So when the inequality is expressed as > 0, the region it defines points towards the +ve points. When it is expressed as < 0, the region it defines points towards the -ve points.

Step 5. Easy! Just read off the weights by correspondence. (Note: In the 2006 Quiz, the L and Pineapple problem wants you to match the correct equation by constraining the value of some weights.)

-2 X + -2 Y > -3          line A's inequality
W_XA X + W_YA Y > W_A     z equation for unit A

W_XA = -2, W_YA = -2, W_A = -3

2 X + 2 Y > 1             line B's inequality
W_XB X + W_YB Y > W_B     z equation for unit B

W_XB = 2, W_YB = 2, W_B = 1

Step 6. Solve the logic in the second layer. We want to compute (A AND B), so build a truth table and solve for the constraints!

We notice a symmetry in W_BC and W_AC, so we make a guess that they have the same value.

W_BC = 2 and W_AC = 2

Then the inequalities in the table above condense down to:

W_C > 0
W_C > 2 (twice)
W_C < 2 + 2 = 4

So 2 < W_C < 4, and W_C = 3 will work. An acceptable solution:

W_BC = 2, W_AC = 2, W_C = 3

The following solution also works, because it also obeys the stated constraints.

W_BC = 109, W_AC = 109, W_C = 110

But quizzes will ask for the smallest integer solutions.
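
The solved weights can be sanity-checked in code. The sketch below makes two assumptions. First, each unit is treated as a hard threshold (output 1 when z > 0, else 0), which stands in for a saturated sigmoid; with small integer weights a true sigmoid is not sharp enough for this check. Second, it uses W_A = -3, the value that follows from rewriting X + Y < 3/2 as -2X + -2Y > -3. The four test points are the XOR corners, chosen here for illustration since the original picture is not available.

```python
def fires(z):
    """Hard threshold standing in for a saturated sigmoid: 1 if z > 0, else 0."""
    return 1 if z > 0 else 0

def network(x, y):
    # Unit A: W_XA = -2, W_YA = -2, W_A = -3 (region X + Y < 3/2).
    a = fires(-2 * x + -2 * y + -3 * -1)
    # Unit B: W_XB = 2, W_YB = 2, W_B = 1 (region X + Y > 1/2).
    b = fires(2 * x + 2 * y + 1 * -1)
    # Unit C: W_AC = 2, W_BC = 2, W_C = 3 (logical AND of A and B).
    return fires(2 * a + 2 * b + 3 * -1)

for point in [(0, 0), (0, 1), (1, 0), (1, 1)]:
    print(point, network(*point))
# prints (0, 0) 0, then (0, 1) 1, then (1, 0) 1, then (1, 1) 0
```

Only the points lying between the two diagonal lines are classified as positive, which is the bounded region described in Step 1.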

Try different models. Pick the model that gives you the lowest cross-validation (CV) error.

Examples:

KNN - vary k from 1 to N-1. Run cross-validation under different ks. Choose the k with the lowest CV error.

Decision Tree - Try trees of varying depth, and varying number of tests. Run cross-validation to pick the tree of lowest CV Error.

Neural Net - Try different Neural Net architectures. Run CV to pick the architecture of lowest CV Error.

Models chosen using cross-validation can generalize better, and are less likely to overfit or underfit.
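
For the KNN case, the selection procedure might look like the sketch below. It uses leave-one-out cross-validation as the CV scheme, which is an assumption (the notes do not say which variant is meant), and the function names and the toy data set are hypothetical.

```python
import math
from collections import Counter

def knn_predict(train, query, k):
    # Mode of the labels of the k nearest training points (Euclidean distance).
    nearest = sorted(train, key=lambda ex: math.dist(ex[0], query))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

def loocv_error(data, k):
    """Leave-one-out CV error: fraction of points misclassified when held out."""
    mistakes = 0
    for i, (x, label) in enumerate(data):
        rest = data[:i] + data[i + 1:]  # train on everything except point i
        if knn_predict(rest, x, k) != label:
            mistakes += 1
    return mistakes / len(data)

def best_k(data, candidates):
    # Choose the candidate k with the lowest CV error (first one wins on ties).
    return min(candidates, key=lambda k: loocv_error(data, k))

# Hypothetical data: a "-" cluster, a "+" cluster, and one noisy "+" inside the "-" cluster.
data = [((0, 0), "-"), ((0, 1), "-"), ((1, 0), "-"), ((1, 1), "-"),
        ((5, 5), "+"), ((5, 6), "+"), ((6, 5), "+"), ((6, 6), "+"),
        ((0.5, 0.5), "+")]

print(best_k(data, [1, 3, 5]))  # prints 3
```

On this data 1-NN is misled by the single noisy point, so its CV error is higher than that of 3-NN, which illustrates how cross-validation guards against overfitting.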
