Assignment problems Solved - Advanced Logarithm Data Processing | CIS 4930, Assignments of Computer Science

University of Florida (UF), Computer Science

Material Type: Assignment; Class: ADV LG DATA PROCESSG; Subject: COMPUTER SCIENCE AND INFORMATION SYSTEMS; University: University of Florida; Term: Fall 2007;

Typology: Assignments

Pre 2010

Uploaded on 03/13/2009

koofers-user-n0e 🇺🇸


Chapter 4, Problem 3:

a)

For the classification function, there are 4 positive (+) examples and 5 negative (–) examples. Hence, the entropy of this collection with respect to the class label is:

Entropy = – (4/9) log2 (4/9) – (5/9) log2 (5/9) = 0.52 + 0.47 = 0.99

b)

Now if we split on a1 and a2, then:

Split on a1:

        N1     N2
+       3      1
–       1      4

Entropy(N1) = 0.81
Entropy(N2) = 0.72
Δgain = 0.99 – (4/9)(0.81) – (5/9)(0.72) = 0.23

Split on a2:

        N1     N2
+       2      2
–       2      3

Entropy(N1) = 1.00
Entropy(N2) = 0.97
Δgain = 0.99 – (4/9)(1.00) – (5/9)(0.97) = 0.01
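The gains above can be re-derived with a few lines of Python. This is a minimal sketch rather than part of the original solution; the helper names `entropy` and `info_gain` are assumptions made for illustration, and the class counts are the ones listed for a1 and a2.

```python
from math import log2

def entropy(counts):
    """Shannon entropy (in bits) of a list of class counts."""
    total = sum(counts)
    return -sum(c / total * log2(c / total) for c in counts if c > 0)

def info_gain(parent, children):
    """Parent entropy minus the size-weighted entropy of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * entropy(child) for child in children)
    return entropy(parent) - weighted

parent = [4, 5]                  # 4 positive, 5 negative examples
a1_children = [[3, 1], [1, 4]]   # (+, -) counts in N1 and N2 for a1
a2_children = [[2, 2], [2, 3]]   # (+, -) counts in N1 and N2 for a2

gain_a1 = info_gain(parent, a1_children)
gain_a2 = info_gain(parent, a2_children)
print(f"entropy(parent) = {entropy(parent):.2f}")
print(f"gain(a1) = {gain_a1:.2f}, gain(a2) = {gain_a2:.2f}")
```

Working from unrounded entropies gives the same two-decimal gains as the hand calculation, so the rounding in the intermediate steps does not change the ranking of a1 over a2.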

c)

Hence, the best split for a3 is 2.0.

d)

Based on the calculations we made in part (b) and (c), the best split is a1.

e)

Classification error rate for the classification function

= 1 – max[4/9, 5/9]

= 1 – 5/9

= 0.44

Split on a1:

        N1     N2
+       3      1
–       1      4
Error   0.25   0.20

Δ = 0.44 – (4/9)(0.25) – (5/9)(0.20) = 0.22

Split on a2:

        N1     N2
+       2      2
–       2      3
Error   0.50   0.40

Δ = 0.44 – (4/9)(0.50) – (5/9)(0.40) = 0.00

So, based on classification error rate, a1 is the best split.
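The same comparison can be checked numerically. The sketch below is illustrative only; `error_rate` and `error_gain` are hypothetical helper names, and the counts are those from the a1 and a2 tables above.

```python
def error_rate(counts):
    """Classification error of a node: 1 minus the majority-class fraction."""
    total = sum(counts)
    return 1 - max(counts) / total

def error_gain(parent, children):
    """Parent error minus the size-weighted error of the child nodes."""
    total = sum(parent)
    weighted = sum(sum(child) / total * error_rate(child) for child in children)
    return error_rate(parent) - weighted

parent = [4, 5]                                  # 4 positive, 5 negative
delta_a1 = error_gain(parent, [[3, 1], [1, 4]])  # split on a1
delta_a2 = error_gain(parent, [[2, 2], [2, 3]])  # split on a2
print(f"error(parent) = {error_rate(parent):.2f}")
print(f"delta(a1) = {delta_a1:.2f}, delta(a2) = {delta_a2:.2f}")
```

Splitting on a2 leaves the weighted error unchanged because each child has the same majority class as the parent, which is why its gain is zero.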

f)

For the classification function, GINI = 1 – (4/9)^2 – (5/9)^2 = 0.49


Now if we split on a1 and a2, then:

Split on a1:

        N1     N2
+       3      1
–       1      4
GINI    0.38   0.32

Δ = 0.49 – (4/9)(0.38) – (5/9)(0.32) = 0.14

Split on a2:

        N1     N2
+       2      2
–       2      3
GINI    0.50   0.48

Δ = 0.49 – (4/9)(0.50) – (5/9)(0.48) = 0.00

Hence, a1 is the best split.

Chapter 4, Problem 7:

a)

For the classification function,

Error (without any split) = 1 – max(50/100, 50/100) = 1 – ½ = ½

Attribute A:

            T                        F
Class +     25                       25
Class –     0                        50
Error       1 – max(25/25, 0/25)     1 – max(25/75, 50/75)
            = 1 – 1 = 0              = 1 – 2/3 = 1/3

Δ = ½ – [25/100 • 0 + 75/100 • 1/3] = ½ – ¼ = 0.25

Attribute B:

            T                        F
Class +     30                       20
Class –     20                       30
Error       1 – max(30/50, 20/50)    1 – max(20/50, 30/50)
            = 1 – 3/5 = 2/5          = 1 – 3/5 = 2/5

Δ = ½ – [50/100 • 2/5 + 50/100 • 2/5] = ½ – 2/5 = 1/10 = 0.10

Attribute C:

            T                        F
Class +     25                       25
Class –     25                       25
Error       1 – max(25/50, 25/50)    1 – max(25/50, 25/50)
            = 1 – ½ = ½              = 1 – ½ = ½

Δ = ½ – [50/100 • ½ + 50/100 • ½] = ½ – ½ = 0

Clearly, attribute A is the best first split since it has the highest gain. Hence, the tree would look like:

A
  T: leaf +   (25 +, 0 –)
  F: ?        (25 +, 50 –)
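The greedy choice of the root attribute in Problem 7(a) can be reproduced with exact fractions. This is a small sketch under the counts given in the tables for A, B and C; the names `error_rate`, `error_gain` and `splits` are assumptions introduced for illustration.

```python
from fractions import Fraction

def error_rate(counts):
    """Classification error of a node; an empty node contributes no error."""
    total = sum(counts)
    if total == 0:
        return Fraction(0)
    return 1 - Fraction(max(counts), total)

def error_gain(parent, children):
    """Parent error minus the size-weighted error of the child nodes."""
    total = sum(parent)
    weighted = sum(Fraction(sum(c), total) * error_rate(c) for c in children)
    return error_rate(parent) - weighted

parent = (50, 50)  # (+, -) counts before any split
splits = {         # attribute -> ((+, -) for value T, (+, -) for value F)
    "A": ((25, 0), (25, 50)),
    "B": ((30, 20), (20, 30)),
    "C": ((25, 25), (25, 25)),
}

gains = {name: error_gain(parent, children) for name, children in splits.items()}
best = max(gains, key=gains.get)  # greedy choice: largest reduction in error
for name, gain in gains.items():
    print(f"gain({name}) = {gain}")
print("best first split:", best)
```

Using `Fraction` keeps the results in the same form as the hand calculation (1/4, 1/10 and 0), which avoids floating-point noise when comparing gains.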

… result in this tree. Let's start by finding the second split for the left child (C = true). Note that the classification error for this node is ½, as computed in part (a).

Attribute A:

            T                        F
Class +     25                       0
Class –     0                        25
Error       1 – max(25/25, 0/25)     1 – max(0/25, 25/25)
            = 1 – 1 = 0              = 1 – 1 = 0

Δ = ½ – [25/50 • 0 + 25/50 • 0] = ½ – 0 = 0.5

Attribute B:

            T                        F
Class +     5                        20
Class –     20                       5
Error       1 – max(5/25, 20/25)     1 – max(20/25, 5/25)
            = 1 – 4/5 = 1/5          = 1 – 4/5 = 1/5

Δ = ½ – [25/50 • 1/5 + 25/50 • 1/5] = ½ – 1/5 = 3/10 = 0.3

Clearly, attribute A is the better split for the left subtree.

Now let's find the second splitting attribute for the right child of the root (C = false). The classification error for this node is also ½, as computed in part (a).

Attribute A:

            T                        F
Class +     0                        25
Class –     0                        25
Error       (empty node)             1 – max(25/50, 25/50) = 1 – ½ = ½

Δ = ½ – 50/50 • ½ = ½ – ½ = 0

Attribute B:

            T                        F
Class +     25                       0
Class –     0                        25
Error       1 – max(25/25, 0/25)     1 – max(0/25, 25/25)
            = 1 – 1 = 0              = 1 – 1 = 0

Δ = ½ – [25/50 • 0 + 25/50 • 0] = ½ – 0 = 0.5

Hence, attribute B is the better split for the right subtree. The resulting tree is:

C
  T: A
       T: leaf +   (25 +, 0 –)
       F: leaf –   (0 +, 25 –)
  F: B
       T: leaf +   (25 +, 0 –)
       F: leaf –   (0 +, 25 –)

Since all the leaf nodes are pure, 0 out of the 100 instances will be misclassified. This tree correctly classifies all instances.

e)

It is evident from the results of parts (c) and (d) that the greedy approach does not always lead to the decision tree with the lowest misclassification error.

Chapter 4, Problem 9:

Tree (a)

Number of non-leaf nodes = 2
Number of leaf nodes = 3
Number of errors = 7
Number of classes = 3
Number of attributes = 16
Number of records = n

Hence,

Cost(tree) = 2 log2 16 + 3 log2 3 = 2 • 4 + 3 • 1.585 = 8 + 4.755 = 12.755 bits
Cost(data | tree) = 7 log2 n bits
Cost(tree, data) = 12.755 + 7 log2 n bits

Tree (b)

Number of non-leaf nodes = 4
Number of leaf nodes = 5
Number of errors = 4
Number of classes = 3
Number of attributes = 16
Number of records = n

Hence,

Cost(tree) = 4 log2 16 + 5 log2 3 = 4 • 4 + 5 • 1.585 = 16 + 7.925 = 23.925 bits
Cost(data | tree) = 4 log2 n bits
Cost(tree, data) = 23.925 + 4 log2 n bits

Solving 23.925 + 4 log2 n = 12.755 + 7 log2 n gives 3 log2 n = 11.17, so log2 n = 3.723 and n = 13.208. Hence, according to the MDL principle, decision tree (b) is better if n ≥ 14, and tree (a) is better otherwise.

Chapter 5, Problem 4:

a)

Accuracy(R1) = 4 / (4 + 1) = 4/5 = 0.8
Accuracy(R2) = 30 / (30 + 10) = ¾ = 0.75
Accuracy(R3) = 100 / (100 + 90) = 10/19 = 0.5263

R1 is the best rule and R3 is the worst.

b)

For FOIL's information gain, we assume each rule extends an initial rule that covers equal numbers of positive and negative examples, i.e. p0 = n0, so that p0 / (p0 + n0) = 1/2.

FOIL(R1) = 4 • (log2(4/5) – log2(1/2)) = 4 • (–0.3219 + 1) = 2.71
FOIL(R2) = 30 • (log2(30/40) – log2(1/2)) = 30 • (–0.4150 + 1) = 17.55
FOIL(R3) = 100 • (log2(100/190) – log2(1/2)) = 100 • (–0.9260 + 1) = 7.4001

R2 is the best rule and R1 is the worst.

c)

For R1, k = 2, f+ = 4, e+ = 5 • 100/500 = 1, f– = 1, e– = 5 • 400/500 = 4

LSR(R1) = 2 • (4 • log2(4/1) + 1 • log2(1/4)) = 2 • (4 • 2 + 1 • (–2)) = 2 • 6 = 12
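The three rule-evaluation measures used in this problem can be sketched in Python as follows. The function names are hypothetical, the FOIL baseline uses the same assumption as above (an initial rule with p0 = n0), and the likelihood ratio uses base-2 logarithms to match the hand calculation; only R1 is evaluated for the likelihood ratio, using the class totals of 100 positive and 400 negative examples that appear in the expected-frequency calculation.

```python
from math import log2

def accuracy(p, n):
    """Fraction of covered examples that are positive."""
    return p / (p + n)

def foil_gain(p1, n1, p0=1, n0=1):
    """FOIL's information gain; the default p0 = n0 is an assumed baseline rule."""
    return p1 * (log2(p1 / (p1 + n1)) - log2(p0 / (p0 + n0)))

def likelihood_ratio(observed, expected):
    """2 * sum(f_i * log2(f_i / e_i)) over classes with non-zero observed count."""
    return 2 * sum(f * log2(f / e) for f, e in zip(observed, expected) if f > 0)

rules = {"R1": (4, 1), "R2": (30, 10), "R3": (100, 90)}  # (positives, negatives) covered

for name, (p, n) in rules.items():
    print(f"{name}: accuracy = {accuracy(p, n):.4f}, FOIL gain = {foil_gain(p, n):.4f}")

# Expected frequencies for R1: 5 covered examples spread by the class priors.
covered = sum(rules["R1"])
expected_r1 = (covered * 100 / 500, covered * 400 / 500)
lsr_r1 = likelihood_ratio(rules["R1"], expected_r1)
print(f"LSR(R1) = {lsr_r1:.1f}")
```

Accuracy favours R1 because it ignores coverage, while FOIL's gain multiplies by the number of positives covered and therefore favours R2.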

c)

P(A=1) = 5/10 = 0.5
P(B=1) = 4/10 = 0.4
P(A=1) • P(B=1) = 0.5 • 0.4 = 0.2
P(A=1, B=1) = P(B=1 | A=1) • P(A=1) = 2/5 • 5/10 = 2/10 = 0.2

Since P(A=1, B=1) = P(A=1) • P(B=1), we can say that the random variables A and B are independent.

d)

P(A=1) = 5/10 = 0.5
P(B=0) = 6/10 = 0.6
P(A=1) • P(B=0) = 0.5 • 0.6 = 0.3
P(A=1, B=0) = P(B=0 | A=1) • P(A=1) = 3/5 • 5/10 = 3/10 = 0.3

Since P(A=1, B=0) = P(A=1) • P(B=0), we can say that the random variables A and B are independent.

e)

P(A=1, B=1 | +) = P(A=1, B=1, class=+) / P(+) = (1/10) / (5/10) = 1/5 = 0.2

Using values from part (a),

P(A=1 | +) • P(B=1 | +) = (3/5) • (2/5) = 6/25 = 0.24

Since P(A=1, B=1 | +) ≠ P(A=1 | +) • P(B=1 | +), we can say that the random variables A and B are not conditionally independent given the class '+'.

Chapter 5, Problem 12:

a)

P(B=g, F=e, G=e, S=y)
= P(S=y | B=g, F=e, G=e) • P(B=g, F=e, G=e)
= P(S=y | B=g, F=e) • P(G=e | B=g, F=e) • P(B=g, F=e)    (S does not depend on G)
= (1 – P(S=n | B=g, F=e)) • P(G=e | B=g, F=e) • P(B=g) • P(F=e)    (B and F are independent)
= (1 – P(S=n | B=g, F=e)) • P(G=e | B=g, F=e) • (1 – P(B=b)) • P(F=e)
= (1 – 0.8) • 0.8 • (1 – 0.1) • 0.2
= 0.2 • 0.8 • 0.9 • 0.2
= 0.0288

b)

P(B=b, F=e, G=ne, S=n)
= P(S=n | B=b, F=e, G=ne) • P(B=b, F=e, G=ne)
= P(S=n | B=b, F=e) • P(G=ne | B=b, F=e) • P(B=b, F=e)    (S does not depend on G)
= P(S=n | B=b, F=e) • (1 – P(G=e | B=b, F=e)) • P(B=b) • P(F=e)    (B and F are independent)
= 1 • (1 – 0.9) • 0.1 • 0.2
= 1 • 0.1 • 0.1 • 0.2
= 0.002

c)

P(S=y | B=b) = 1 – P(S=n | B=b) = 1 – P(S=n, B=b) / P(B=b) = 1 – 0.92 = 0.08

where

P(S=n, B=b) / P(B=b)
= (P(S=n, B=b, F=e) + P(S=n, B=b, F=ne)) / P(B=b)
= (P(S=n | B=b, F=e) • P(B=b) • P(F=e) + P(S=n | B=b, F=ne) • P(B=b) • P(F=ne)) / P(B=b)
= P(S=n | B=b, F=e) • P(F=e) + P(S=n | B=b, F=ne) • P(F=ne)
= 1 • 0.2 + 0.9 • (1 – 0.2)
= 0.2 + 0.72
= 0.92
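The factorization used in Problem 12 can be checked with a short script. This is a sketch only: it encodes just the conditional probabilities that appear in the calculations above (other table entries are left out), and the dictionary layout and the function names `p_B`, `p_F` and `joint` are assumptions made for illustration.

```python
# Priors used in the calculations above.
P_B_BAD = 0.1      # P(B=b)
P_F_EMPTY = 0.2    # P(F=e)

# Conditional tables, keyed by (B, F); only entries used above are included.
P_G_EMPTY = {("good", "empty"): 0.8, ("bad", "empty"): 0.9}   # P(G=e | B, F)
P_S_NO = {("good", "empty"): 0.8, ("bad", "empty"): 1.0,
          ("bad", "not_empty"): 0.9}                           # P(S=n | B, F)

def p_B(b):
    return P_B_BAD if b == "bad" else 1 - P_B_BAD

def p_F(f):
    return P_F_EMPTY if f == "empty" else 1 - P_F_EMPTY

def joint(b, f, g, s):
    """P(B, F, G, S) = P(B) * P(F) * P(G | B, F) * P(S | B, F)."""
    pg = P_G_EMPTY[(b, f)] if g == "empty" else 1 - P_G_EMPTY[(b, f)]
    ps = P_S_NO[(b, f)] if s == "no" else 1 - P_S_NO[(b, f)]
    return p_B(b) * p_F(f) * pg * ps

part_a = joint("good", "empty", "empty", "yes")
part_b = joint("bad", "empty", "not_empty", "no")

# Part (c): marginalize F out of P(S=n | B=b, F), then take the complement.
p_no_given_bad = sum(P_S_NO[("bad", f)] * p_F(f) for f in ("empty", "not_empty"))
part_c = 1 - p_no_given_bad

print(f"(a) {part_a:.4f}  (b) {part_b:.4f}  (c) {part_c:.2f}")
```

Part (c) needs no term for G because G is summed out, and P(B=b) cancels between numerator and denominator, which leaves a weighted average over the two values of F.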
