




Main points of this exam paper are: Overfitting, Information Gain, Algorithm Addressed, Specific Search Strategy, Training Set, Iterative Deepening, Cross-Over Operator





Midterm Exam: 7:15-9:15 pm, October 26, 2011 Room B371 Chemistry
CLOSED BOOK (one sheet of notes and a calculator allowed)
Write your answers on these pages and show your work. If you feel that a question is not fully specified, state any assumptions you need to make in order to solve the problem. You may use the backs of these sheets for scratch work.
Write your name on this and all other pages of this exam. Make sure your exam contains five problems on eight pages.
Name ________________________________________________________________
Student ID ________________________________________________________________
Problem Score Max Score
1 ______ 25
2 ______ 25
3 ______ 13
4 ______ 25
5 ______ 12
TOTAL ______ 100
Assume that you are given the set of labeled training examples below, where each of three features has three possible values: a, b, or c. You choose to learn a decision tree from this data.
      F1   F2   F3   Output
ex1   c    b    b    +
ex2   a    a    c    +
ex3   b    c    c    +
ex4   b    c    a    -
ex5   a    b    c    -
ex6   c    a    b    -
a) What score would the information gain calculation assign to feature F1 when deciding which feature to use as the root node for a decision tree being built? Be sure to show all your work (on this and all other questions).
b) Consider the following new candidate test for a node in the decision tree. What would be its information gain on the above data set when considered as the root node?
Does F2 = F3?
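As a way to check hand calculations for parts a and b, the following is a minimal sketch of the standard entropy-based information gain. The helper names (`entropy`, `info_gain`) and the tuple layout are illustrative choices, not part of the exam; the data is transcribed from the table above.

```python
from collections import Counter
from math import log2

# Training examples transcribed from this problem: (F1, F2, F3, label)
EXAMPLES = [
    ("c", "b", "b", "+"),
    ("a", "a", "c", "+"),
    ("b", "c", "c", "+"),
    ("b", "c", "a", "-"),
    ("a", "b", "c", "-"),
    ("c", "a", "b", "-"),
]

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total) for n in Counter(labels).values())

def info_gain(examples, test):
    """Information gain of splitting `examples` by the value `test(example)` returns."""
    labels = [ex[-1] for ex in examples]
    branches = {}
    for ex in examples:
        branches.setdefault(test(ex), []).append(ex[-1])
    # Weighted average entropy of the branches (the "remainder").
    remainder = sum(len(b) / len(examples) * entropy(b) for b in branches.values())
    return entropy(labels) - remainder

gain_f1 = info_gain(EXAMPLES, lambda ex: ex[0])                # split on F1
gain_f2_eq_f3 = info_gain(EXAMPLES, lambda ex: ex[1] == ex[2])  # test "Does F2 = F3?"
```

Under these definitions, every value of F1 leaves one positive and one negative example, so its gain is 0, while the F2 = F3 test yields a gain of roughly 0.459 bits.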
a) Consider the search space below, where S is the start node and G1 and G2 satisfy the goal test. Arcs are labeled with the cost of traversing them and the estimated cost to a goal is reported inside nodes (so lower scores are better).
For each of the following search strategies, indicate which goal state is reached (if any) and list, in order, all the states popped off of the OPEN list. When all else is equal, nodes should be removed from OPEN in alphabetical order.
Best-First (using f = h)
Goal state reached: _______ States popped off OPEN: ____________________________________
Iterative Deepening
Goal state reached: _______ States popped off OPEN: ____________________________________
Goal state reached: _______ States popped off OPEN: ____________________________________
[Figure: search-space graph over nodes S, A, B, C, D, E, F, J, G1, and G2, with arc costs on the edges and heuristic estimates inside the nodes.]
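The bookkeeping asked for in part a (which states are popped off OPEN, and in what order) can be sketched for best-first search with f = h as follows. The graph and heuristic values in `GRAPH` and `H` are hypothetical and are not the ones from the exam's figure; only the procedure is meant to carry over.

```python
import heapq

# HYPOTHETICAL graph and heuristic values, for illustration only;
# they are NOT the values from the exam's figure.
GRAPH = {"S": ["A", "B"], "A": ["C"], "B": ["G1"], "C": ["G2"], "G1": [], "G2": []}
H = {"S": 5, "A": 2, "B": 4, "C": 3, "G1": 0, "G2": 0}
GOALS = {"G1", "G2"}

def best_first(start):
    """Greedy best-first search (f = h); returns (goal reached, states popped in order)."""
    open_list = [(H[start], start)]  # tuples sort by h first, then by node name
    seen = {start}
    popped = []
    while open_list:
        _, state = heapq.heappop(open_list)
        popped.append(state)
        if state in GOALS:
            return state, popped
        for child in GRAPH[state]:
            if child not in seen:
                seen.add(child)
                heapq.heappush(open_list, (H[child], child))
    return None, popped

goal, order = best_first("S")
```

Storing `(h, name)` tuples in the heap is one simple way to get the alphabetical tie-breaking the question requires, since equal h values fall back to comparing node names.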
b) Using the same search space as in Part a, consider using Simulated Annealing as your search strategy. Assume the current temperature is 6.
If you are at Node D and simulated annealing has randomly selected node S for consideration, what is the probability that moving to this node is accepted?
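The acceptance rule in part b can be sketched as below, assuming the common formulation in which a worse move is accepted with probability exp(-ΔE / T) and lower scores are better, as the exam states. The function name and the example scores are placeholders, not values read from the figure.

```python
import math

def acceptance_probability(h_current, h_candidate, temperature):
    """Probability of accepting a move when LOWER scores are better."""
    delta = h_candidate - h_current  # positive means the candidate is worse
    if delta <= 0:
        return 1.0  # moves that are no worse are always accepted
    return math.exp(-delta / temperature)

# Placeholder scores, NOT read from the exam's figure.
p = acceptance_probability(h_current=4, h_candidate=7, temperature=6)
```

With these placeholder scores the candidate is worse by 3, so the result is exp(-3/6); substituting the actual scores of D and S from the figure gives the answer to the question.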
c) Now imagine that you wish to run a Genetic Algorithm on Part a's search space. You use four bits to represent nodes: A = 0001, B = 0010, C = 0011, D = 0100, E = 0101, F = 0110, G1 = 1001, G2 = 1110, J = 0111, and S = 0000.
i. If nodes B and E are chosen for the cross-over operator, show two possible children that can be produced.
ii. With 4 bits one can represent 16 distinct nodes, but only 10 are in this task’s search problem. What do you think should be done when a bit string is generated that matches none of the nodes?
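For the cross-over step in part c, here is a minimal sketch that assumes single-point cross-over (the exam does not say which variant is intended). The encodings are copied from the question; `single_point_crossover` and `NODE_FOR` are illustrative names.

```python
# Node encodings copied from part c.
CODES = {"A": "0001", "B": "0010", "C": "0011", "D": "0100", "E": "0101",
         "F": "0110", "G1": "1001", "G2": "1110", "J": "0111", "S": "0000"}
NODE_FOR = {bits: name for name, bits in CODES.items()}

def single_point_crossover(parent1, parent2, point):
    """Swap the tails of two bit strings after position `point`."""
    return parent1[:point] + parent2[point:], parent2[:point] + parent1[point:]

# Try every interior cut point for parents B and E.
for point in range(1, 4):
    children = single_point_crossover(CODES["B"], CODES["E"], point)
    print(point, [(c, NODE_FOR.get(c, "no matching node")) for c in children])
```

The `NODE_FOR.get(...)` lookup also shows where the issue raised in part ii would surface: a child whose bit string is not a key in `NODE_FOR` corresponds to none of the ten nodes.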
What is one strength of best-first search compared to uniform-cost search? One weakness? Briefly explain your answers.
a) ONE Strength (of best-first search compared to uniform-cost search)
b) ONE Weakness
c) Briefly, why do we need both a tuning set and a testing set in machine learning?
Assume we are given this joint probability distribution involving random events A and B :
A   B   Prob
F   F   0.2
F   T   0.
T   F   0.1
T   T   0.
d) What is P(A)? ______________________
e) What is P(B | A)? ______________________
We are told P(A) = 0.4 and P(B) = 0.7.
f) If A and B are independent, what is P(A and B)? _____________________________
Now assume we do not know anything about the independence of A and B.
g) What is the largest possible value for P(A and B)? _______________________ Explanation (hint: think about Venn Diagrams):
h) What is the smallest possible value for P(A and B)? _______________________ Explanation (hint: again, think about Venn Diagrams):
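The quantities in parts f through h follow from the product rule for independent events and from the overlap bounds that the Venn-diagram hint points toward. A small sketch using exact fractions (variable names are illustrative):

```python
from fractions import Fraction

p_a = Fraction(4, 10)  # P(A) = 0.4
p_b = Fraction(7, 10)  # P(B) = 0.7

# f) Independence: P(A and B) = P(A) * P(B)
independent = p_a * p_b

# g) Largest overlap: the smaller event lies entirely inside the larger one.
largest = min(p_a, p_b)

# h) Smallest overlap: the two events together cannot cover more than probability 1.
smallest = max(Fraction(0), p_a + p_b - 1)

print(float(independent), float(largest), float(smallest))
```

Using `Fraction` avoids floating-point rounding in the products and sums; the three values come out to 0.28, 0.4, and 0.1.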
Briefly describe each of the following AI concepts and explain each’s significance.
Admissibility
Description:
Significance:
Horizon Effect
Description:
Significance:
Temperature (in the Simulated Annealing Algorithm)
Description:
Significance: