






Artificial Intelligence
Midterm Exam
INSTRUCTIONS
• You have 3 hours.
• The exam is closed book, closed notes except a two-page crib sheet.







First name
Last name
Login
For staff use only:
Each of the trees (G1 through G5) was generated by searching the graph (below, left) with a graph search algorithm. Assume children of a node are visited in alphabetical order. Each tree shows only the nodes that have been expanded. Numbers next to nodes indicate the relevant “score” used by the algorithm’s priority queue. The start state is A, and the goal state is G.
For each tree, indicate:
Please fill in your answers on the next page.
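The graph figure is not reproduced here. As a rough aid, the sketch below runs three priority-queue searches on a small hypothetical graph `GRAPH` with a hypothetical consistent heuristic `H`: uniform cost (score g), greedy (score h) and A* (score g + h). It assumes the goal test happens when a node is expanded and that children are pushed in alphabetical order, as the problem states. It also assumes score ties are broken by insertion order. The function name `graph_search` is illustrative.

```python
import heapq
import itertools
import math

# Hypothetical example graph: each node maps to {child: edge_cost}.
GRAPH = {
    "A": {"B": 1, "C": 4},
    "B": {"C": 2, "G": 6},
    "C": {"G": 2},
    "G": {},
}
# Hypothetical consistent (hence admissible) heuristic for GRAPH.
H = {"A": 4, "B": 3, "C": 2, "G": 0}


def graph_search(graph, start, goal, priority):
    """Best-first graph search; priority(g, node) is the queue score.

    Returns (path, cost, expansion_order).
    """
    counter = itertools.count()  # breaks score ties in insertion order
    frontier = [(priority(0, start), next(counter), start, [start], 0)]
    closed = set()
    expanded = []
    while frontier:
        _, _, node, path, g = heapq.heappop(frontier)
        if node in closed:
            continue  # stale entry for a node that was already expanded
        closed.add(node)
        expanded.append(node)
        if node == goal:  # goal test on expansion, not on generation
            return path, g, expanded
        for child in sorted(graph[node]):  # alphabetical child order
            if child not in closed:
                g_child = g + graph[node][child]
                heapq.heappush(
                    frontier,
                    (priority(g_child, child), next(counter), child, path + [child], g_child),
                )
    return None, math.inf, expanded


ucs = graph_search(GRAPH, "A", "G", lambda g, n: g)
greedy = graph_search(GRAPH, "A", "G", lambda g, n: H[n])
astar = graph_search(GRAPH, "A", "G", lambda g, n: g + H[n])
```

On this toy graph, UCS and A* both return the cost-5 path A, B, C, G. Greedy search ignores path cost, so it expands only A, C, G and returns a cost-6 path.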
In the following problems, please choose all the answers that apply, if any. You may circle more than one answer. You may also circle no answers (none of the above).
(a) [2 pts] Consider two consistent heuristics, $H_1$ and $H_2$, in an A* search seeking to minimize path costs in a graph. Assume ties don't occur in the priority queue. If $H_1(s) \le H_2(s)$ for all $s$, then
(i) A* search using $H_1$ will find a lower cost path than A* search using $H_2$.
(ii) A* search using $H_2$ will find a lower cost path than A* search using $H_1$.
(iii) A* search using $H_1$ will not expand more nodes than A* search using $H_2$.
(iv) A* search using $H_2$ will not expand more nodes than A* search using $H_1$.
(b) [2 pts] Alpha-beta pruning:
(i) May not find the minimax optimal strategy.
(ii) Prunes the same number of subtrees independent of the order in which successor states are expanded.
(iii) Generally requires more run-time than minimax on the same game tree.
(c) [2 pts] Value iteration:
(i) Is a model-free method for finding optimal policies.
(ii) Is sensitive to local optima.
(iii) Is tedious to do by hand.
(iv) Is guaranteed to converge when the discount factor satisfies $0 < \gamma < 1$.
(d) [2 pts] Bayes nets:
(i) Have an associated directed, acyclic graph.
(ii) Encode conditional independence assertions among random variables.
(iii) Generally require less storage than the full joint distribution.
(iv) Make the assumption that all parents of a single child are independent given the child.
(e) [2 pts] True or false? If a heuristic is admissible, it is also consistent.
(f) [2 pts] If we use an $\epsilon$-greedy exploration policy with Q-learning, the estimates $Q_t$ are guaranteed to converge to $Q^*$ only if:
(i) $\epsilon$ goes to zero as $t$ goes to infinity, or
(ii) the learning rate $\alpha$ goes to zero as $t$ goes to infinity, or
(iii) both $\alpha$ and $\epsilon$ go to zero.
(g) [2 pts] True or false? Suppose $X$ and $Y$ are correlated random variables. Then
$$P(X = x, Y = y) = P(X = x)\,P(Y = y \mid X = x)$$
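The identity in (g) is the product rule, which is the definition of conditional probability rearranged. It needs no independence assumption. The check below uses a hypothetical joint distribution over binary $X$ and $Y$ that is correlated: $P(X=1, Y=1) = 2/5$, but $P(X=1)P(Y=1) = 1/4$. It uses exact fractions so that equality tests are meaningful.

```python
from fractions import Fraction as F

# Hypothetical joint distribution over binary X and Y (correlated).
JOINT = {(0, 0): F(2, 5), (0, 1): F(1, 10), (1, 0): F(1, 10), (1, 1): F(2, 5)}


def p_x(x):
    """Marginal P(X = x), summing the joint over y."""
    return sum(p for (xv, _), p in JOINT.items() if xv == x)


def p_y_given_x(y, x):
    """Conditional P(Y = y | X = x) = P(X = x, Y = y) / P(X = x)."""
    return JOINT[(x, y)] / p_x(x)


# Product rule check on every cell of the joint table.
product_rule_holds = all(
    JOINT[(x, y)] == p_x(x) * p_y_given_x(y, x) for (x, y) in JOINT
)
```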
(h) [2 pts] When searching a zero-sum game tree, what are the advantages and drawbacks (if any) of using an evaluation function? How would you utilize it?
(a) [2 pts] Consider the following zero-sum game with 2 players. At each leaf we have labeled the payoffs Player 1 receives. It is Player 1’s turn to move. Assume both players play optimally at every time step (i.e. Player 1 seeks to maximize the payoff, while Player 2 seeks to minimize the payoff). Circle Player 1’s optimal next move on the graph, and state the minimax value of the game. Show your work.
(b) [2 pts] Consider the following game tree. Player 1 moves first, and attempts to maximize the expected payoff. Player 2 moves second, and attempts to minimize the expected payoff. Expand nodes left to right. Cross out nodes pruned by alpha-beta pruning.
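The exam's trees are not reproduced here. This is a minimal sketch of minimax and alpha-beta search on a hypothetical tree written as nested lists: leaves are payoffs to Player 1, the root is a MAX node, layers alternate MAX/MIN, and children are searched left to right. The `visited` list records which leaves alpha-beta actually evaluates. Any leaf missing from it was pruned, while the root value still matches plain minimax.

```python
import math


def minimax(node, maximizing):
    """Plain minimax value of a nested-list game tree."""
    if isinstance(node, (int, float)):
        return node
    values = [minimax(child, not maximizing) for child in node]
    return max(values) if maximizing else min(values)


def alphabeta(node, maximizing, alpha=-math.inf, beta=math.inf, visited=None):
    """Minimax value with alpha-beta pruning; appends evaluated leaves to `visited`."""
    if visited is None:
        visited = []
    if isinstance(node, (int, float)):
        visited.append(node)
        return node
    if maximizing:
        value = -math.inf
        for child in node:  # left to right
            value = max(value, alphabeta(child, False, alpha, beta, visited))
            if value >= beta:
                return value  # the MIN ancestor will never let play reach here
            alpha = max(alpha, value)
    else:
        value = math.inf
        for child in node:
            value = min(value, alphabeta(child, True, alpha, beta, visited))
            if value <= alpha:
                return value  # the MAX ancestor already has something at least as good
            beta = min(beta, value)
    return value


# Hypothetical tree: a MAX root over three MIN nodes.
TREE = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
visited = []
root_value = alphabeta(TREE, True, visited=visited)
```

In this example the leaves 4 and 6 under the second MIN node are never evaluated. Once that node can be worth at most 2, MAX (which already has 3) will not choose it.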
Consider the problem of controlling $n$ pacmen simultaneously. Several pacmen can be in the same square at the same time, and at each time step, each pacman moves by at most one unit vertically or horizontally (in other words, a pacman can stop, and several pacmen can move simultaneously). The goal of the game is to have all the pacmen be at the same square in the minimum number of time steps. In this question, use the following notation: let $M$ denote the number of squares in the maze that are not walls (i.e., the number of squares where pacmen can go); $n$ the number of pacmen; and $p_i = (x_i, y_i)$, $i = 1, \dots, n$, the position of pacman $i$. Assume that the maze is connected.
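Each pacman independently picks one of at most five moves (stay, up, down, left, right). A joint action is one choice per pacman, so a state has at most $5^n$ successors, and fewer next to walls. This is a minimal successor-function sketch. It assumes states are tuples of $(x, y)$ positions and that the maze is given as a hypothetical set `open_squares` of non-wall squares.

```python
import itertools

# A pacman may stay put or move one square in one of the four directions.
MOVES = [(0, 0), (0, 1), (0, -1), (1, 0), (-1, 0)]


def successors(state, open_squares):
    """All joint successor states of a tuple of pacman positions.

    Pacmen may share a square, so each pacman's options are independent.
    """
    per_pacman = []
    for (x, y) in state:
        options = [
            (x + dx, y + dy) for dx, dy in MOVES if (x + dx, y + dy) in open_squares
        ]
        per_pacman.append(options)
    # Every combination of individual moves is one joint action.
    return [tuple(combo) for combo in itertools.product(*per_pacman)]


# Hypothetical wall-free 3x3 maze.
OPEN_3X3 = {(x, y) for x in range(3) for y in range(3)}
```

Two pacmen in the centre of the 3x3 maze have $5 \times 5 = 25$ joint successors. A pacman in a corner has only 3 options, so a corner-plus-centre state has 15.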
(a) [1 pt] What is the state space of this problem?
(b) [1 pt] What is the size of the state space (not a bound, the exact size).
(c) [2 pts] Give the tightest upper bound on the branching factor of this problem.
(d) [2 pts] Bound the number of nodes expanded by uniform cost tree search on this problem, as a function of n and M. Justify your answer.
(e) [4 pts] Which of the following heuristics are admissible? Which one(s), if any, are consistent? Circle the corresponding Roman numerals and briefly justify all your answers.
$$h(p_1, \dots, p_n) = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \mathbf{1}[p_i \neq p_j]$$
(i) Consistent? (ii) Admissible?
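The extracted formula was garbled, so the reading used here is an assumption: the heuristic above counts unordered pairs of pacmen that do not share a square. Under that reading it can be computed as below. The helper name `pairwise_separation` is hypothetical. It is handy for comparing $h$ against the true remaining cost on small hand-built configurations when judging admissibility and consistency.

```python
import itertools


def pairwise_separation(positions):
    """Number of pairs i < j whose positions p_i and p_j differ."""
    return sum(1 for p_i, p_j in itertools.combinations(positions, 2) if p_i != p_j)
```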
An incoming freshman starting in the Fall at Berkeley is trying to plan the classes she will take in order to graduate after 4 years (8 semesters). There is a subset $R$ of required courses out of the complete set of courses $C$ that must all be taken to graduate with a degree in her desired major. Additionally, for each course $c \in C$, there is a set of prerequisites $\mathrm{Prereq}(c) \subset C$ and a set of semesters $\mathrm{Semesters}(c) \subseteq S$ in which it will be offered, where $S = \{1, \dots, 8\}$ is the complete set of 8 semesters. A maximum load of 4 courses can be taken each semester.
(a) [5 pts] Formulate this course scheduling problem as a constraint satisfaction problem. Specify the set of variables, the domain of each variable, and the set of constraints. Your constraints need not be limited to unary and binary constraints. You may use any precise and unambiguous mathematical notation.
Variables:
Constraints:
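The following is one possible formulation, offered as an assumption rather than the official solution:

- A variable $X_c$ for each course $c \in C$, with domain $\mathrm{Semesters}(c) \cup \{\text{not taken}\}$.
- Every required course must be taken.
- A taken course's prerequisites must be taken in strictly earlier semesters.
- At most 4 courses may share a semester.

The checker below tests a complete assignment against those constraints. The function name and the course data are hypothetical.

```python
from collections import Counter

NOT_TAKEN = None  # value for a course that is never scheduled
MAX_LOAD = 4      # maximum courses per semester, from the problem statement


def is_consistent(schedule, required, prereq, semesters):
    """Check a complete assignment {course: semester or NOT_TAKEN}."""
    # Every required course must be taken (missing keys count as not taken).
    if any(schedule.get(r) is NOT_TAKEN for r in required):
        return False
    for course, sem in schedule.items():
        if sem is NOT_TAKEN:
            continue
        # Unary constraint: the course is offered in that semester.
        if sem not in semesters[course]:
            return False
        # Each prerequisite must be taken in a strictly earlier semester.
        for p in prereq.get(course, ()):
            if schedule.get(p) is NOT_TAKEN or schedule[p] >= sem:
                return False
    # Load constraint: at most MAX_LOAD courses in any one semester.
    load = Counter(sem for sem in schedule.values() if sem is not NOT_TAKEN)
    return all(count <= MAX_LOAD for count in load.values())


# Hypothetical data.
REQUIRED = {"61A", "61B", "188"}
PREREQ = {"61B": {"61A"}, "188": {"61B"}}
SEMESTERS = {"61A": {1, 2}, "61B": {2, 3}, "188": {3, 4}}
```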
(b) [4 pts] The student managed to find a schedule of classes that will allow her to graduate in 8 semesters using the CSP formulation, but now she wants to find a schedule that will allow her to graduate in as few semesters as possible. With this additional objective, formulate this problem as an uninformed search problem, using the specified state space, start state, and goal test.
State space: The set of all (possibly partial) assignments x to the CSP.
Start state: The empty assignment.
Goal test: The assignment is a complete, consistent assignment to the CSP.
Successor function: Successors(x) =
Cost function: Cost(x, x′) =
(c) [3 pts] Instead of using uninformed search on the formulation as above, how could you modify backtracking search to efficiently find the least-semester solution?
(d) [4 pts] What is the probability that a dealer is honest given that he deals a 10 to a player holding 11 and is observed doing something suspicious?
You can either arrest dealers or let them continue working. If you arrest a dealer and he turns out to be cheating, you will earn a $4 bonus. However, if you arrest the dealer and he turns out to be innocent, he will sue you, costing you $10. Allowing a cheater to continue working will cost you $2, while allowing an honest dealer to continue working will earn you $1. Assume a linear utility function U(x) = x, so these outcomes have utilities +4, −10, −2, and +1 respectively.
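With these payoffs, the arrest decision depends only on $p = P(\text{cheating} \mid \text{evidence})$:

$$EU(\text{arrest}) = 4p - 10(1-p) = 14p - 10$$
$$EU(\text{continue}) = -2p + (1-p) = 1 - 3p$$

Arresting is therefore better exactly when $p > 11/17$. The sketch below encodes that comparison with exact fractions. The value of $p$ to plug in would come from part (d) as $1 - P(\text{honest} \mid \text{evidence})$. The names are illustrative.

```python
from fractions import Fraction

# Utilities from the text, indexed by the dealer's true type.
ARREST = {"cheating": 4, "honest": -10}
CONTINUE = {"cheating": -2, "honest": 1}


def expected_utility(action, p_cheat):
    """EU of an action when P(cheating) = p_cheat."""
    return p_cheat * action["cheating"] + (1 - p_cheat) * action["honest"]


def best_action(p_cheat):
    """Return (action name, expected utility); ties go to 'continue'."""
    eu_arrest = expected_utility(ARREST, p_cheat)
    eu_continue = expected_utility(CONTINUE, p_cheat)
    if eu_arrest > eu_continue:
        return ("arrest", eu_arrest)
    return ("continue", eu_continue)
```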
(e) [3 pts] You observe a dealer doing something suspicious (C) and also observe that he deals a 10 to a player holding 11. Should you arrest the dealer?
(f) [3 pts] A private investigator approaches you and offers to investigate the dealer from the previous part. If you hire him, he will tell you with 100% certainty whether the dealer is cheating or honest, and you can then make a decision about whether to arrest him or not. How much would you be willing to pay for this information?
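The most you should pay is the value of perfect information (VPI). It is the expected utility of acting after learning the truth minus the best expected utility available now. With perfect information you would arrest a cheater (+4) and keep an honest dealer (+1), so the informed value is $4p + (1 - p)$. A minimal sketch, again parameterized by $p = P(\text{cheating} \mid \text{evidence})$, with illustrative names:

```python
from fractions import Fraction

# Utilities from the text, indexed by the dealer's true type.
ARREST = {"cheating": 4, "honest": -10}
CONTINUE = {"cheating": -2, "honest": 1}


def eu(action, p_cheat):
    """Expected utility of an action when P(cheating) = p_cheat."""
    return p_cheat * action["cheating"] + (1 - p_cheat) * action["honest"]


def value_of_perfect_information(p_cheat):
    # Without the investigator: commit now to the single best action.
    meu_now = max(eu(ARREST, p_cheat), eu(CONTINUE, p_cheat))
    # With the investigator: learn the dealer's type, then pick the best action for it.
    meu_informed = (
        p_cheat * max(ARREST["cheating"], CONTINUE["cheating"])
        + (1 - p_cheat) * max(ARREST["honest"], CONTINUE["honest"])
    )
    return meu_informed - meu_now
```

When you are already certain ($p = 0$ or $p = 1$), the information is worth nothing, as expected.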
Consider a simple MDP with two states, $S_1$ and $S_2$, two actions, A and B, a discount factor $\gamma = 1/2$, a reward function $R$ given by
$$R(s, a, s') = \begin{cases} 1 & \text{if } s' = S_1 \\ -1 & \text{if } s' = S_2 \end{cases}$$
and a transition function specified by the following table.
s    a   s′   T(s, a, s′)
S1   A   S1   1/2
S1   A   S2   1/2
S1   B   S1   2/3
S1   B   S2   1/3
S2   A   S1   1/2
S2   A   S2   1/2
S2   B   S1   1/3
S2   B   S2   2/3
(a) [2 pts] Perform a single iteration of value iteration, filling in the resulting Q-values and state values in the following tables. Use the specified initial value function $V_0$, rather than starting from all-zero state values. Only compute the entries not labeled “skip”.
s    a   Q1(s, a)
S1   A
S1   B
S2   A   skip
S2   B   skip
s    V0(s)   V1(s)
S1   2
S2   3       skip
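One value-iteration step is a Bellman backup:

$$Q_1(s,a) = \sum_{s'} T(s,a,s')\,[R(s,a,s') + \gamma V_0(s')]$$
$$V_1(s) = \max_a Q_1(s,a)$$

The sketch below applies this backup to the MDP above with exact fractions. It also fills in the "skip" entries, which you can simply ignore. The helper names are illustrative.

```python
from fractions import Fraction as F

GAMMA = F(1, 2)
STATES = ["S1", "S2"]
ACTIONS = ["A", "B"]
# T[(s, a)] = {s': probability}, transcribed from the transition table.
T = {
    ("S1", "A"): {"S1": F(1, 2), "S2": F(1, 2)},
    ("S1", "B"): {"S1": F(2, 3), "S2": F(1, 3)},
    ("S2", "A"): {"S1": F(1, 2), "S2": F(1, 2)},
    ("S2", "B"): {"S1": F(1, 3), "S2": F(2, 3)},
}


def reward(s, a, s_next):
    """R(s, a, s') = 1 if s' = S1, else -1."""
    return 1 if s_next == "S1" else -1


def bellman_backup(V):
    """One value-iteration step: returns (Q_{k+1}, V_{k+1}) computed from V_k."""
    Q = {
        (s, a): sum(
            p * (reward(s, a, s2) + GAMMA * V[s2]) for s2, p in T[(s, a)].items()
        )
        for s in STATES
        for a in ACTIONS
    }
    V_next = {s: max(Q[(s, a)] for a in ACTIONS) for s in STATES}
    return Q, V_next


V0 = {"S1": 2, "S2": 3}
Q1, V1 = bellman_backup(V0)
```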
(b) [2 pts] Suppose that Q-learning with a learning rate $\alpha$ of 1/2 is being run, and the following episode is observed.
s1   a1   r1   s2   a2   r2   s3
S1   A    1    S1   A    −1   S2
Using the initial Q-values $Q_0$, fill in the following table to indicate the resulting progression of Q-values.
s    a   Q0(s, a)   Q1(s, a)   Q2(s, a)
S1   A   −1/2
S1   B   0
S2   A   −1
S2   B   1
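Each observed transition $(s, a, r, s')$ triggers one standard Q-learning update:

$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\,[r + \gamma \max_{a'} Q(s',a')]$$

The sketch below replays the episode with exact fractions. It assumes the MDP's discount $\gamma = 1/2$ is the one used in the update. The helper `q_update` is an illustrative name.

```python
from fractions import Fraction as F

ALPHA = F(1, 2)
GAMMA = F(1, 2)  # assumed to be the MDP's discount factor
ACTIONS = ["A", "B"]


def q_update(Q, s, a, r, s_next):
    """Return a copy of Q after one Q-learning update on (s, a, r, s_next)."""
    sample = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q_new = dict(Q)
    Q_new[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * sample
    return Q_new


Q0 = {("S1", "A"): F(-1, 2), ("S1", "B"): F(0), ("S2", "A"): F(-1), ("S2", "B"): F(1)}
episode = [("S1", "A", 1, "S1"), ("S1", "A", -1, "S2")]
history = [Q0]  # history[k] holds Q_k
for s, a, r, s_next in episode:
    history.append(q_update(history[-1], s, a, r, s_next))
```

Only the entry for the (state, action) pair just taken changes at each step. Every other entry carries over unchanged.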
(c) [4 pts] Assume that an $\epsilon$-greedy policy is used, with $\epsilon = 1/2$, taken with respect to the Q-values as of when each action is taken. Given that the episode starts from $S_1$ and consists of 2 transitions, what is the probability of observing the episode from part (b)? State precisely your definition of the $\epsilon$-greedy policy with respect to a Q-value function $Q(s, a)$.
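The answer depends on the exact definition of ε-greedy, so the sketch below states one loudly as an assumption. With probability $1 - \epsilon$ the agent takes $\arg\max_a Q(s,a)$. With probability $\epsilon$ it picks uniformly among all actions, the greedy one included. Q-learning updates the values between the two steps, exactly as in part (b). Rewards here are a deterministic function of $s'$, so they add no probability factor. The function names are hypothetical.

```python
from fractions import Fraction as F

EPSILON = F(1, 2)
ALPHA = F(1, 2)
GAMMA = F(1, 2)
ACTIONS = ["A", "B"]
# Only the transition probabilities this episode needs, from the transition table.
T = {("S1", "A", "S1"): F(1, 2), ("S1", "A", "S2"): F(1, 2)}


def epsilon_greedy_prob(Q, s, a):
    """P(choose a in s): greedy w.p. 1 - eps, else uniform over ALL actions.

    Assumes the argmax is unique.
    """
    greedy = max(ACTIONS, key=lambda b: Q[(s, b)])
    p = EPSILON / len(ACTIONS)
    if a == greedy:
        p += 1 - EPSILON
    return p


def q_update(Q, s, a, r, s_next):
    """Return a copy of Q after one Q-learning update on (s, a, r, s_next)."""
    sample = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    Q_new = dict(Q)
    Q_new[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * sample
    return Q_new


def episode_probability(Q0, episode):
    """Multiply action-choice and transition probabilities along the episode."""
    Q, prob = dict(Q0), F(1)
    for s, a, r, s_next in episode:
        prob *= epsilon_greedy_prob(Q, s, a) * T[(s, a, s_next)]
        Q = q_update(Q, s, a, r, s_next)  # next action uses the updated Q-values
    return prob


Q0 = {("S1", "A"): F(-1, 2), ("S1", "B"): F(0), ("S2", "A"): F(-1), ("S2", "B"): F(1)}
EPISODE = [("S1", "A", 1, "S1"), ("S1", "A", -1, "S2")]
prob = episode_probability(Q0, EPISODE)
```

Under this definition the first A is exploratory (probability 1/4) and the second A is greedy (probability 3/4). Suppose instead the exploratory draw always picks a non-greedy action. Then both action factors become 1/2 here, and the result changes accordingly.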