CS188 Exam Questions: Search, Game Theory, MDPs, and CSPs, Schemes and Mind Maps of Artificial Intelligence

Artificial Intelligence. Midterm Exam. INSTRUCTIONS. • You have 3 hours. • The exam is closed book, closed notes except a two-page crib sheet.

Typology: Schemes and Mind Maps

2021/2022

Uploaded on 08/01/2022

fabh_99
CS 188, Spring 2010
Introduction to Artificial Intelligence
Midterm Exam
INSTRUCTIONS
  • You have 3 hours.
  • The exam is closed book, closed notes except a two-page crib sheet.
  • Please use non-programmable calculators only.
  • Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. All short answer sections can be successfully answered in a few sentences AT MOST.
First name
Last name
SID
Login
For staff use only:
Q1. Search Traces /15
Q2. Multiple-choice and short-answer questions /16
Q3. Minimax and Expectimax /12
Q4. n-pacmen search /10
Q5. CSPs: Course scheduling /12
Q6. Cheating at cs188-Blackjack /19
Q7. Markov Decision Processes /16
Total /100

Q1. [15 pts] Search Traces

Each of the trees (G1 through G5) was generated by searching the graph (below, left) with a graph search algorithm. Assume children of a node are visited in alphabetical order. Each tree shows only the nodes that have been expanded. Numbers next to nodes indicate the relevant “score” used by the algorithm’s priority queue. The start state is A, and the goal state is G.

For each tree, indicate:

  1. Whether it was generated with depth first search, breadth first search, uniform cost search, or A* search. Algorithms may appear more than once.
  2. If the algorithm uses a heuristic function, say whether we used
     H1 = {h(A) = 3, h(B) = 6, h(C) = 4, h(D) = 3} or
     H2 = {h(A) = 3, h(B) = 3, h(C) = 0, h(D) = 1}.
  3. For all algorithms, say whether the result was an optimal path (assuming we want to minimize sum of link costs). If the result was not optimal, state why the algorithm found a suboptimal path.

Please fill in your answers on the next page.
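As a reminder of how the priority-queue "score" differs between algorithms, here is a minimal sketch of a best-first graph search in which uniform cost search orders nodes by path cost g and A* orders them by g + h. The graph, edge costs, and heuristic below are hypothetical stand-ins, since the exam's graph is not reproduced here, and the function name is illustrative only.

```python
import heapq

def best_first_graph_search(graph, start, goal, priority):
    """Graph search that always expands the lowest-scoring frontier node.

    graph: dict mapping node -> {child: edge_cost}
    priority(g, node): the score used by the priority queue
    Returns (path, cost, expansion_order).
    """
    frontier = [(priority(0, start), start, 0, [start])]
    closed = set()
    expanded = []
    while frontier:
        _, node, g, path = heapq.heappop(frontier)
        if node in closed:
            continue  # already expanded via a path that was at least as good
        closed.add(node)
        expanded.append(node)
        if node == goal:
            return path, g, expanded
        for child in sorted(graph[node]):  # children in alphabetical order
            if child not in closed:
                new_g = g + graph[node][child]
                heapq.heappush(
                    frontier, (priority(new_g, child), child, new_g, path + [child])
                )
    return None, float("inf"), expanded

# Hypothetical graph and heuristic (NOT the ones from the exam).
graph = {"A": {"B": 1, "C": 4}, "B": {"C": 1, "G": 6}, "C": {"G": 2}, "G": {}}
h = {"A": 3, "B": 2, "C": 2, "G": 0}  # consistent for this toy graph

ucs = best_first_graph_search(graph, "A", "G", lambda g, n: g)
astar = best_first_graph_search(graph, "A", "G", lambda g, n: g + h[n])
print(ucs)
print(astar)
```

Depth first and breadth first search are not covered by this sketch; they would replace the priority queue with a stack or a FIFO queue.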

Q2. [16 pts] Multiple-choice and short-answer questions

In the following problems please choose all the answers that apply, if any. You may circle more than one answer. You may also circle no answers (none of the above)

(a) [2 pts] Consider two consistent heuristics, H1 and H2, in an A* search seeking to minimize path costs in a graph. Assume ties don’t occur in the priority queue. If H1(s) ≤ H2(s) for all s, then

(i) A* search using H1 will find a lower cost path than A* search using H2.
(ii) A* search using H2 will find a lower cost path than A* search using H1.
(iii) A* search using H1 will not expand more nodes than A* search using H2.
(iv) A* search using H2 will not expand more nodes than A* search using H1.

(b) [2 pts] Alpha-beta pruning:

(i) May not find the minimax optimal strategy.
(ii) Prunes the same number of subtrees independent of the order in which successor states are expanded.
(iii) Generally requires more run-time than minimax on the same game tree.

(c) [2 pts] Value iteration:

(i) Is a model-free method for finding optimal policies.
(ii) Is sensitive to local optima.
(iii) Is tedious to do by hand.
(iv) Is guaranteed to converge when the discount factor satisfies 0 < γ < 1.

(d) [2 pts] Bayes nets:

(i) Have an associated directed, acyclic graph.
(ii) Encode conditional independence assertions among random variables.
(iii) Generally require less storage than the full joint distribution.
(iv) Make the assumption that all parents of a single child are independent given the child.

(e) [2 pts] True or false? If a heuristic is admissible, it is also consistent.

(f) [2 pts] If we use an ε-greedy exploration policy with Q-learning, the estimates Q_t are guaranteed to converge to Q* only if:

(i) ε goes to zero as t goes to infinity, or
(ii) the learning rate α goes to zero as t goes to infinity, or
(iii) both α and ε go to zero.

(g) [2 pts] True or false? Suppose X and Y are correlated random variables. Then

P(X = x, Y = y) = P(X = x) P(Y = y | X = x)

(h) [2 pts] When searching a zero-sum game tree, what are the advantages and drawbacks (if any) of using an evaluation function? How would you utilize it?
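For readers reviewing part (b), the following is a minimal sketch comparing plain minimax with alpha-beta pruning on a small tree. The tree (nested lists with numeric leaves, a maximizer at the root) is hypothetical and is not one of the exam's figures; the leaf counter is only there to make the effect of child ordering visible.

```python
def minimax(node, maximizing, counter):
    """Plain minimax over nested lists; numeric leaves are payoffs."""
    if isinstance(node, (int, float)):
        counter[0] += 1  # count every leaf that gets evaluated
        return node
    values = [minimax(child, not maximizing, counter) for child in node]
    return max(values) if maximizing else min(values)

def alphabeta(node, maximizing, counter, alpha=float("-inf"), beta=float("inf")):
    """Minimax with alpha-beta pruning; stops a node once alpha >= beta."""
    if isinstance(node, (int, float)):
        counter[0] += 1
        return node
    if maximizing:
        value = float("-inf")
        for child in node:
            value = max(value, alphabeta(child, False, counter, alpha, beta))
            alpha = max(alpha, value)
            if alpha >= beta:
                break  # remaining children cannot change the result
        return value
    value = float("inf")
    for child in node:
        value = min(value, alphabeta(child, True, counter, alpha, beta))
        beta = min(beta, value)
        if alpha >= beta:
            break
    return value

def run(search, tree):
    """Return (root value, number of leaves evaluated)."""
    counter = [0]
    return search(tree, True, counter), counter[0]

# Hypothetical trees: same leaves, middle subtree in a different order.
tree = [[3, 12, 8], [2, 4, 6], [14, 5, 2]]
reordered = [[3, 12, 8], [6, 4, 2], [14, 5, 2]]

print(run(minimax, tree))         # value and leaves visited by minimax
print(run(alphabeta, tree))       # same value, fewer leaves
print(run(alphabeta, reordered))  # same value, no leaves saved with this order
```

On these toy trees both searches return the same root value, while the number of leaves alpha-beta evaluates changes with the order of the children.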

Q3. [12 pts] Minimax and Expectimax

(a) [2 pts] Consider the following zero-sum game with 2 players. At each leaf we have labeled the payoffs Player 1 receives. It is Player 1’s turn to move. Assume both players play optimally at every time step (i.e. Player 1 seeks to maximize the payoff, while Player 2 seeks to minimize the payoff). Circle Player 1’s optimal next move on the graph, and state the minimax value of the game. Show your work.

[Figure: game tree with levels, top to bottom, Player 1, Player 2, Player 1.]

(b) [2 pts] Consider the following game tree. Player 1 moves first, and attempts to maximize the expected payoff. Player 2 moves second, and attempts to minimize the expected payoff. Expand nodes left to right. Cross out nodes pruned by alpha-beta pruning.

[Figure: game tree with levels, top to bottom, Player 1, Player 2, Player 1.]

Q4. [10 pts] n-pacmen search

Consider the problem of controlling n pacmen simultaneously. Several pacmen can be in the same square at the same time, and at each time step, each pacman moves by at most one unit vertically or horizontally (in other words, a pacman can stop, and several pacmen can move simultaneously). The goal of the game is to have all the pacmen be at the same square in the minimum number of time steps. In this question, use the following notation: let M denote the number of squares in the maze that are not walls (i.e. the number of squares where pacmen can go); n the number of pacmen; and p_i = (x_i, y_i), i = 1, ..., n, the position of pacman i. Assume that the maze is connected.

(a) [1 pt] What is the state space of this problem?

(b) [1 pt] What is the size of the state space (the exact size, not a bound)?

(c) [2 pts] Give the tightest upper bound on the branching factor of this problem.

(d) [2 pts] Bound the number of nodes expanded by uniform cost tree search on this problem, as a function of n and M. Justify your answer.

(e) [4 pts] Which of the following heuristics are admissible? Which one(s), if any, are consistent? Circle the corresponding Roman numerals and briefly justify all your answers.

  1. The number of (ordered) pairs (i, j) of pacmen with different coordinates:
     $h_1 = \sum_{i=1}^{n} \sum_{j=i+1}^{n} \mathbf{1}(p_i \neq p_j)$
     (i) Consistent? (ii) Admissible?

  2. $h_2 = \frac{1}{2} \max\left( \max_{i,j} |x_i - x_j|,\ \max_{i,j} |y_i - y_j| \right)$
     (i) Consistent? (ii) Admissible?
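To make the notation in this question concrete, here is a small sketch that enumerates joint states and joint actions on a toy maze and evaluates the two heuristics on one configuration. It assumes a state is the ordered tuple of the n pacman positions, that h1 sums over index pairs with j > i as in the summation, and that h2 carries a factor of 1/2. The 2x2 maze and the sample positions are hypothetical, and the sketch makes no claim about admissibility or consistency.

```python
from itertools import combinations, product

def h1(positions):
    """Count index pairs i < j whose pacmen stand on different squares."""
    return sum(1 for p, q in combinations(positions, 2) if p != q)

def h2(positions):
    """Half of the larger coordinate spread (in x or in y) over all pacmen."""
    xs = [x for x, _ in positions]
    ys = [y for _, y in positions]
    # max over i, j of |x_i - x_j| equals max(xs) - min(xs); same for y.
    return 0.5 * max(max(xs) - min(xs), max(ys) - min(ys))

# Hypothetical wall-free 2x2 maze, so M = 4, with n = 2 pacmen.
open_squares = [(0, 0), (0, 1), (1, 0), (1, 1)]
n = 2

# One state per ordered tuple of positions (pacmen may share a square).
states = list(product(open_squares, repeat=n))

# Each pacman independently stops or moves one unit in one of 4 directions.
moves = [(0, 0), (1, 0), (-1, 0), (0, 1), (0, -1)]
joint_actions = list(product(moves, repeat=n))

sample = [(0, 0), (0, 0), (1, 2)]  # hypothetical positions of three pacmen
print(len(states), len(joint_actions), h1(sample), h2(sample))
```

In a real maze some of the joint actions would be blocked by walls, so the count of joint actions here is only an upper limit for this toy setup.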

Q5. [12 pts] CSPs: Course scheduling

An incoming freshman starting in the Fall at Berkeley is trying to plan the classes she will take in order to graduate after 4 years (8 semesters). There is a subset R of required courses out of the complete set of courses C that must all be taken to graduate with a degree in her desired major. Additionally, for each course c ∈ C, there is a set of prerequisites Prereq(c) ⊂ C and a set of semesters Semesters(c) ⊆ S in which it will be offered, where S = {1, ..., 8} is the complete set of 8 semesters. A maximum load of 4 courses can be taken each semester.

(a) [5 pts] Formulate this course scheduling problem as a constraint satisfaction problem. Specify the set of variables, the domain of each variable, and the set of constraints. Your constraints need not be limited to unary and binary constraints. You may use any precise and unambiguous mathematical notation.

Variables:

Constraints:

(b) [4 pts] The student managed to find a schedule of classes that will allow her to graduate in 8 semesters using the CSP formulation, but now she wants to find a schedule that will allow her to graduate in as few semesters as possible. With this additional objective, formulate this problem as an uninformed search problem, using the specified state space, start state, and goal test.

State space: The set of all (possibly partial) assignments x to the CSP.

Start state: The empty assignment.

Goal test: The assignment is a complete, consistent assignment to the CSP.

Successor function: Successors(x) =

Cost function: Cost(x, x′) =

(c) [3 pts] Instead of using uninformed search on the formulation as above, how could you modify backtracking search to efficiently find the least-semester solution?
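As an illustration of what a consistency check for this problem might look like, here is a sketch under one possible formulation, which is an assumption and not the exam's official answer: one variable per course, whose value is the semester in which it is taken or None if it is never taken. The course names and offering data are hypothetical.

```python
from collections import Counter

def is_consistent(assignment, required, prereq, offered, max_load=4):
    """Check a complete assignment {course: semester or None}.

    Assumed constraints: every required course is taken, each taken course
    is offered in its semester, each prerequisite is taken strictly earlier,
    and no semester holds more than max_load courses.
    """
    for course in required:
        if assignment.get(course) is None:
            return False
    for course, semester in assignment.items():
        if semester is None:
            continue
        if semester not in offered[course]:
            return False
        for p in prereq.get(course, ()):
            p_semester = assignment.get(p)
            if p_semester is None or p_semester >= semester:
                return False
    load = Counter(s for s in assignment.values() if s is not None)
    return all(count <= max_load for count in load.values())

def semesters_used(assignment):
    """Last semester in which any course is taken (the quantity to minimize)."""
    return max((s for s in assignment.values() if s is not None), default=0)

# Hypothetical data: c3 is required, c2 is needed for c3, c1 for c2.
required = {"c3"}
prereq = {"c3": {"c2"}, "c2": {"c1"}}
offered = {"c1": {1, 2}, "c2": {2, 3}, "c3": {3, 4}}

good = {"c1": 1, "c2": 2, "c3": 3}
same_semester = {"c1": 2, "c2": 2, "c3": 3}   # c1 not strictly before c2
missing_prereq = {"c1": None, "c2": 2, "c3": 3}
print(is_consistent(good, required, prereq, offered), semesters_used(good))
```

A search for the least-semester schedule could compare candidate solutions by `semesters_used`, which is a hypothetical helper introduced only for this sketch.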

Q6. [19 pts] Cheating at cs188-Blackjack

(d) [4 pts] What is the probability that a dealer is honest given that he deals a 10 to a player holding 11 and is observed doing something suspicious?

You can either arrest dealers or let them continue working. If you arrest a dealer and he turns out to be cheating, you will earn a $4 bonus. However, if you arrest the dealer and he turns out to be innocent, he will sue you, costing you $10 (a payoff of -$10). Allowing a cheater to continue working will cost you $2 (a payoff of -$2), while allowing an honest dealer to continue working will earn you $1. Assume a linear utility function U(x) = x.

(e) [3 pts] You observe a dealer doing something suspicious (C) and also observe that he deals a 10 to a player holding 11. Should you arrest the dealer?

(f ) [3 pts] A private investigator approaches you and offers to investigate the dealer from the previous part. If you hire him, he will tell you with 100% certainty whether the dealer is cheating or honest, and you can then make a decision about whether to arrest him or not. How much would you be willing to pay for this information?
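The decision in parts (e) and (f) can be sketched as an expected-utility comparison followed by a value-of-perfect-information calculation. The payoffs below come from the text above; the probability of cheating passed in is a hypothetical placeholder, because the posterior from part (d) depends on quantities that are not part of this excerpt. Function and action names are illustrative.

```python
from fractions import Fraction

# Payoffs from the problem statement, with linear utility U(x) = x.
UTILITY = {
    ("arrest", "cheat"): 4,
    ("arrest", "honest"): -10,
    ("continue", "cheat"): -2,
    ("continue", "honest"): 1,
}
ACTIONS = ("arrest", "continue")

def expected_utility(action, p_cheat):
    """Expected payoff of an action when the dealer cheats with prob p_cheat."""
    return (p_cheat * UTILITY[(action, "cheat")]
            + (1 - p_cheat) * UTILITY[(action, "honest")])

def best_action(p_cheat):
    return max(ACTIONS, key=lambda a: expected_utility(a, p_cheat))

def value_of_perfect_information(p_cheat):
    """Gain from learning the truth before choosing an action."""
    meu_now = max(expected_utility(a, p_cheat) for a in ACTIONS)
    meu_informed = (
        p_cheat * max(UTILITY[(a, "cheat")] for a in ACTIONS)
        + (1 - p_cheat) * max(UTILITY[(a, "honest")] for a in ACTIONS)
    )
    return meu_informed - meu_now

p = Fraction(1, 2)  # hypothetical posterior, NOT the answer to part (d)
print(best_action(p), value_of_perfect_information(p))
```

Substituting the actual posterior from part (d) for `p` would give the quantities the question asks about.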

Q7. [16 pts] Markov Decision Processes

Consider a simple MDP with two states, S1 and S2, two actions, A and B, a discount factor γ of 1/2, reward function R given by

$$R(s, a, s') = \begin{cases} 1 & \text{if } s' = S_1 \\ -1 & \text{if } s' = S_2 \end{cases}$$

and a transition function specified by the following table.

s    a    s'   T(s, a, s')
S1   A    S1   1/2
S1   A    S2   1/2
S1   B    S1   2/3
S1   B    S2   1/3
S2   A    S1   1/2
S2   A    S2   1/2
S2   B    S1   1/3
S2   B    S2   2/3

(a) [2 pts] Perform a single iteration of value iteration, filling in the resultant Q-values and state values in the following tables. Use the specified initial value function V0, rather than starting from all zero state values. Only compute the entries not labeled “skip”.

s    a    Q1(s, a)
S1   A
S1   B
S2   A    skip
S2   B    skip

s    V0(s)   V1(s)
S1   2
S2   3       skip
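A single value-iteration backup for this MDP can be sketched as follows, using exact fractions. It assumes the standard backup Q1(s, a) = Σ_s' T(s, a, s') [R(s, a, s') + γ V0(s')] and V1(s) = max_a Q1(s, a); the variable and function names are illustrative.

```python
from fractions import Fraction as F

GAMMA = F(1, 2)

# Transition probabilities T[(s, a)][s'] from the table above.
T = {
    ("S1", "A"): {"S1": F(1, 2), "S2": F(1, 2)},
    ("S1", "B"): {"S1": F(2, 3), "S2": F(1, 3)},
    ("S2", "A"): {"S1": F(1, 2), "S2": F(1, 2)},
    ("S2", "B"): {"S1": F(1, 3), "S2": F(2, 3)},
}

def reward(s_next):
    """R(s, a, s') depends only on the landing state."""
    return 1 if s_next == "S1" else -1

V0 = {"S1": F(2), "S2": F(3)}

def q_backup(s, a, V):
    """One-step lookahead: expected reward plus discounted next-state value."""
    return sum(p * (reward(s2) + GAMMA * V[s2]) for s2, p in T[(s, a)].items())

Q1 = {a: q_backup("S1", a, V0) for a in ("A", "B")}
V1_S1 = max(Q1.values())
print(Q1, V1_S1)
```

Only the S1 entries are computed, matching the entries of the tables that are not marked "skip".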

(b) [2 pts] Suppose that Q-learning with a learning rate α of 1/2 is being run, and the following episode is observed.

s1   a1   r1   s2   a2   r2   s3
S1   A    1    S1   A    -1   S2

Using the initial Q-values Q0, fill in the following table to indicate the resultant progression of Q-values.

s    a    Q0(s, a)   Q1(s, a)   Q2(s, a)
S1   A    -1/2
S1   B    0
S2   A    -1
S2   B    1

(c) [4 pts] Assuming that an ε-greedy policy (with respect to the Q-values as of when the action is taken) is used, where ε = 1/2, and given that the episode starts from S1 and consists of 2 transitions, what is the probability of observing the episode from part b? State precisely your definition of the ε-greedy policy with respect to a Q-value function Q(s, a).
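Parts (b) and (c) can be traced with a short sketch. It assumes the standard Q-learning update Q(s, a) ← (1 − α) Q(s, a) + α [r + γ max_a' Q(s', a')], and it assumes one particular ε-greedy definition: with probability ε pick uniformly at random among all actions (including the greedy one), otherwise pick the greedy action. Other definitions are possible and would change the probability, which is why the question asks for the definition to be stated.

```python
from fractions import Fraction as F

ALPHA = GAMMA = EPSILON = F(1, 2)
ACTIONS = ("A", "B")

# Only the (S1, A) row of the transition table is needed for this episode.
T = {("S1", "A"): {"S1": F(1, 2), "S2": F(1, 2)}}

Q = {("S1", "A"): F(-1, 2), ("S1", "B"): F(0),
     ("S2", "A"): F(-1), ("S2", "B"): F(1)}

def q_update(Q, s, a, r, s_next):
    """Return a new Q table after one Q-learning update."""
    target = r + GAMMA * max(Q[(s_next, b)] for b in ACTIONS)
    updated = dict(Q)
    updated[(s, a)] = (1 - ALPHA) * Q[(s, a)] + ALPHA * target
    return updated

def epsilon_greedy_prob(Q, s, a):
    """P(a | s) under the assumed epsilon-greedy definition."""
    greedy = max(ACTIONS, key=lambda b: Q[(s, b)])
    prob = EPSILON / len(ACTIONS)   # chosen by the uniform random branch
    if a == greedy:
        prob += 1 - EPSILON         # chosen by the greedy branch
    return prob

episode = [("S1", "A", 1, "S1"), ("S1", "A", -1, "S2")]
history = [Q]
episode_prob = F(1)
for s, a, r, s_next in episode:
    # Action probability uses the Q-values as of when the action is taken.
    episode_prob *= epsilon_greedy_prob(Q, s, a) * T[(s, a)][s_next]
    Q = q_update(Q, s, a, r, s_next)
    history.append(Q)

print([q[("S1", "A")] for q in history], episode_prob)
```

Only Q(S1, A) changes during this episode, since both observed transitions take action A from S1.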