Artificial Intelligence: Search, Constraint Satisfaction, and Game Playing, Schemes and Mind Maps of Artificial Intelligence

Artificial Intelligence. Final Exam. INSTRUCTIONS. • You have 3 hours. • The exam is closed book, closed notes except two pages of crib sheets.

Typology: Schemes and Mind Maps

2022/2023

Uploaded on 05/11/2023

agrima
CS 188, Fall 2011
Introduction to Artificial Intelligence
Final Exam

INSTRUCTIONS

  • You have 3 hours.
  • The exam is closed book, closed notes except two pages of crib sheets.
  • Please use non-programmable calculators only.
  • Mark your answers ON THE EXAM ITSELF. If you are not sure of your answer you may wish to provide a brief explanation. All short answer sections can be successfully answered in a few sentences at most.

Last Name

First Name

SID

Login

All the work on this exam is my own. (please sign)

For staff use only

Q. 1 | Q. 2 | Q. 3 | Q. 4 | Q. 5 | Q. 6 | Q. 7 | Q. 8 | Total
/24 | /12 | /12 | /8 | /8 | /14 | /10 | /12 | /100


Now consider the same search problem, but with a heuristic h′ which is 0 at all states that lie along an optimal path to a goal and h* elsewhere.

(e) (2 pt) Circle all of the following that are true (if any).

(i) h′ is admissible.

(ii) h′ is consistent.

(iii) A* tree search (no closed list) with h′ will be optimal.

(iv) A* graph search (with closed list) with h′ will be optimal.

h′ is not consistent, as needed for optimality of graph search. Consistency is violated by any edge connecting a state outside of the optimal path to the optimal path, since h′ drops faster than the reduction in the true cost.

(f) (2 pt) You are running minimax in a large game tree. However, you know that the minimax value is between x − ε and x + ε, where x is a real number and ε is a small real number. You know nothing about the minimax values at the other nodes. Describe briefly but precisely how to modify the alpha-beta pruning algorithm to take advantage of this information.

Initialize the alpha and beta values at the root to α = x − ε and β = x + ε, then run alpha-beta as normal. This prunes every part of the game tree that falls outside the known minimax bounds.
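The effect of the narrowed root window can be illustrated with a minimal sketch. The nested-list tree, the helper `run`, and the choice x = 3, ε = 0.5 are all made up for illustration and are not part of the exam; the sketch only counts how many leaves get evaluated under each window.

```python
import math

def alphabeta(node, alpha, beta, maximizing, counter):
    """Plain alpha-beta on a nested-list tree; leaves are numbers."""
    if not isinstance(node, list):
        counter[0] += 1              # count every evaluated leaf
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, counter))
            if value >= beta:        # beta cutoff
                break
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, counter))
        if value <= alpha:           # alpha cutoff
            break
        beta = min(beta, value)
    return value

def run(tree, alpha, beta):
    """Return (root value, number of leaves evaluated) for a max root."""
    counter = [0]
    return alphabeta(tree, alpha, beta, True, counter), counter[0]

# Hypothetical tree: max root, min children, max grandchildren. Minimax value is 3.
tree = [[[9, 7, 8], [3, 1, 2]], [[2, 1, 0], [6, 5, 4]]]
x, eps = 3, 0.5                      # assumed known bounds on the root value

full_value, full_leaves = run(tree, -math.inf, math.inf)
narrow_value, narrow_leaves = run(tree, x - eps, x + eps)
print(full_value, full_leaves, narrow_value, narrow_leaves)
```

Both calls return the same root value, but the narrowed window cuts off the [9, 7, 8] subtree after its first leaf because 9 already exceeds β = x + ε.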

(g) (2 pt) An agent prefers chocolate ice cream over strawberry and prefers strawberry over vanilla. Moreover, the agent is indifferent between deterministically getting strawberry and a lottery with a 90% chance of chocolate and a 10% chance of vanilla. Which of the following utilities can capture the agent’s preferences?

(i) Chocolate 2, Strawberry 1, Vanilla 0

(ii) Chocolate 10, Strawberry 9, Vanilla 0

(iii) Chocolate 21, Strawberry 19, Vanilla 1

(iv) No utility function can capture these preferences.

One can verify that (ii) captures the agent’s preferences. Next, observe that (iii) is an affine transformation of the utilities of (ii) and so does not change the preference ordering.
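The check can be spelled out numerically. The sketch below is only an illustration; the function name `captures_preferences` and the tolerance are assumptions. It tests the strict ordering and the indifference between strawberry and the 90/10 lottery for each candidate.

```python
def captures_preferences(utilities, p_chocolate=0.9, tol=1e-9):
    """True if chocolate > strawberry > vanilla and strawberry equals the lottery's expected utility."""
    chocolate, strawberry, vanilla = utilities
    lottery = p_chocolate * chocolate + (1 - p_chocolate) * vanilla
    return chocolate > strawberry > vanilla and abs(lottery - strawberry) < tol

candidates = {
    "i": (2, 1, 0),
    "ii": (10, 9, 0),
    "iii": (21, 19, 1),
}
consistent = [name for name, u in candidates.items() if captures_preferences(u)]
print(consistent)
```

Option (iii) passes because it equals 2 · (ii) + 1 componentwise, and a positive affine transformation preserves both the ordering and the indifference.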


(h) (2 pt) Fill in the joint probability table so that the binary variables X and Y are independent.

X Y P (X, Y )

Independence requires that P(X, Y) = P(X)P(Y).

(i) (1 pt) Suppose you are sampling from a conditional distribution P(B | A = +a) in a Bayes' net. Give the fraction of samples that will be accepted in rejection sampling.

P(A = +a), since samples that do not match the evidence are rejected.

(j) (2 pt) One could simplify the particle filtering algorithm by getting rid of the resampling phase and instead keeping weighted particles at all times, with the weight of a particle being the product of all observation probabilities $P(e_i \mid X_i)$ up to and including the current timestep. Circle all of the following that are true (if any).

(i) This will always work as well as standard particle filtering.

(ii) This will generally work less well than standard particle filtering because all the particles will cluster in the most likely part of the state space.

(iii) This will generally work less well than standard particle filtering because most particles will end up in low-likelihood parts of the state space.

(iv) This will generally work less well than standard particle filtering because the number of particles you have will decrease over time.

Without resampling, particle filters tend to give degenerate estimates of the distribution, as only a few particles in high-likelihood states will have a significant weight. Resampling helps by re-normalizing the distribution.

(k) (2 pt) Circle all settings in which particle filtering is preferable to exact HMM inference (if any).

(i) The state space is very small.

(ii) The state space is very large.

(iii) Speed is more important than accuracy.

(iv) Accuracy is more important than speed.

Exact HMM inference scales with the size of the state space. For particle filtering, the number of particles determines the time complexity and the quality of the approximation. Large (even continuous) state spaces can be approximated by a much smaller number of particles.


(ii) The test accuracy could be higher with the doubled features.

(iii) The test accuracy will be the same with either feature set.

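The perceptron classification rule is $h(x) = \operatorname{sign}\left(\sum_i w_i f_i(x)\right)$, so doubling the features gives $h(x) = \operatorname{sign}\left(\sum_i 2 w_i f_i(x)\right) = \operatorname{sign}\left(2 \sum_i w_i f_i(x)\right) = \operatorname{sign}\left(\sum_i w_i f_i(x)\right)$, so it is unchanged by feature duplication.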
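A quick numerical check of this identity, as a sketch only: the weights and feature vectors below are made-up values, and duplication is modeled by repeating each feature together with its weight, as in the derivation.

```python
def predict(weights, features):
    """Perceptron decision rule: sign of the weighted feature sum."""
    score = sum(w * f for w, f in zip(weights, features))
    return 1 if score >= 0 else -1

weights = [2, -3, 1]                           # hypothetical learned weights
examples = [[1, 0, 4], [0, 2, 1], [3, 3, -1], [-1, -1, -1]]

original = [predict(weights, f) for f in examples]
# Duplicate every feature and reuse the same weight for each copy.
doubled = [predict(weights + weights, f + f) for f in examples]
print(original, doubled)
```

The doubled score is exactly twice the original one, so the sign, and with it every prediction, stays the same.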

  2. (12 points) Consistency and Harmony

(a) (2 pt) Consider a simple two-variable CSP, with variables A and B, both having domains { 1 , 2 , 3 , 4 }. There is only one constraint, which is that A and B sum to 6. In the table below, circle the values that remain in the domains after enforcing arc consistency.

Variable Values remaining in domain

A { 1 , 2 , 3 , 4 }

B { 1 , 2 , 3 , 4 }
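Enforcing arc consistency on this small CSP can be sketched in a few lines. This is a simplified AC-3 written for illustration, with hypothetical names (`revise`, `enforce_arc_consistency`) and each binary constraint stored once per arc direction; it is not the course's reference implementation.

```python
from collections import deque

def revise(dom_x, dom_y, allowed):
    """Keep the values of X that have at least one supporting value in Y."""
    return {x for x in dom_x if any(allowed(x, y) for y in dom_y)}

def enforce_arc_consistency(domains, constraints):
    """constraints maps a directed arc (X, Y) to a predicate on (x, y)."""
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        revised = revise(domains[x], domains[y], constraints[(x, y)])
        if revised != domains[x]:
            domains[x] = revised
            # X shrank, so every other arc pointing at X must be rechecked.
            queue.extend(arc for arc in constraints if arc[1] == x and arc[0] != y)
    return domains

domains = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 4}}
sum_to_six = lambda a, b: a + b == 6
constraints = {("A", "B"): sum_to_six, ("B", "A"): sum_to_six}

result = enforce_arc_consistency(domains, constraints)
print(result)
```

The value 1 is removed from both domains because its only partner, 5, is not available.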

Now consider a general CSP C, with variables $X_i$ having domains $D_i$. Assume that all constraints are between pairs of variables. C is not necessarily arc consistent, so we enforce arc consistency in the standard way, i.e. by running AC-3 (the arc consistency algorithm from class). For each variable $X_i$, let $D^{AC}_i$ be its resulting arc consistent domain.

(b) (2 pt) Circle all of the following statements that are guaranteed to be true about the domains $D^{AC}_i$ compared to $D_i$.

(i) For all i, $D^{AC}_i \subseteq D_i$.

(ii) If C is initially arc consistent, then for all i, $D^{AC}_i = D_i$.

(iii) If C is not initially arc consistent, then for at least one i, $D^{AC}_i \neq D_i$.

(c) (2 pt) Let n be the number of solutions to C and let $n_{AC}$ be the number of solutions remaining after arc consistency has been enforced. Circle all of the following statements that could be true, if any.

(i) $n_{AC} = 0$ but no $D^{AC}_i$ are empty.

(ii) $n_{AC} \geq 1$ but some $D^{AC}_i$ is empty.

(iii) $n_{AC} \geq 1$ but a backtracking solver could still backtrack.

(iv) $n_{AC} > 1$

(v) $n_{AC} < n$

(vi) $n_{AC} = n$

(vii) $n_{AC} > n$

Arc consistency does not imply that variables are jointly consistent, as path consistency or other k-consistencies would, so (i) and (iii) are possible. Multiple solutions could exist no matter what consistency is enforced, so (iv) could be true. Enforcing arc consistency only removes values that cannot take part in any solution, so the CSP keeps the same number of solutions and (vi) could be true.

(v) $S_i \subseteq D^{AH}_i$

(vi) $D^{AH}_i \subseteq S_i$

Refer to reasoning in previous parts.


  3. (12 points) Pac-Infiltration

A special operations Pacman must reach the exit (located on the bottom right square) in a ghost-filled maze, as shown to the right. Every time step, each agent (i.e., Pacman and each ghost) can move to any adjacent non-wall square or stay in place. Pacman moves first and the ghosts move collectively and simultaneously in response (so, consider the ghosts to be one agent). Pacman’s goal is to exit as soon as possible. If Pacman and a ghost ever occupy the same square, either through Pacman or ghost motion, the game ends, and Pacman receives utility −1. If Pacman moves into the exit square, he receives utility +1 and the game ends. The ghosts are adversarial and seek to minimize Pacman’s utility.

For parts (a) and (b), consider the 2x3 board with one ghost shown above. In this example, it is Pacman’s move.

(a) (2 pt) What is the minimax value of this state?

+1, because Pacman is guaranteed a victory with his first-move advantage.

(b) (2 pt) If we perform a depth-limited minimax search from this state, searching to depth d = 1, 2 , 3 , 4 and using zero as the evaluation function for non-terminal states, what minimax values will be computed? Reminder: each layer of depth includes one move from each agent.

d | Minimax value
1 | 0
2 | 1
3 | 1
4 | 1

For parts (c) and (d), consider a grid of size N × M containing K ghosts. Assume we perform a depth-limited minimax search, searching to depth d.

(c) (2 pt) How many states will be expanded (asymptotically) in terms of d using standard minimax? Your answer should use all the relevant variables needed and should not be specific to any particular map or configuration.

$(5^{K+1})^d$. The branching factor for Pacman is 5 since he can move to any adjacent square (4 choices) or stay put (+1 choice). The branching factor for the ghosts is $5^K$ since each of the K ghosts has the same 5 movement options. One layer of depth contains one Pacman move and one collective ghost move, so its combined branching factor is $5 \cdot 5^K = 5^{K+1}$. The number of nodes expanded is this branching factor raised to the depth d.

(d) (1 pt) For small boards, why is standard recursive minimax a poor choice for this problem? Your answer should be no more than one sentence!

For small boards the depth of the goal (exit with +1) will be small, and since reaching the goal ends the game, the search need not continue and the further expansion done by standard minimax is wasted.


Consider an MDP modeling a hurdle race track, shown below. There is a single hurdle in square D. The terminal state is G. The agent can run left or right. If the agent is in square C, it cannot run right. Instead, it can jump, which either results in a fall to the hurdle square D, or a successful hurdle jump to square E. Rewards are shown below. Assume a discount of γ = 1.

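Rewards:

[Figure: the track squares A B C D E F G in a row. Arrows pointing right carry positive reward labels (+1 ...) and arrows pointing left carry the label −2.]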

Actions:

  • right: Deterministically move to the right.
  • left: Deterministically move to the left.
  • jump: Stochastically jump to the right. This action is available for square C only.
    T(C, jump, E) = 0.5 (jump succeeds)
    T(C, jump, D) = 0.5 (jump fails)

Note: The agent cannot use right from square C.

(a) (2 pt) For the policy π of always moving forward (i.e., using actions right or jump), compute $V^\pi(C)$.

$V^\pi(C) = \frac{13 + 15}{2} = 14$. Without discounting (γ = 1), the value is the sum of rewards. The reward to go from C is 1 + 1 + 1 + 10 = 13 if the jump fails or 4 + 1 + 10 = 15 if the jump succeeds. The jump succeeds or fails with equal probability, so the expected value is the average.

(b) (3 pt) Perform two iterations of value iteration and compute the following. Iteration 0 corresponds to the initialization of all values to 0.

$V_2(B)$ = 3.5

$Q_2(B, \text{right})$ = 3.5

$Q_2(B, \text{left})$ = −1

      | A | B   | C   | D | E  | F  | G
$V_1$ | 1 | 1   | 2.5 | 1 | 1  | 10 | 0
$V_2$ | 2 | 3.5 | 3.5 | 2 | 11 | 10 | 0

Remember that the $V_{k+1}$ and $Q_{k+1}$ updates use the values $V_k$ and $Q_k$ for the successor states. $V_2(B)$ is the maximum over the Q-values for B. Note: two iterations of value iteration compute the values for a two-step horizon (h = 2). Thus $V_2(s)$ for any state s is equal to the maximum expected return over two time steps.
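The two sweeps can be reproduced with a short sketch. The reward numbers in the transition table below are an assumption reconstructed from the part (a) solution (+1 for each step to the right and for a failed jump, +4 for a successful jump, +10 for entering G, −2 for each step left), and treating `left` as unavailable in A is also an assumption.

```python
GAMMA = 1.0
STATES = "ABCDEFG"

# (state, action) -> list of (probability, next_state, reward); rewards are assumed, see above.
T = {
    ("A", "right"): [(1.0, "B", 1)],
    ("B", "right"): [(1.0, "C", 1)],
    ("B", "left"):  [(1.0, "A", -2)],
    ("C", "jump"):  [(0.5, "D", 1), (0.5, "E", 4)],
    ("C", "left"):  [(1.0, "B", -2)],
    ("D", "right"): [(1.0, "E", 1)],
    ("D", "left"):  [(1.0, "C", -2)],
    ("E", "right"): [(1.0, "F", 1)],
    ("E", "left"):  [(1.0, "D", -2)],
    ("F", "right"): [(1.0, "G", 10)],
    ("F", "left"):  [(1.0, "E", -2)],
}

def q_value(V, state, action):
    """One-step lookahead using the previous iteration's values V."""
    return sum(p * (r + GAMMA * V[nxt]) for p, nxt, r in T[(state, action)])

def value_iteration(iterations):
    V = {s: 0.0 for s in STATES}                 # iteration 0: all zeros
    for _ in range(iterations):
        V = {
            s: max((q_value(V, s, a) for (s0, a) in T if s0 == s), default=0.0)
            for s in STATES                      # terminal G has no actions, so it stays 0
        }
    return V

V1 = value_iteration(1)
V2 = value_iteration(2)
q2_right = q_value(V1, "B", "right")
q2_left = q_value(V1, "B", "left")
print(V2["B"], q2_right, q2_left)
```

Note that `q_value` for iteration 2 is evaluated against `V1`, which mirrors the reminder that each update uses the previous iteration's values for successor states.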

(c) (3 pt) Fill in the blank cells of the table below with the Q-values that result from applying the Q-learning update for the 4 transitions specified by the episode below. You may leave Q-values that are unaffected by the current update blank. Use a learning rate α of 0.5. Assume all Q-values are initialized to 0.

Episode:

s | a     | r  | s | a     | r  | s | a    | r  | s | a     | r  | s
C | jump  | +4 | E | right | +1 | F | left | −2 | E | right | +1 | F

             | Q(C, left) | Q(C, jump) | Q(E, left) | Q(E, right) | Q(F, left) | Q(F, right)
Initial      | 0          | 0          | 0          | 0           | 0          | 0
Transition 1 |            | 2          |            |             |            |
Transition 2 |            |            |            | 0.5         |            |
Transition 3 |            |            |            |             | −0.75      |
Transition 4 |            |            |            | 0.75        |            |

The Q-learning update is $Q'(s, a) = Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$.

A state-action is only updated when a transition is made from it. The state-actions Q(C, left), Q(E, left), and Q(F, right) are never experienced and so these values are never updated.

On transition 2, the Qs for F are still both 0, so the update increases the value by the reward +1 times the learning rate. On transition 3, the reward of −2 and Q(E, right) = 0.5 are included in the update. On transition 4, Q(F, left) is now −0.75 but Q(F, right) is still 0, so the next update to Q(E, right) uses 0 in the max over the next state’s actions.
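The four updates can be replayed step by step. This is a minimal sketch: the dictionary layout and the name `history` are illustrative choices, while the episode, the learning rate of 0.5, and γ = 1 come from the problem statement.

```python
ALPHA, GAMMA = 0.5, 1.0

# Only the state-actions that appear in the answer table are tracked.
ACTIONS = {"C": ["left", "jump"], "E": ["left", "right"], "F": ["left", "right"]}
Q = {(s, a): 0.0 for s, acts in ACTIONS.items() for a in acts}

# (state, action, reward, next_state) for the four transitions of the episode.
episode = [
    ("C", "jump", 4, "E"),
    ("E", "right", 1, "F"),
    ("F", "left", -2, "E"),
    ("E", "right", 1, "F"),
]

history = []
for s, a, r, s_next in episode:
    best_next = max(Q[(s_next, a2)] for a2 in ACTIONS[s_next])
    Q[(s, a)] += ALPHA * (r + GAMMA * best_next - Q[(s, a)])
    history.append(((s, a), Q[(s, a)]))      # record the single entry that changed

for (s, a), value in history:
    print(f"Q({s}, {a}) = {value}")
```

Each transition changes exactly one table entry, which is why the other cells in the corresponding row can be left blank.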

As an evaluation function, take the negative of the number of valid moves (i.e., no duplicates or cycles) that break d-separation. A partial game tree is shown below.

(b) (3 pt) Using minimax with this evaluation function, write the value of every node in the dotted boxes provided. Three values have been filled in for you.

(c) (2 pt) Draw an “X” over any nodes that alpha-beta search would prune when evaluating the root node, assuming it evaluates nodes from left to right.

[Figure: partial game tree; the number attached to each node is the evaluation function value.]


  6. (14 points) Occupy Cal

You are at Occupy Cal, and the leaders of the protest are deciding whether or not to march on California Hall. The decision is made centrally and communicated to the occupiers via the “human microphone”; that is, those who hear the information repeat it so that it propagates outward from the center. This scenario is modeled by the following Bayes net:

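[Figure: Bayes net in the shape of a binary tree. A is the root with children B and C; B has children D and E; C has children F and G; D has children H and I; E has children J and K; F has children L and M; G has children N and O.]

A  | P(A)
+m | 0.5
−m | 0.5

π(X) | X  | P(X | π(X))
+m   | +m | 0.9
+m   | −m | 0.1
−m   | +m | 0.1
−m   | −m | 0.9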

Each random variable represents whether a given group of protestors hears instructions to march (+m) or not (−m). The decision is made at A, and both outcomes are equally likely. The protestors at each node relay what they hear to their two child nodes, but due to the noise, there is some chance that the information will be misheard. Each node except A takes the same value as its parent with probability 0.9, and the opposite value with probability 0.1, as in the conditional probability tables shown.

(a) (2 pt) Compute the probability that node A sent the order to march (A = +m) given that both B and C receive the order to march (B = +m, C = +m).

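$$P(A = +m \mid B = +m, C = +m) = \frac{P(A = +m, B = +m, C = +m)}{\sum_a P(A = a, B = +m, C = +m)} = \frac{P(A = +m)\, P(B = +m \mid A = +m)\, P(C = +m \mid A = +m)}{\sum_a P(A = a)\, P(B = +m \mid A = a)\, P(C = +m \mid A = a)}$$

$$= \frac{P(A = +m)\, P(X = +m \mid \pi(X) = +m)^2}{\sum_a P(A = a)\, P(X = +m \mid \pi(X) = a)^2} = \frac{0.5 \cdot 0.9^2}{0.5 \cdot 0.9^2 + 0.5 \cdot 0.1^2} = \frac{0.9^2}{0.9^2 + 0.1^2} \approx 0.988$$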

Note that P(A) is uniform. It can be pulled out of sums and cancelled in the numerator and denominator. For simplicity, P(A) terms are dropped.

For further simplification, note that the conditional distribution P(X | π(X)) is symmetric, so calculations can be written in terms of the probability 0.9 that X takes the same value as π(X) and the probability 0.1 that it takes the opposite value.
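The part (a) posterior can be checked numerically with a small sketch. The function name and its parameters are illustrative; the 0.9 relay accuracy and the uniform prior come from the problem statement.

```python
def posterior_a_marches(p_same=0.9, prior=0.5):
    """P(A = +m | B = +m, C = +m) when each child copies its parent with probability p_same."""
    p_flip = 1 - p_same
    numerator = prior * p_same ** 2                     # A = +m, both children heard correctly
    denominator = numerator + (1 - prior) * p_flip ** 2  # plus A = -m, both children misheard
    return numerator / denominator

answer = posterior_a_marches()
print(round(answer, 3))
```

With a uniform prior the prior terms cancel, so the result equals 0.81 / 0.82.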


For the following parts, you should circle nodes in the accompanying diagrams that have the given properties. Use the quantities you have already computed and intuition to answer the following question parts; you should not need to do any computation.

(f) (2 pt) Circle the nodes for which knowing the value of that node changes your belief about the decision made at A given evidence at D (i.e. nodes X such that P(A | X, D) ≠ P(A | D)).

[Diagram: the Bayes net from the start of this question, with nodes A through O, provided for circling.]

Any node that is not conditionally independent of A given D will change the posterior belief about A. The only nodes that would not change the posterior are H and I.

(g) (2 pt) Circle the nodes which have nonzero VPI given evidence at D.

[Diagram: the Bayes net from the start of this question, with nodes A through O, provided for circling.]

Circle A, B, C. If any of these nodes are observed, the optimal action is to follow the instruction at the newly observed node and ignore the information at D. The reason for doing so is that A, B or C are more accurate estimators of A than D. Since the extra information could change the action taken, the VPI is nonzero. Observing any node at the same depth as D (E, F, G) or lower (H, I, J, K, L, M, N, O) would not warrant choosing a different action since these nodes are not more informative than D. Note that if we were given the choice of observing two additional nodes, for example E and F, then VPI(E, F | D) > 0 since the additional information from two nodes might change the action, for example if we get D = +m, E = −m, F = −m.

  7. (10 points) HMMs and Particle Filtering

Consider a Markov Model with a binary state X (i.e., $X_t$ is either 0 or 1). The transition probabilities are given as follows:

$X_t$ | $X_{t+1}$ | $P(X_{t+1} \mid X_t)$
0     | 0         | 0.9
0     | 1         | 0.1
1     | 0         | 0.5
1     | 1         | 0.5

(a) (2 pt) The prior belief distribution over the initial state $X_0$ is uniform, i.e., $P(X_0 = 0) = P(X_0 = 1) = 0.5$. After one timestep, what is the new belief distribution, $P(X_1)$?

$X_1$ | $P(X_1)$
0     | 0.7
1     | 0.3

The belief at the next step is obtained by pushing the uniform prior over $X_0$ through the transition model:

$P(X_1 = 0) = P(X_0 = 0) P(X_1 = 0 \mid X_0 = 0) + P(X_0 = 1) P(X_1 = 0 \mid X_0 = 1) = 0.5(0.9) + 0.5(0.5) = 0.7$

$P(X_1 = 1) = P(X_0 = 0) P(X_1 = 1 \mid X_0 = 0) + P(X_0 = 1) P(X_1 = 1 \mid X_0 = 1) = 0.5(0.1) + 0.5(0.5) = 0.3$

Now, we incorporate sensor readings. The sensor model is parameterized by a number β ∈ [0, 1]:

$X_t$ | $E_t$ | $P(E_t \mid X_t)$
0     | 0     | β
0     | 1     | (1 − β)
1     | 0     | (1 − β)
1     | 1     | β

(b) (2 pt) At t = 1, we get the first sensor reading, $E_1 = 0$. Use your answer from part (a) to compute $P(X_1 = 0 \mid E_1 = 0)$. Leave your answer in terms of β.

$$P(X_1 = 0 \mid E_1 = 0) = \frac{P(E_1 = 0 \mid X_1 = 0)\, P(X_1 = 0)}{\sum_x P(E_1 = 0 \mid X_1 = x)\, P(X_1 = x)} = \frac{\beta (0.7)}{\beta (0.7) + (1 - \beta)(0.3)}$$
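To get a feel for how this expression behaves, it can be evaluated at a few sensor accuracies. The sketch below is illustrative only; the function name and the sample values of β are assumptions, and the prior 0.7 is the part (a) result.

```python
def posterior_x1_is_zero(beta, prior_zero=0.7):
    """Bayes update for P(X_1 = 0 | E_1 = 0) with sensor accuracy beta."""
    numerator = beta * prior_zero
    denominator = numerator + (1 - beta) * (1 - prior_zero)
    return numerator / denominator

samples = {beta: posterior_x1_is_zero(beta) for beta in (0.2, 0.5, 0.9)}
for beta, value in samples.items():
    print(beta, round(value, 4))
```

At β = 0.5 the update returns the prior 0.7 unchanged, a weaker sensor (β = 0.2) pushes the belief below the prior, and a stronger one (β = 0.9) pushes it above.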

(c) (2 pt) For what range of values of β will a sensor reading $E_1 = 0$ increase our belief that $X_1 = 0$? That is, what is the range of β for which $P(X_1 = 0 \mid E_1 = 0) > P(X_1 = 0)$?

β ∈ (0.5, 1]. Intuitively, observing $E_1 = 0$ will only increase the belief that $X_1 = 0$ if $E_1 = 0$ is more likely under $X_1 = 0$ than under $X_1 = 1$. Note that the inequality is strict: β = 0.5 is uninformative since the sensor's conditional distribution is then uniform. This can be verified algebraically by setting $P(X_1 = 0) = P(X_1 = 0 \mid E_1 = 0)$ and solving for β.

(d) (2 pt) Unfortunately, the sensor breaks after just one reading, and we receive no further sensor information. Compute $P(X_\infty \mid E_1 = 0)$, the stationary distribution very many timesteps from now.
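One way to approach part (d) numerically is to apply the transition model repeatedly until the belief stops changing. This is a sketch under stated assumptions: the transition probabilities are the ones used in the part (a) solution, and the helper names, the starting belief, and the iteration count are illustrative.

```python
# Row i holds P(X_{t+1} | X_t = i), using the values from the part (a) solution.
TRANSITION = [[0.9, 0.1],
              [0.5, 0.5]]

def step(belief):
    """Push a belief over {0, 1} through one time step with no observation."""
    return [sum(belief[i] * TRANSITION[i][j] for i in range(2)) for j in range(2)]

def long_run_belief(belief, steps=200):
    for _ in range(steps):
        belief = step(belief)
    return belief

# The starting belief does not matter in the limit; two different starts agree.
from_uniform = long_run_belief([0.5, 0.5])
from_certain = long_run_belief([0.0, 1.0])
print(from_uniform, from_certain)
```

Both runs converge to the same distribution, which also satisfies the fixed-point condition π(0) = 0.9 π(0) + 0.5 π(1), so the single early observation $E_1 = 0$ has no influence on the limit.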