















Artificial Intelligence. Final Exam. INSTRUCTIONS. • You have 3 hours. • The exam is closed book, closed notes except two pages of crib sheets.
















Last Name
First Name
Login
All the work on this exam is my own. (please sign)
For staff use only
Now consider the same search problem, but with a heuristic h′ which is 0 at all states that lie along an optimal path to a goal and h* elsewhere.
(e) (2 pt) Circle all of the following that are true (if any).
(i) h′ is admissible.
(ii) h′ is consistent.
(iii) A* tree search (no closed list) with h′ will be optimal.
(iv) A* graph search (with closed list) with h′ will be optimal.
h′ is not consistent, as is needed for optimality of graph search. Consistency is violated by any edge connecting a state off the optimal path to a state on the optimal path, since h′ drops faster than the reduction in the true cost.
(f) (2 pt) You are running minimax in a large game tree. However, you know that the minimax value is between x − ε and x + ε, where x is a real number and ε is a small real number. You know nothing about the minimax values at the other nodes. Describe briefly but precisely how to modify the alpha-beta pruning algorithm to take advantage of this information.
Initialize the alpha and beta values at the root to α = x − ε and β = x + ε, then run alpha-beta as normal. This prunes every part of the game tree that falls outside the known minimax bounds.
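The windowed initialization in part (f) can be illustrated with a small sketch. The tree, the value x = 8, and ε = 0.5 below are made up for illustration; the function is an ordinary alpha-beta search whose only change is the pair of bounds passed in at the root.

```python
import math

def alphabeta(node, alpha, beta, maximizing, visited):
    """Plain alpha-beta over a nested-list tree; leaves are numbers."""
    if not isinstance(node, list):
        visited.append(node)  # record every leaf that is actually evaluated
        return node
    if maximizing:
        value = -math.inf
        for child in node:
            value = max(value, alphabeta(child, alpha, beta, False, visited))
            if value >= beta:
                break  # the min player above will never allow this branch
            alpha = max(alpha, value)
        return value
    value = math.inf
    for child in node:
        value = min(value, alphabeta(child, alpha, beta, True, visited))
        if value <= alpha:
            break  # the max player above will never allow this branch
        beta = min(beta, value)
    return value

# Hypothetical three-ply tree (max, min, max) with minimax value 8.
tree = [[[12, 3], [8, 2]], [[4, 6], [14, 5]]]

full_visits, window_visits = [], []
full_value = alphabeta(tree, -math.inf, math.inf, True, full_visits)

x, eps = 8, 0.5  # assumed prior knowledge: root value lies in [x - eps, x + eps]
window_value = alphabeta(tree, x - eps, x + eps, True, window_visits)
```

On this toy tree both calls return the same root value, and the windowed call evaluates one leaf fewer because the leaf 12 already exceeds β = 8.5.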
(g) (2 pt) An agent prefers chocolate ice cream over strawberry and prefers strawberry over vanilla. Moreover, the agent is indifferent between deterministically getting strawberry and a lottery with a 90% chance of chocolate and a 10% chance of vanilla. Which of the following utilities can capture the agent’s preferences?
(i) Chocolate 2, Strawberry 1, Vanilla 0
(ii) Chocolate 10, Strawberry 9, Vanilla 0
(iii) Chocolate 21, Strawberry 19, Vanilla 1
(iv) No utility function can capture these preferences.
One can verify that (ii) captures the agent’s preferences. Next, observe that (iii) is an affine transformation of the utilities of (ii) and so does not change the preference ordering.
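The check for part (g) is a short expected-utility calculation. A minimal sketch, with the helper name `captures_preferences` chosen here for illustration:

```python
def captures_preferences(u, p_chocolate=0.9):
    """True if u orders the flavours correctly and makes strawberry
    exactly as good as the 90/10 chocolate/vanilla lottery."""
    lottery = p_chocolate * u["chocolate"] + (1 - p_chocolate) * u["vanilla"]
    ordered = u["chocolate"] > u["strawberry"] > u["vanilla"]
    return ordered and abs(lottery - u["strawberry"]) < 1e-9

candidates = {
    "i": {"chocolate": 2, "strawberry": 1, "vanilla": 0},
    "ii": {"chocolate": 10, "strawberry": 9, "vanilla": 0},
    "iii": {"chocolate": 21, "strawberry": 19, "vanilla": 1},
}
results = {name: captures_preferences(u) for name, u in candidates.items()}
```

Option (iii) passes because each of its utilities equals 2u + 1 for the corresponding utility u in option (ii).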
(h) (2 pt) Fill in the joint probability table so that the binary variables X and Y are independent.
Independence requires that P(X, Y) = P(X)P(Y).
(i) (1 pt) Suppose you are sampling from a conditional distribution P(B | A = +a) in a Bayes’ net. Give the fraction of samples that will be accepted in rejection sampling.
P(A = +a), since samples that do not match the evidence are rejected.
(j) (2 pt) One could simplify the particle filtering algorithm by getting rid of the resampling phase and instead keeping weighted particles at all times, with the weight of a particle being the product of all observation probabilities P(e_i | X_i) up to and including the current timestep. Circle all of the following that are true (if any).
(i) This will always work as well as standard particle filtering.
(ii) This will generally work less well than standard particle filtering because all the particles will cluster in the most likely part of the state space.
(iii) This will generally work less well than standard particle filtering because most particles will end up in low-likelihood parts of the state space.
(iv) This will generally work less well than standard particle filtering because the number of particles you have will decrease over time.
Without resampling, particle filters tend to give degenerate estimates of the distribution, as only a few particles in high-likelihood states will have a significant weight. Resampling helps by re-normalizing the distribution.
(k) (2 pt) Circle all settings in which particle filtering is preferable to exact HMM inference (if any).
(i) The state space is very small.
(ii) The state space is very large.
(iii) Speed is more important than accuracy.
(iv) Accuracy is more important than speed.
Exact HMM inference scales with the size of the state space. For particle filtering, the number of particles determines the time complexity and the quality of the approximation. Large (even continuous) state spaces can be approximated by a much smaller number of particles.
(ii) The test accuracy could be higher with the doubled features.
(iii) The test accuracy will be the same with either feature set.
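The perceptron classification rule is $h(x) = \operatorname{sign}\left(\sum_i w_i f_i(x)\right)$, so doubling the features gives $h(x) = \operatorname{sign}\left(\sum_i 2 w_i f_i(x)\right) = \operatorname{sign}\left(2 \sum_i w_i f_i(x)\right) = \operatorname{sign}\left(\sum_i w_i f_i(x)\right)$, so it is unchanged by feature duplication.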
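The sign argument can be checked numerically. This is only a sketch: the weights and examples are invented, and it assumes each duplicated feature carries the same weight as its original, which is what the derivation above relies on.

```python
def sign(z):
    return 1 if z >= 0 else -1

def classify(weights, features):
    """Perceptron rule: sign of the weighted feature sum."""
    return sign(sum(w * f for w, f in zip(weights, features)))

weights = [0.5, -1.0, 2.0]  # hypothetical learned weights
examples = [[1.0, 2.0, 0.5], [3.0, 1.0, -1.0], [2.0, 0.0, 0.25]]

original = [classify(weights, x) for x in examples]
# Duplicate every feature (and its weight): the activation doubles,
# so its sign cannot change.
doubled = [classify(weights + weights, x + x) for x in examples]
```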
(a) (2 pt) Consider a simple two-variable CSP, with variables A and B, both having domains {1, 2, 3, 4}. There is only one constraint, which is that A and B sum to 6. In the table below, circle the values that remain in the domains after enforcing arc consistency.
Now consider a general CSP C, with variables X_i having domains D_i. Assume that all constraints are between pairs of variables. C is not necessarily arc consistent, so we enforce arc consistency in the standard way, i.e. by running AC-3 (the arc consistency algorithm from class). For each variable X_i, let D_i^AC be its resulting arc consistent domain.
(b) (2 pt) Circle all of the following statements that are guaranteed to be true about the domains D_i^AC compared to D_i.
(i) For all i, D_i^AC ⊆ D_i.
(ii) If C is initially arc consistent, then for all i, D_i^AC = D_i.
(iii) If C is not initially arc consistent, then for at least one i, D_i^AC ≠ D_i.
(c) (2 pt) Let n be the number of solutions to C and let n_AC be the number of solutions remaining after arc consistency has been enforced. Circle all of the following statements that could be true, if any.
(i) n_AC = 0 but no D_i^AC are empty.
(ii) n_AC ≥ 1 but some D_i^AC is empty.
(iii) n_AC ≥ 1 but a backtracking solver could still backtrack.
(iv) n_AC > 1
(v) n_AC < n
(vi) n_AC = n
(vii) n_AC > n
Arc consistency does not imply that variables are jointly consistent, as in path consistency or other k-consistencies, so (i) and (iii) are possible. Multiple solutions could be possible no matter what the consistency, so (iv) is possible. The CSP has a fixed number of solutions regardless of whether or not a particular assignment is arc consistent, so (vi) could be true.
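To make the arc consistency step concrete, here is a compact AC-3 sketch applied to the two-variable CSP from part (a). The data layout (a dictionary of directed arcs mapped to predicates) is an illustrative choice, not the representation used in class.

```python
from collections import deque

def ac3(domains, constraints):
    """domains: var -> set of values.
    constraints: (x, y) -> predicate(value_of_x, value_of_y), one per directed arc."""
    domains = {var: set(values) for var, values in domains.items()}
    queue = deque(constraints)
    while queue:
        x, y = queue.popleft()
        allowed = constraints[(x, y)]
        # Values of x with no supporting value left in y's domain.
        unsupported = {vx for vx in domains[x]
                       if not any(allowed(vx, vy) for vy in domains[y])}
        if unsupported:
            domains[x] -= unsupported
            # Re-check arcs pointing at x, except the one coming from y.
            queue.extend((z, w) for (z, w) in constraints if w == x and z != y)
    return domains

def sum_to_six(a, b):
    return a + b == 6

constraints = {("A", "B"): sum_to_six, ("B", "A"): sum_to_six}
start = {"A": {1, 2, 3, 4}, "B": {1, 2, 3, 4}}
result = ac3(start, constraints)
```

Only the value 1 loses its support in each domain (it would need a partner of 5), and every resulting domain is a subset of the original one, in line with statement (i) of part (b).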
(v) S_i ⊆ D_i^AH
(vi) D_i^AH ⊆ S_i
Refer to reasoning in previous parts.
(a) (2 pt) What is the minimax value of this state? +1 because Pacman is guaranteed a victory with his first-move advantage
(b) (2 pt) If we perform a depth-limited minimax search from this state, searching to depth d = 1, 2, 3, 4 and using zero as the evaluation function for non-terminal states, what minimax values will be computed? Reminder: each layer of depth includes one move from each agent.
d    Minimax value
1    0
2    1
3    1
4    1
For parts (c) and (d), consider a grid of size N × M containing K ghosts. Assume we perform a depth-limited minimax search, searching to depth d.
(c) (2 pt) How many states will be expanded (asymptotically) in terms of d using standard minimax? Your answer should use all the relevant variables needed and should not be specific to any particular map or configuration.
(5^(K+1))^d. The branching factor for Pacman is 5 since he can move to any adjacent square (4 choices) or stay put (+1 choice). The branching factor for the ghosts is 5^K since each of the K ghosts has the same 5 movement options. One layer of depth therefore branches by 5 · 5^K = 5^(K+1), and the number of nodes expanded is this branching factor raised to the depth d.
(d) (1 pt) For small boards, why is standard recursive minimax a poor choice for this problem? Your answer should be no more than one sentence! For small boards the depth of the goal (exit with +1) will be small, and since the goal ends the game the search need not be continued and the further expansion done by standard minimax is a waste.
Consider an MDP modeling a hurdle race track, shown below. There is a single hurdle in square D. The terminal state is G. The agent can run left or right. If the agent is in square C, it cannot run right. Instead, it can jump, which either results in a fall to the hurdle square D, or a successful hurdle jump to square E. Rewards are shown below. Assume a discount of γ = 1.
[Figure: hurdle track with per-square rewards and actions. Surviving reward rows: +1, +1, +1, +1, +… and -2, -2, -2, -2, -…]
Note: The agent cannot use right from square C.
(a) (2 pt) For the policy π of always moving forward (i.e., using actions right or jump), compute V^π(C).
V^π(C) = (13 + 15)/2 = 14. Without discounting (γ = 1), the value is the sum of rewards. The reward to go from C is 1 + 1 + 1 + 10 = 13 if the jump fails or 4 + 1 + 10 = 15 if the jump succeeds. The jump succeeds or fails with equal probability, so the expected value is the average.
(b) (3 pt) Perform two iterations of value iteration and compute the following. Iteration 0 corresponds to the initialization of all values to 0.
Remember that the V_{k+1} and Q_{k+1} updates use the values V_k and Q_k for the successor states. V_2(B) is the maximum over the Q-values for B. Note: two iterations of value iteration compute the value for an episode with a two-step horizon (h = 2). Thus V_2(s) for any state s is equal to the maximum expected return over two time steps.
(c) (3 pt) Fill in the blank cells of the table below with the Q-values that result from applying the Q-learning update for the 4 transitions specified by the episode below. You may leave Q-values that are unaffected by the current update blank. Use a learning rate α of 0.5. Assume all Q-values are initialized to 0.
Episode:
s   a      r    s   a      r    s   a     r    s   a      r    s
C   jump   +4   E   right  +1   F   left  -2   E   right  +1   F

               Q(C, left)  Q(C, jump)  Q(E, left)  Q(E, right)  Q(F, left)  Q(F, right)
Initial        0           0           0           0            0           0
Transition 1               2
Transition 2                                       0.5
Transition 3                                                    −0.75
Transition 4                                       0.75
The Q-learning update is $Q'(s, a) = Q(s, a) + \alpha \left[ r + \gamma \max_{a'} Q(s', a') - Q(s, a) \right]$.
A state-action pair is only updated when a transition is made from it. Q(C, left), Q(E, left), and Q(F, right) are never experienced, so these values are never updated.
On transition 2, the Q-values for F are still both 0, so the update increases the value by the reward +1 times the learning rate. On transition 3, the reward of −2 and Q(E, right) = 0.5 are included in the update. On transition 4, Q(F, left) is now −0.75 but Q(F, right) is still 0, so the next update to Q(E, right) uses 0 in the max over the next state’s actions.
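The four updates can be replayed in a few lines. This is a sketch under stated assumptions: the per-state action sets are taken from the column headers of the Q-table, γ = 1 as in the problem statement, and the function and variable names are illustrative.

```python
from collections import defaultdict

# Assumed action sets, read off the Q-table columns.
ACTIONS = {"C": ["left", "jump"], "E": ["left", "right"], "F": ["left", "right"]}

def q_learning(episode, alpha=0.5, gamma=1.0):
    """Apply the tabular Q-learning update to each (s, a, r, s') transition."""
    q = defaultdict(float)  # all Q-values start at 0
    history = []
    for s, a, r, s_next in episode:
        best_next = max(q[(s_next, a2)] for a2 in ACTIONS[s_next])
        q[(s, a)] += alpha * (r + gamma * best_next - q[(s, a)])
        history.append(((s, a), q[(s, a)]))
    return q, history

episode = [
    ("C", "jump", 4, "E"),
    ("E", "right", 1, "F"),
    ("F", "left", -2, "E"),
    ("E", "right", 1, "F"),
]
q, history = q_learning(episode)
```

The `history` list records which entry each transition touched and its new value, matching the filled cells of the table row by row.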
As an evaluation function, take the negative of the number of valid moves (i.e., no duplicates or cycles) that break d-separation. A partial game tree is shown below.
(b) (3 pt) Using minimax with this evaluation function, write the value of every node in the dotted boxes provided. Three values have been filled in for you.
(c) (2 pt) Draw an “X” over any nodes that alpha-beta search would prune when evaluating the root node, assuming it evaluates nodes from left to right.
π(X)   X    P(X | π(X))
+m     +m   0.9
+m     −m   0.1
−m     +m   0.1
−m     −m   0.9
Each random variable represents whether a given group of protestors hears instructions to march (+m) or not (−m). The decision is made at A, and both outcomes are equally likely. The protestors at each node relay what they hear to their two child nodes, but due to the noise, there is some chance that the information will be misheard. Each node except A takes the same value as its parent with probability 0.9, and the opposite value with probability 0.1, as in the conditional probability tables shown.
(a) (2 pt) Compute the probability that node A sent the order to march (A = +m) given that both B and C receive the order to march (B = +m, C = +m).
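$$
\begin{aligned}
p(A = +m \mid B = +m, C = +m)
&= \frac{p(A = +m, B = +m, C = +m)}{\sum_a p(A = a, B = +m, C = +m)} \\
&= \frac{p(A = +m)\, p(B = +m \mid A = +m)\, p(C = +m \mid A = +m)}{\sum_a p(A = a)\, p(B = +m \mid A = a)\, p(C = +m \mid A = a)} \\
&= \frac{p(A = +m)\, p(X = +m \mid \pi(X) = +m)^2}{\sum_a p(A = a)\, p(X = +m \mid \pi(X) = a)^2} \\
&= \frac{0.5 \cdot 0.9^2}{0.5 \cdot 0.9^2 + 0.5 \cdot 0.1^2}
 = \frac{0.9^2}{0.9^2 + 0.1^2} \approx 0.988.
\end{aligned}
$$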
Note that P(A) is uniform. It can be pulled out of sums and cancelled in the numerator and denominator. For simplicity, P(A) terms are dropped.
For further simplification, note that the conditional distribution P(X | π(X)) is symmetric, so calculations can be simplified in terms of p(X, π(X) same) = 0.9 and p(X, π(X) different) = 0.1.
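The posterior for part (a) is easy to reproduce numerically. A minimal sketch, with `posterior_a_march` as an illustrative helper name and the 0.9 relay accuracy and uniform prior taken from the problem statement:

```python
def posterior_a_march(p_same=0.9, prior=0.5):
    """P(A = +m | B = +m, C = +m) when B and C each copy A with prob. p_same."""
    p_diff = 1 - p_same
    joint_march = prior * p_same * p_same        # A = +m, both children agree
    joint_stay = (1 - prior) * p_diff * p_diff   # A = -m, both children flipped
    return joint_march / (joint_march + joint_stay)

posterior = posterior_a_march()
```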
For the following parts, you should circle nodes in the accompanying diagrams that have the given properties. Use the quantities you have already computed and intuition to answer the following question parts; you should not need to do any computation.
(f) (2 pt) Circle the nodes for which knowing the value of that node changes your belief about the decision made at A given evidence at D (i.e. nodes X such that P(A | X, D) ≠ P(A | D)).
Any node that is not conditionally independent of A given D will change the posterior belief about A. The only nodes that would not change the posterior are H and I.
(g) (2 pt) Circle the nodes which have nonzero VPI given evidence at D.
Circle A, B, C. If any of these nodes are observed, the optimal action is to follow the instruction at the newly observed node and ignore the information at D. The reason for doing so is that A, B or C are more accurate estimators of A than D. Since the extra information could change the action taken, the VPI is nonzero. Observing any node at the same depth as D (E, F, G) or lower (H, I, J, K, L, M, N, O) would not warrant choosing a different action since these nodes are not more informative than D. Note that if we were given the choice of observing two additional nodes, for example E and F, then VPI(E, F | D) > 0 since the additional information from two nodes might change the action, for example if we get D = +m, E = −m, F = −m.
X_t   X_{t+1}   P(X_{t+1} | X_t)
0     0         0.9
0     1         0.1
1     0         0.5
1     1         0.5
(a) (2 pt) The prior belief distribution over the initial state X_0 is uniform, i.e., P(X_0 = 0) = P(X_0 = 1) = 0.5. After one timestep, what is the new belief distribution, P(X_1)?
Since the prior over X_0 is uniform, the belief at the next step is the average of the transition distributions out of each value of X_0:
p(X_1 = 0) = p(X_0 = 0) p(X_1 = 0 | X_0 = 0) + p(X_0 = 1) p(X_1 = 0 | X_0 = 1) = 0.5(0.9) + 0.5(0.5) = 0.7.
p(X_1 = 1) = p(X_0 = 0) p(X_1 = 1 | X_0 = 0) + p(X_0 = 1) p(X_1 = 1 | X_0 = 1) = 0.5(0.1) + 0.5(0.5) = 0.3.
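The time-elapse step for part (a) can be written as a short function. This is a sketch: the transition probabilities are the ones used in the arithmetic for part (a), and `time_update` is an illustrative name.

```python
# P(X_{t+1} = j | X_t = i), as used in the computation for part (a).
TRANSITION = {0: {0: 0.9, 1: 0.1}, 1: {0: 0.5, 1: 0.5}}

def time_update(belief):
    """One prediction step: sum over the previous state."""
    return {
        x1: sum(belief[x0] * TRANSITION[x0][x1] for x0 in belief)
        for x1 in (0, 1)
    }

belief_1 = time_update({0: 0.5, 1: 0.5})
```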
Now, we incorporate sensor readings. The sensor model is parameterized by a number β ∈ [0, 1]:
X_t   E_t   P(E_t | X_t)
0     0     β
0     1     1 − β
1     0     1 − β
1     1     β
(b) (2 pt) At t = 1, we get the first sensor reading, E_1 = 0. Use your answer from part (a) to compute P(X_1 = 0 | E_1 = 0). Leave your answer in terms of β.
$$
p(X_1 = 0 \mid E_1 = 0)
= \frac{p(E_1 = 0 \mid X_1 = 0)\, p(X_1 = 0)}{\sum_x p(E_1 = 0 \mid X_1 = x)\, p(X_1 = x)}
= \frac{\beta (0.7)}{\beta (0.7) + (1 - \beta)(0.3)}
$$
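The observation update can be evaluated for a few values of β to see how the sensor quality moves the belief. A minimal sketch; `posterior_x0` is an illustrative name and the prior of 0.7 is the result of part (a).

```python
def posterior_x0(beta, prior_x0=0.7):
    """P(X_1 = 0 | E_1 = 0) for a sensor that reports the true state w.p. beta."""
    numerator = beta * prior_x0
    denominator = numerator + (1 - beta) * (1 - prior_x0)
    return numerator / denominator

uninformative = posterior_x0(0.5)  # sensor is a coin flip
good_sensor = posterior_x0(0.9)
bad_sensor = posterior_x0(0.2)
```

With β = 0.5 the posterior stays at the prior of 0.7; a larger β pushes it up and a smaller β pushes it down.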
(c) (2 pt) For what range of values of β will a sensor reading E_1 = 0 increase our belief that X_1 = 0? That is, what is the range of β for which P(X_1 = 0 | E_1 = 0) > P(X_1 = 0)?
β ∈ (0.5, 1]. Intuitively, observing E_1 = 0 will only increase the belief that X_1 = 0 if E_1 = 0 is more likely under X_1 = 0 than under X_1 = 1. Note that β must be strictly greater than 0.5; β = 0.5 is uninformative since the conditional distribution is uniform. This can be verified algebraically by setting p(X_1 = 0) = p(X_1 = 0 | E_1 = 0) and solving for β.
(d) (2 pt) Unfortunately, the sensor breaks after just one reading, and we receive no further sensor information. Compute P(X_∞ | E_1 = 0), the stationary distribution very many timesteps from now.
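For part (d), once the sensor stops reporting, the belief evolves under the transition model alone. The sketch below simply iterates the time-elapse step; it assumes the transition probabilities used in the arithmetic for part (a), and the names are illustrative. It is offered as a way to check an answer, not as the exam's own solution.

```python
# Rows: current state; columns: next state (values as used in part (a)).
TRANSITION = [[0.9, 0.1], [0.5, 0.5]]

def run_chain(belief, steps=200):
    """Apply the time-elapse update repeatedly with no observations."""
    for _ in range(steps):
        belief = [
            sum(belief[i] * TRANSITION[i][j] for i in range(2))
            for j in range(2)
        ]
    return belief

from_state_0 = run_chain([1.0, 0.0])
from_state_1 = run_chain([0.0, 1.0])
```

Both starting beliefs converge to the same distribution, roughly (5/6, 1/6), which satisfies π_0 = 0.9 π_0 + 0.5 π_1. This shows that the single reading E_1 = 0 has no effect in the limit.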