








Material Type: Exam; Class: Introduction to Artificial Intelligence; Subject: Computer Science; University: University of California - Berkeley; Term: Spring 2006;









CS 188: Introduction to Artificial Intelligence
Spring 2006 Practice Final Solutions
(a) True/False: All MDPs can be solved using expectimax search.
False. MDPs with self loops lead to infinite expectimax trees. Unlike search problems, this issue cannot be addressed with a graph-search variant.
(b) True/False: There is some single Bayes’ net structure over three variables which can represent any probability distribution over those variables.
True. A fully connected Bayes’ net can represent any joint distribution.
(c) True/False: Any rational agent’s preferences over outcomes can be summarized by a single real-valued utility function over those outcomes.
True. Any set of preferences which conform to the six constraints on rational preferences (orderability, transitivity, continuity, substitutability, monotonicity, decomposability) can be summarized by a single, real-valued function.
(d) True/False: Temporal difference learning of optimal utility values (U) requires knowledge of the transition probability tables (T).
Mostly True. Temporal difference learning is a model-free learning technique that requires only example state sequences to learn the utilities for a fixed policy. However, to derive the best policy from those utilities, which would be required to find the optimal utility values, we would need to compute
$$\pi(s) = \arg\max_a \sum_{s'} T(s, a, s')\, U(s')$$
which of course includes a transition probability. The solution reads “mostly” true because the optimal utility values could be found without the transition probabilities if the agent were also supplied with the optimal policy. In practice, we could also estimate the transition probabilities from the training data (using maximum-likelihood estimates, for example), so they need not necessarily be known in advance. NOTE: This solution was updated since the review session.
(e) True/False: Pruning nodes from a decision tree may have no effect on the resulting classifier.
True. Trivially, a decision tree may have branches that are unreachable. Furthermore, splits in the decision tree may also refine P (class), but have no effect in practice because of rounding. Imagine a leaf has 10 true, 3 false, and splits to 5/2 and 5/1 – you’ll still guess true on each branch, but the split is refining the conditional probabilities. NOTE: This solution was updated since the review session.
Consider the following search problem formulation:
States: 16 integer coordinates, (x, y) ∈ [1, 4] × [1, 4]
Initial state: (1, 1)
Successor function: The successor function generates 2 states with different y-coordinates
Goal test: (4, 4) is the only goal state
Step cost: The cost of going from one state to another is the Euclidean distance between the points
We can specify a state space by drawing a graph with directed edges from each state to its successors:
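[Figure: state-space graph over the 4 × 4 grid of states, with x on the horizontal axis and y on the vertical axis (both from 1 to 4), and a directed edge from each state to each of its successors.]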
Uninformed Search
Consider the performance of DFS, BFS, and UCS on the state space above. Order successors so that DFS or BFS explores the state with lower y-coordinate first.
a) What uninformed search algorithm(s) find an optimal solution? What is this path cost? Solution: UCS returns the lowest-cost path ((1, 1), (2, 2), (3, 3), (4, 4)), of cost 3√2 ≈ 4.24 (three diagonal steps, each of Euclidean length √2).
b) What uninformed search algorithm(s) find a shortest solution? How long is this path? Solution: BFS returns the path ((1, 1), (2, 3), (4, 4)), of length 2.
c) What uninformed search algorithm(s) are most efficient? How many search nodes are expanded? Solution: DFS expands 3 nodes corresponding to states (1, 1), (2, 2), (3, 2). The goal node is not expanded.
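The costs of the two solution paths quoted in (a) and (b) can be compared directly under the Euclidean step cost. The following is a minimal sketch; the helper name `path_cost` is an illustrative choice, and only the two paths listed in the solutions are used, since the full successor graph is given by the figure.

```python
import math

def path_cost(path):
    """Sum of Euclidean distances between consecutive states on a path."""
    return sum(math.dist(a, b) for a, b in zip(path, path[1:]))

# Paths as quoted in the solutions to (a) and (b).
ucs_path = [(1, 1), (2, 2), (3, 3), (4, 4)]
bfs_path = [(1, 1), (2, 3), (4, 4)]

ucs_cost = path_cost(ucs_path)  # three steps of length sqrt(2)
bfs_cost = path_cost(bfs_path)  # two steps of length sqrt(5)

print(f"UCS path: {len(ucs_path) - 1} steps, cost {ucs_cost:.3f}")
print(f"BFS path: {len(bfs_path) - 1} steps, cost {bfs_cost:.3f}")
```

The BFS path has fewer steps but a higher total cost, which is why the shortest path and the lowest-cost path differ in this problem.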
Heuristic Search
Use the Euclidean distance to the goal as a heuristic for A* and greedy best-first search:
d) What heuristic search algorithm(s) find an optimal solution? Solution: A* search is optimal given an admissible heuristic such as Euclidean distance.
e) What heuristic search algorithm(s) find a shortest solution? Solution: Greedy best-first search returns the path ((1, 1), (2, 3), (4, 4)), of length 2.
f) What heuristic search algorithm(s) are most efficient? How many search nodes are expanded? Solution: Greedy best-first search expands 2 nodes corresponding to states (1, 1), (2, 3).
[From a previous year’s exam.] Consider the problem of tiling a surface (completely and exactly covering it) with n dominoes (each is a 2 × 1 rectangle, and the number of pips is irrelevant). The surface is an arbitrary edge-connected (i.e., adjacent along an edge, not just a corner) collection of 2n 1 × 1 squares (e.g., a checkerboard, a checkerboard with some squares missing, a 10 × 1 row of squares, etc.).
a) Formulate this problem precisely as a CSP where the dominoes are the variables (i.e., define the variable domain and the constraints).
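Variables: $X_1, \ldots, X_n$
Domains: The value of a domino variable is the pair of adjacent squares that it covers.
$$\forall i \in \{1, \ldots, n\}: \quad D_i = \{(s, s') : \text{square } s \text{ is adjacent to square } s'\}$$
Constraints: No two dominoes may cover the same square, so any two dominoes together cover four distinct squares.
$$\forall (i, j) \in \{1, \ldots, n\}^2 \text{ with } i \neq j: \quad C_{ij} = \left\{ \big((s_i, s_i'), (s_j, s_j')\big) \;:\; \left|\{s_i, s_i'\} \cup \{s_j, s_j'\}\right| = 4 \right\}$$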
NOTE: This solution was updated since the review session.
b) Formulate this problem precisely as a CSP where the squares are the variables (i.e., define the variable domain and the constraints). [Hint: it doesn’t matter which particular domino covers a given pair of squares.]
Variables: $X_1, \ldots, X_{2n}$
Domains: The value of a square is the adjacent square which is covered by the same domino.
$$\forall i \in \{1, \ldots, 2n\}: \quad D_i = \{s_i' : \text{square } s_i \text{ is adjacent to square } s_i'\}$$
Constraints: Adjacent squares must agree: $X_i = s_j$ if and only if $X_j = s_i$.
c) For your formulation in part (b), describe exactly the set of instances that have a tree-structured constraint graph.
Solution: The constraints in (b) are binary constraints relating adjacent squares, so the constraint graph will form a loop whenever the squares form a loop (e.g., any 2 × 2 block). So any problem in which the “width” of the surface is 1 everywhere and the squares form no loops will have a tree-structured constraint graph (assuming it’s one connected component). This property could easily be verified for a graph by performing a depth-first search from any node. If every node is reached exactly once in this manner (assuming a node’s parent is not counted as one of its successors), then the constraint graph is a tree.
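The tree test described above can be sketched in a few lines. This is a minimal sketch under two assumptions that are not spelled out in the solution: squares are given as a non-empty collection of integer `(x, y)` coordinates, and the constraint graph has exactly one edge per pair of edge-adjacent squares. The function name `constraint_graph_is_tree` is hypothetical. Instead of counting visits during the search, it uses the equivalent condition that a connected graph is a tree exactly when it has one fewer edge than nodes.

```python
def constraint_graph_is_tree(squares):
    """True if the adjacency graph of the given squares is connected and acyclic."""
    squares = set(squares)
    # Count each adjacent pair once by looking only right and up.
    edges = sum(
        1
        for (x, y) in squares
        for (dx, dy) in ((1, 0), (0, 1))
        if (x + dx, y + dy) in squares
    )
    # Depth-first search from an arbitrary square to test connectivity.
    start = next(iter(squares))
    seen = {start}
    stack = [start]
    while stack:
        x, y = stack.pop()
        for dx, dy in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            neighbor = (x + dx, y + dy)
            if neighbor in squares and neighbor not in seen:
                seen.add(neighbor)
                stack.append(neighbor)
    return len(seen) == len(squares) and edges == len(squares) - 1

row = [(x, 0) for x in range(10)]            # a 10 x 1 row of squares
block = [(0, 0), (1, 0), (0, 1), (1, 1)]     # a 2 x 2 block, which contains a loop
print(constraint_graph_is_tree(row), constraint_graph_is_tree(block))
```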
d) [Extra credit] Consider domino-tiling a checkerboard in which two opposite corners have been removed (31 dominoes to tile 62 squares). Prove that the CSP has no consistent assignment.
Solution: Consider the alternating colors of checkerboard squares: no two adjacent tiles have the same color, thus a domino will always cover two tiles of different colors. If we remove opposite corners of the board, there are 30 squares of one color and 32 squares of another color. Hence, it is not possible for 31 dominoes to each cover one square of each color.
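The color count in this argument is easy to check by enumeration. The sketch below assumes a standard 8 × 8 board with 0-indexed coordinates and colors a square by the parity of `x + y`; the function name `color_counts` is illustrative.

```python
def color_counts(removed, size=8):
    """Count the remaining squares of each color on a size x size checkerboard."""
    counts = {0: 0, 1: 0}
    for x in range(size):
        for y in range(size):
            if (x, y) not in removed:
                counts[(x + y) % 2] += 1  # adjacent squares always differ in parity
    return counts

# Remove two opposite corners; both have the same color.
counts = color_counts({(0, 0), (7, 7)})
print(counts)
```

Since every domino covers one square of each parity, 31 dominoes would need 31 squares of each color, which the counts rule out.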
Solution: P(heads) = 1 gives a likelihood of 1 to the observation.
Assume that there are three kinds of coins in the world: fair coins, two-headed coins which always come up heads, and two-tailed coins which always come up tails. Moreover, assume that you have a prior belief or knowledge that a given coin is fair with probability 0.5, and two-headed or two-tailed with probability 0.25 each.
b) Draw a Bayes’ net which expresses the assumptions that a single coin is chosen, then the flip outcomes are identically distributed and conditionally independent given the coin type. Your network should include a node for the unobserved fourth flip.
Solution: The Bayes’ net has the structure of a naive Bayes model: a single Coin node is the parent of each of the four flip nodes Flip1, Flip2, Flip3, and Flip4, and there are no other edges.
c) What is the posterior probability that the coin is fair given the three heads observations?
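Solution: The posterior $P_{c|f} = P(\mathit{Coin} = \mathit{Fair} \mid \mathit{Flip1} = H, \mathit{Flip2} = H, \mathit{Flip3} = H)$ is calculated using Bayes’ rule.
$$\begin{aligned}
P_{c|f} &= P(\mathit{Coin} = \mathit{Fair} \mid \mathit{Flip1} = H, \mathit{Flip2} = H, \mathit{Flip3} = H) \\
&= \frac{P(\mathit{Coin} = \mathit{Fair})\, P(\mathit{Flip1} = H, \mathit{Flip2} = H, \mathit{Flip3} = H \mid \mathit{Coin} = \mathit{Fair})}{P(\mathit{Flip1} = H, \mathit{Flip2} = H, \mathit{Flip3} = H)} \\
&= \frac{P(\mathit{Coin} = \mathit{Fair})\, P(\mathit{Flip1} = H, \mathit{Flip2} = H, \mathit{Flip3} = H \mid \mathit{Coin} = \mathit{Fair})}{\sum_c P(\mathit{Coin} = c)\, P(\mathit{Flip1} = H, \mathit{Flip2} = H, \mathit{Flip3} = H \mid \mathit{Coin} = c)} \\
&= \frac{0.5 \cdot 0.5^3}{0.5 \cdot 0.5^3 + 0.25 \cdot 1 + 0.25 \cdot 0} = 0.2
\end{aligned}$$
Here the two-headed and two-tailed coins each have prior 0.25, since the three priors must sum to 1.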
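The Bayes’ rule computation can be checked numerically. This is a minimal sketch that assumes a prior of 0.25 for each of the two-headed and two-tailed coins (the only value consistent with a fair-coin prior of 0.5 and three coin types); the dictionary keys and the function name `posterior_after_heads` are illustrative.

```python
# Prior over coin types and the probability of heads for each type.
prior = {"fair": 0.5, "two_headed": 0.25, "two_tailed": 0.25}  # 0.25 each is an assumption
p_heads = {"fair": 0.5, "two_headed": 1.0, "two_tailed": 0.0}

def posterior_after_heads(n):
    """Posterior over coin types after observing n heads in a row."""
    # Flips are conditionally independent given the coin, so likelihoods multiply.
    unnormalized = {c: prior[c] * p_heads[c] ** n for c in prior}
    evidence = sum(unnormalized.values())
    return {c: v / evidence for c, v in unnormalized.items()}

post = posterior_after_heads(3)
# Predictive probability that the unobserved fourth flip is heads.
p_fourth_heads = sum(post[c] * p_heads[c] for c in post)
print(post, p_fourth_heads)
```

Under these assumptions the posterior on the fair coin is 0.0625 / 0.3125 = 0.2, and the same posterior can be reused to predict the fourth flip by summing over coin types.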
(a) Assuming that leaf values are finite but unbounded, is pruning (as in alpha-beta) ever possible in a max tree? Give an example, or explain why not.
Solution: No pruning is possible. The next branch of the tree may always contain a higher value than any seen before.
(b) Is pruning ever possible in an expectimax tree under the same conditions? Give an example, or explain why not.
Solution: No pruning is possible. Again, the next branch of the tree may always contain a higher value than any seen before at a max node. The expectation node must be computed in its entirety as well for the max node above it.
(c) If leaf values are constrained to be nonnegative, is pruning ever possible in a max tree? Give an example, or explain why not.
Solution: No pruning is possible. Establishing a lower bound does not address the issue in (a).
(d) If leaf values are constrained to be nonnegative, is pruning ever possible in an expectimax tree? Give an example, or explain why not.
Solution: No pruning is possible. Establishing a lower bound does not address the issue in (b).
(e) If leaf values are constrained to be in the range [0, 1], is pruning ever possible in a max tree? Give an example, or explain why not. Solution: The second leaf can be pruned in:
(f ) If leaf values are constrained to be in the range [0, 1], is pruning ever possible in an expectimax tree? Give an example (qualitatively different from your example in (e), if any), or explain why not. Solution: Assuming a left-to-right ordering, the right-most leaf can be pruned in:
(g) Consider the outcomes of a chance node in an expectimax tree. Which of the following orders is most likely to yield pruning opportunities?
Solution: Highest probability first.
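The bounded-value cases in (e) and (f) can be illustrated with a small evaluator. This is a sketch under stated assumptions, not the exam's own figures: the tree encoding (a leaf is a number, a max node is `("max", [children])`, a chance node is `("chance", [(probability, child), ...])`), the function name `evaluate`, and both example trees are made up for illustration. A max node stops as soon as a child reaches the upper bound 1, and a chance node stops once even the best possible remaining outcomes cannot lift its expectation above the value its max parent already has.

```python
UPPER = 1.0  # assumed known upper bound on leaf values (the [0, 1] case)

def evaluate(node, alpha=0.0, visited=None):
    """Evaluate a tree with leaves in [0, UPPER], recording visited leaves.

    `alpha` is the best value the parent max node already has. A chance node
    that provably cannot exceed alpha returns an upper bound on its value
    instead of the exact expectation; the parent's maximum is unaffected.
    """
    if visited is None:
        visited = []
    if isinstance(node, (int, float)):
        visited.append(node)
        return float(node)
    kind, children = node
    if kind == "max":
        best = 0.0  # leaf values are nonnegative
        for child in children:
            best = max(best, evaluate(child, best, visited))
            if best >= UPPER:  # nothing can beat the upper bound
                break
        return best
    # Chance node: track the probability mass not yet evaluated.
    total, remaining = 0.0, 1.0
    for p, child in children:
        total += p * evaluate(child, 0.0, visited)
        remaining -= p
        if remaining > 0 and total + remaining * UPPER <= alpha:
            return total + remaining * UPPER  # bound, not exact value
    return total

# Max tree: after seeing 1.0, the second leaf is never visited.
seen_e = []
value_e = evaluate(("max", [1.0, 0.3]), visited=seen_e)

# Expectimax tree: the left chance node is worth 0.9; the right one can reach
# at most 0.5 after its first outcome, so its right-most leaf is pruned.
tree_f = ("max", [("chance", [(0.5, 1.0), (0.5, 0.8)]),
                  ("chance", [(0.5, 0.0), (0.5, 0.7)])])
seen_f = []
value_f = evaluate(tree_f, visited=seen_f)
print(value_e, seen_e, value_f, seen_f)
```

The chance-node test depends on how much probability mass is still unseen, so evaluating high-probability outcomes first shrinks that mass fastest and tightens the bound soonest, in line with the answer to (g).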
Filling in the numbers, we have:
P (berry|round) =
(g) If you tried to model the problem with network (2), can you give P(Wrapper = Red | Shape = Round)? If so, give the answer. Solution: The network in (2) can represent any probability distribution over the three variables. However, the probabilities given in the question underspecify this joint probability distribution, so we don’t have enough information to fill in the CPTs for (2).
(h) If you tried to model the problem with network (4), can you give P(Wrapper = Red | Shape = Round)? If so, give the answer. Solution: The network in (4) specifies only one distribution that is consistent with the probabilities in the problem description. Finding this solution is quite tricky. First, we observe that the network in (4) is a Markov chain. Recalling the forward algorithm, we can see that**
$$P(\mathit{Wrapper} \mid \mathit{Flavor}) = \sum_{\mathit{shape}} P(\mathit{shape} \mid \mathit{Flavor})\, P(\mathit{Wrapper} \mid \mathit{shape})$$
This computation amounts to summing the probabilities of all the paths that get from Flavor to Wrapper. Let’s now specialize this equation for red wrappers:
$$P(\text{red} \mid \text{berry}) = P(\text{round} \mid \text{berry})\, P(\text{red} \mid \text{round}) + P(\text{square} \mid \text{berry})\, P(\text{red} \mid \text{square})$$
$$P(\text{red} \mid \text{anchovy}) = P(\text{round} \mid \text{anchovy})\, P(\text{red} \mid \text{round}) + P(\text{square} \mid \text{anchovy})\, P(\text{red} \mid \text{square})$$
We can fill in all but two of these quantities given the data provided, which leaves us with a system of two linear equations with two unknowns.
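Solving these equations simultaneously, we have P(red | round) = 0.9, P(red | square) = 0.1.
** To see why this Markov chain property is true from first principles, observe that
$$\begin{aligned}
P(\mathit{Wrapper} \mid \mathit{Flavor}) &= \frac{P(\mathit{Wrapper}, \mathit{Flavor})}{P(\mathit{Flavor})} \\
&= \frac{\sum_{\mathit{shape}} P(\mathit{Wrapper}, \mathit{Shape}, \mathit{Flavor})}{P(\mathit{Flavor})} \\
&= \frac{\sum_{\mathit{shape}} P(\mathit{Flavor})\, P(\mathit{Shape} \mid \mathit{Flavor})\, P(\mathit{Wrapper} \mid \mathit{Shape})}{P(\mathit{Flavor})} \\
&= \sum_{\mathit{shape}} P(\mathit{Shape} \mid \mathit{Flavor})\, P(\mathit{Wrapper} \mid \mathit{Shape})
\end{aligned}$$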
This question proved trickier than we expected. Nothing this involved is going to appear on the actual exam. NOTE: This solution was updated since the review session.
Robot Localization Grid
Suppose you are a robot navigating a maze (see figure 1), where some of the cells are free and some are blocked. At each time step, you are occupying one of the free cells. You are equipped with sensors which give you noisy observations (wU, wD, wL, wR) of the four cells adjacent to your current position (up, down, left, and right, respectively). Each wi is either free or blocked, and is accurate 80% of the time, independently of the other sensors or your current position. (^1) Imagine that you have experienced a motor malfunction that causes you to move randomly to one of the four adjacent cells, each with probability 1/4. (^2)
a) Suppose you start in the central cell in figure 1. One time step passes and you are now in a new, possibly different state and your sensors indicate (free, blocked, blocked, blocked). Which states have a non-zero probability of being your new position? Solution: (2, 1), (2, 2), (2, 3)
b) Give the posterior probability distribution over your new position. Solution: The posterior probabilities of each state are proportional to the probability of ending up in the state after one random move weighted by the probability of the observation from that state:
P(s_1 = (2, 1) | wU, wD, wL, wR) ∝
P(s_1 = (2, 2) | wU, wD, wL, wR) ∝
P(s_1 = (2, 3) | wU, wD, wL, wR) ∝
Normalizing, we have
P(s_1 = (2, 1) | wU, wD, wL, wR) = 0.02
P(s_1 = (2, 2) | wU, wD, wL, wR) = 0.65
P(s_1 = (2, 3) | wU, wD, wL, wR) = 0.33
This distribution matches our intuition: a move upward was quite unlikely because of the observation, which disagrees with the layout around (2, 1) in three out of four directions.
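The update described in (b) multiplies the probability of reaching each cell by the likelihood of the sensor reading there, then normalizes. The sketch below is a hedged reconstruction: the maze figure is not available here, so the move probabilities (1/4, 1/2, 1/4) and the sensor agreement counts for (2, 2) and (2, 3) are assumptions chosen to be consistent with the text; only the count for (2, 1), where the observation disagrees in three of four directions, is stated in the solution. The function name `posterior` is illustrative.

```python
def posterior(move_prob, matches, accuracy=0.8, n_sensors=4):
    """Normalize P(reach state) * P(observation | state) over candidate states.

    matches[s] is the number of sensor readings that agree with the true
    layout around state s; each reading is independently correct with
    probability `accuracy`.
    """
    unnormalized = {
        s: move_prob[s] * accuracy ** matches[s] * (1 - accuracy) ** (n_sensors - matches[s])
        for s in move_prob
    }
    total = sum(unnormalized.values())
    return {s: v / total for s, v in unnormalized.items()}

# ASSUMED: two of the four moves from the central cell hit walls, so the
# robot stays put with probability 1/2 and reaches each neighbor with 1/4.
move_prob = {(2, 1): 0.25, (2, 2): 0.5, (2, 3): 0.25}
# (2, 1): one of four readings agrees (from the text); the others are ASSUMED.
matches = {(2, 1): 1, (2, 2): 3, (2, 3): 3}

post = posterior(move_prob, matches)
print({s: round(p, 2) for s, p in post.items()})
```

Under these assumptions the rounded values agree with the normalized distribution quoted above, which suggests the assumed layout is at least consistent with the figure.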
c) Suppose that s_0 is your starting state and that s_1 and s_2 are random variables indicating your state after the first and second time steps. Draw a Bayes’ net illustrating the relationships between each s_i and the sensor observations associated with that state. Consider the CPT for s_1. How many values for s_1 have non-zero probability?
(^1) Assume that if a cell is off the given grid it is treated as blocked.
(^2) If you move towards a blocked cell, you hit the wall and stay where you are.
a) Assuming 0 ≤ Rwait ≤ Rclean, which of the following can be optimal behavior? (Circle all that apply.)
(a) Always clean, regardless of battery. Solution: If Rclean is large enough, this will be the optimal policy.
(b) Always recharge, regardless of battery. Solution: If Rclean = Rwait = 0, this will be the optimal policy.
(c) Clean with high battery, recharge with low battery. Solution: For some reasonably high rewards, this is the policy we’d expect.
(d) Recharge with high battery, clean with low battery. Solution: This cannot be the optimal policy because if cleaning with low battery is optimal, then cleaning with high battery will be optimal as well.
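b) Write a Bellman equation relating the optimal utility of the state low in terms of other optimal utilities. You should use specific variables and probabilities from the problem statement above when possible. Solution: With the reward structure for state-action-state triples, we most naturally consider the optimal utility to be the discounted sum of all future rewards.
$$U^*(\text{low}) = \max\Big\{\, \gamma U^*(\text{high}),\;\; R_{\text{wait}} + \gamma U^*(\text{low}),\;\; p\,\big(R_{\text{clean}} + \gamma U^*(\text{low})\big) + (1 - p)\,\big({-10} + \gamma U^*(\text{high})\big) \Big\}$$
where γ = 0.9, as mentioned in the problem description, and p stands for the probability, given in the problem statement, that cleaning with a low battery leaves the robot in the low state; with the remaining probability 1 − p, cleaning incurs the −10 reward and the robot ends in the high state. The three terms correspond to recharging, waiting, and cleaning, respectively.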
c) Let Rclean = 3 and Rwait = 1. Assume you have an initial estimate of zero for each state’s utility. What are the estimates after one round of value iteration? Solution: After 1 round of value iteration, the states take their maximal expected rewards after one action since future rewards are 0. Thus, we have
U (high) = 3, U (low) = 1
d) Using the rewards above and a learning rate of 0.2, consider using Q-learning to estimate q-values for this problem. Assume all of your q-value estimates are zero initially. Imagine you observe the following sequence of states, actions, and rewards: high, clean, +3, high, clean, +3, low, clean, +3, low, clean, -10. Show the q-values for each state after each reward. Solution: The Q update for this problem is precisely what we saw for the robot project. We need only perform those updates to solve the problem:
After high, clean, +3, we have Q(high, clean) = 0.6 and all other Q are 0.
After another high, clean, +3, we have Q(high, clean) = 0.8 · 0.6 + 0.2(3 + 0.9 · 0) = 1.08, where the next-state term is 0 because the next state is low, and all other Q are 0.
After low, clean, +3, we have Q(high, clean) = 1.08, Q(low, clean) = 0.6, and all other Q are zero.
After low, clean, -10, we have Q(low, clean) = 0.8 · 0.6 + 0.2(−10 + 0.9 · 1.08) = −1.33. We can make the Q update because we know that the following state will be high. Note that the agent would not be able to make the update until it identified its next state to be high and therefore could compute max_{a′} Q(high, a′) = 1.08. After this final update, we would still have Q(high, clean) = 1.08, with Q(low, clean) = −1.33 and all other Q equal to 0.
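The four updates can be replayed mechanically. The following is a minimal sketch of the standard tabular Q-learning update with learning rate 0.2 and discount 0.9; the action set (clean, wait, recharge) is taken from the policies discussed earlier, and the final next state is taken to be high, as the solution states. The names `q_update` and `transitions` are illustrative.

```python
GAMMA = 0.9   # discount from the problem description
ALPHA = 0.2   # learning rate given in part (d)
STATES = ("high", "low")
ACTIONS = ("clean", "wait", "recharge")

def q_update(q, state, action, reward, next_state):
    """One tabular Q-learning update, modifying q in place."""
    best_next = max(q[(next_state, a)] for a in ACTIONS)
    sample = reward + GAMMA * best_next
    q[(state, action)] = (1 - ALPHA) * q[(state, action)] + ALPHA * sample

q = {(s, a): 0.0 for s in STATES for a in ACTIONS}

# (state, action, reward, next state) for the observed sequence; the last
# next state is "high", as noted in the solution.
transitions = [
    ("high", "clean", 3, "high"),
    ("high", "clean", 3, "low"),
    ("low", "clean", 3, "low"),
    ("low", "clean", -10, "high"),
]

history = []
for s, a, r, s_next in transitions:
    q_update(q, s, a, r, s_next)
    history.append((round(q[("high", "clean")], 2), round(q[("low", "clean")], 2)))

print(history)
```

Each entry of `history` holds the rounded pair (Q(high, clean), Q(low, clean)) after one update; every other q-value stays at zero because no other state-action pair is ever tried.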