

This document contains the instructions and objectives for Homework 7 of CS 141 (Introduction to AI), which focuses on Markov decision processes. Students are expected to understand concepts such as expected minimax, game iteration, interpreting discounting, and Gambler's Ruin, and to extend algorithms such as value iteration and policy iteration to games of chance. The problems call for formal definitions, proofs, and an implementation of value or policy iteration.


CS 141 Introduction to AI Greenwald
1 Expected Minimax
2 Game Iteration
3 Interpreting Discounting
4 Gambler’s Ruin, Revisited
5 Model-Based Learning
6 Deterministic Q-Learning
By the end of this homework, you will understand:
By the end of this homework, you will be able to:
1 Expected Minimax
The goal of this question is to extend the minimax algorithm to games of chance, like backgammon and Monopoly.
(a) Formally define games of chance by extending the definition of game trees.
(b) Extend the minimax algorithm to take as input a game of chance.
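One common way to approach parts (a) and (b) is to add chance nodes to the game tree and to take a probability-weighted average at those nodes. The sketch below is only an illustration under that assumption, not the expected solution: the class names `Leaf`, `Max`, `Min`, and `Chance` and the function `expectiminimax` are hypothetical, and leaf values are assumed to be payoffs to the maximizing player.

```python
from dataclasses import dataclass
from typing import List, Tuple, Union


@dataclass
class Leaf:
    value: float  # assumed: payoff to the maximizing player


@dataclass
class Max:
    children: List["Node"]  # moves available to the maximizing player


@dataclass
class Min:
    children: List["Node"]  # moves available to the minimizing player


@dataclass
class Chance:
    # (probability, child) pairs; probabilities are assumed to sum to 1
    outcomes: List[Tuple[float, "Node"]]


Node = Union[Leaf, Max, Min, Chance]


def expectiminimax(node: Node) -> float:
    """Back up values: max at Max nodes, min at Min nodes, expectation at Chance nodes."""
    if isinstance(node, Leaf):
        return node.value
    if isinstance(node, Max):
        return max(expectiminimax(child) for child in node.children)
    if isinstance(node, Min):
        return min(expectiminimax(child) for child in node.children)
    if isinstance(node, Chance):
        return sum(prob * expectiminimax(child) for prob, child in node.outcomes)
    raise TypeError(f"unknown node type: {type(node).__name__}")


# Hypothetical example: the maximizer chooses between a fair coin flip
# over payoffs 3 and -1, and a position where the minimizer moves next.
tree = Max([
    Chance([(0.5, Leaf(3.0)), (0.5, Leaf(-1.0))]),
    Min([Leaf(2.0), Leaf(0.0)]),
])
print(expectiminimax(tree))  # prints 1.0
```

In this example the coin flip is worth 0.5 · 3 + 0.5 · (−1) = 1 in expectation, while the minimizer would hold the other branch to 0, so the maximizer prefers the gamble.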
CS 141 Homework 7: Markov Decision Processes 5:00 PM, Apr 20, 2009
2 Game Iteration
The goal of this question is to extend value and policy iteration to games of chance, like backgammon and Monopoly.
(a) Formally define games of chance by extending the definition of MDPs.
(b) Extend the value iteration algorithm to take as input a game of chance.
(c) Extend the policy iteration algorithm to take as input a game of chance.
3 Interpreting Discounting
(a) Show that for all Markov reward processes, the value function V is well-defined (i.e., finite-valued) if 0 ≤ γ < 1.
(b) Show that there exists a Markov reward process s.t. the value function V is not necessarily well-defined if γ = 1.
(c) Prove that any Markov reward process M_1, which for discount factor 0 ≤ γ < 1 yields value function V_1, can be transformed into another Markov reward process M_2, s.t. the undiscounted value function V_2 = V_1. [Hint: Introduce a zero-reward, absorbing state end in M_2 s.t. all state-action pairs in M_1 transition to end with probability 1 − γ.]
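A numerical check is no substitute for the proof in part (c), but it can make the hint concrete. The sketch below assumes rewards are attached to states and uses a made-up two-state process; the helpers `evaluate` and `add_end_state` are hypothetical names. It scales every original transition by γ, routes the remaining 1 − γ of probability mass to a zero-reward absorbing state, and compares the discounted values of the original process with the undiscounted values of the transformed one.

```python
def evaluate(P, r, gamma, sweeps=2000):
    """Iterative evaluation of a Markov reward process: V <- r + gamma * P V."""
    n = len(r)
    V = [0.0] * n
    for _ in range(sweeps):
        V = [r[s] + gamma * sum(P[s][t] * V[t] for t in range(n)) for s in range(n)]
    return V


def add_end_state(P, r, gamma):
    """Build M_2 from M_1: scale transitions by gamma, send 1 - gamma to `end`."""
    n = len(r)
    P2 = [[gamma * P[s][t] for t in range(n)] + [1.0 - gamma] for s in range(n)]
    P2.append([0.0] * n + [1.0])  # `end` is absorbing
    r2 = list(r) + [0.0]          # `end` yields zero reward
    return P2, r2


# Made-up two-state example (an assumption for illustration only).
P = [[0.5, 0.5],
     [0.2, 0.8]]
r = [1.0, 0.0]
gamma = 0.9

V1 = evaluate(P, r, gamma)     # discounted values in M_1
P2, r2 = add_end_state(P, r, gamma)
V2 = evaluate(P2, r2, 1.0)     # undiscounted values in M_2

print(V1)
print(V2[:-1])  # values on the original states; V2[-1] is the value of `end`
```

On the original states the two value vectors agree up to floating-point error, which matches reading γ as the per-step probability that the process continues.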
4 Gambler’s Ruin, Revisited
Consider the following controlled version of Gambler’s Ruin in which the gambler places bets on the outcome of a biased coin flip. Assume the gambler’s worth is between 0 and N (i.e., S = {0, 1, ..., N} ∪ {end}). At each state s, the gambler stakes some amount n in the range A = {1, ..., min{s, N − s}}. The coin turns up heads with probability p and tails with probability 1 − p. If the coin turns up heads, the gambler wins the amount he stakes (i.e., he transitions to state s + n); otherwise, the gambler loses the amount he stakes (i.e., he transitions to state s − n). A reward of 1 is received upon transitioning to state N; all other rewards are 0. The gambler transitions from states 0 and N to the absorbing state end, deterministically. The constraints on the gambler’s range of actions ensure that (i) he stakes at least $1 but no more than his worth; (ii) he stakes no more than N − s, preventing his worth from ever exceeding N.
(a) Draw the corresponding MDP, assuming the gambler’s worth is limited to $5 (i.e., N = 5).
(b) What is the optimal policy if p < 0.5, N = 100, and γ = 1? Why? (Solve this problem by implementing value iteration or policy iteration.)
(c) What is the optimal policy if p > 0.5, N = 100, and γ = 1? Why?
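As a starting point for parts (b) and (c), value iteration for this MDP might be sketched as follows. This is a minimal sketch under stated assumptions, not a reference solution: the function name `gambler_value_iteration`, the stopping threshold `theta`, the sweep cap `max_sweeps`, and the tie-breaking rule (smallest stake among near-equal actions) are all choices made here for illustration. The absorbing state end is left implicit with value 0.

```python
def gambler_value_iteration(p, N, theta=1e-12, max_sweeps=100_000):
    """Undiscounted (gamma = 1) value iteration for the controlled Gambler's Ruin."""
    # V[s] for s = 0..N. V[0] and V[N] stay 0 because both states move to
    # the zero-reward absorbing state `end` with no further reward.
    V = [0.0] * (N + 1)

    def q(s, n):
        # Expected return of staking n in state s; reward 1 only on reaching N.
        reward = 1.0 if s + n == N else 0.0
        return p * (reward + V[s + n]) + (1.0 - p) * V[s - n]

    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(1, N):
            best = max(q(s, n) for n in range(1, min(s, N - s) + 1))
            delta = max(delta, abs(best - V[s]))
            V[s] = best  # in-place (Gauss-Seidel style) update
        if delta < theta:
            break

    # Greedy policy. Rounding guards against floating-point noise; ties go to
    # the smallest stake, which is an arbitrary choice made for this sketch.
    policy = {}
    for s in range(1, N):
        stakes = range(1, min(s, N - s) + 1)
        policy[s] = max(stakes, key=lambda n: round(q(s, n), 9))
    return V, policy


# Small illustrative run; the homework itself asks for N = 100.
V, policy = gambler_value_iteration(p=0.4, N=10)
for s in range(1, 10):
    print(s, round(V[s], 4), policy[s])
```

Because only the transition into N is rewarded and γ = 1, V[s] here is the probability of reaching N from worth s under the greedy policy. Several stakes can be optimal in the same state, so the printed policy depends on the tie-breaking rule. When p > 0.5 and N = 100, convergence can take many more sweeps than when p < 0.5.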