Homework 7: Markov Decision Processes in CS 141, Assignments of Computer Science

Brown University Computer Science

Instructions and objectives for Homework 7 of CS 141 (Introduction to AI), which focuses on Markov decision processes. Students work through expected minimax, game iteration, the interpretation of discounting, and Gambler's Ruin, and extend value and policy iteration to games of chance. The problems call for proofs and for an implementation of value or policy iteration.

Typology: Assignments

Pre 2010

Uploaded on 08/19/2009

CS 141 Introduction to AI Greenwald

Homework 7: Markov Decision Processes

Due: 5:00 PM, Apr 20, 2009

Contents

1 Expected Minimax

2 Game Iteration

3 Interpreting Discounting

4 Gambler’s Ruin, Revisited

5 Model-Based Learning

6 Deterministic Q-Learning

Objectives

By the end of this homework, you will understand:

1. why deterministic Q-learning converges

2. how discounting relates to dying

By the end of this homework, you will be able to:

1. avert disaster in a casino

Practice

1 Expected Minimax

The goal of this question is to extend the minimax algorithm to games of chance, like backgammon and Monopoly.

(a) Formally define games of chance by extending the definition of game trees.

(b) Extend the minimax algorithm to take as input a game of chance.
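As a point of reference for part (b), here is a minimal sketch of one common extension, usually called expectiminimax. It assumes a game tree in which chance nodes sit alongside max and min nodes and carry a probability for each child. The `Node` class, its field names, and the example tree are hypothetical, chosen only for illustration; they are not taken from the course materials.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Node:
    # Hypothetical tree representation, not from the handout.
    kind: str                                   # "max", "min", "chance", or "leaf"
    value: float = 0.0                          # payoff to the max player (leaves only)
    children: List["Node"] = field(default_factory=list)
    probs: List[float] = field(default_factory=list)  # one per child (chance nodes only)


def expectiminimax(node: Node) -> float:
    """Return the value of `node` from the max player's point of view."""
    if node.kind == "leaf":
        return node.value
    child_values = [expectiminimax(child) for child in node.children]
    if node.kind == "max":
        return max(child_values)
    if node.kind == "min":
        return min(child_values)
    if node.kind == "chance":
        # Chance nodes average over outcomes instead of optimizing.
        return sum(p * v for p, v in zip(node.probs, child_values))
    raise ValueError(f"unknown node kind: {node.kind}")


# Small example: the max player picks between two lotteries.
lottery_a = Node("chance", probs=[0.5, 0.5],
                 children=[Node("leaf", 4.0), Node("leaf", 0.0)])
lottery_b = Node("chance", probs=[0.9, 0.1],
                 children=[Node("min", children=[Node("leaf", 3.0), Node("leaf", 1.0)]),
                           Node("leaf", 10.0)])
root = Node("max", children=[lottery_a, lottery_b])
print(expectiminimax(root))
```

In this example the first lottery is worth 0.5 · 4 + 0.5 · 0 = 2 and the second is worth 0.9 · 1 + 0.1 · 10 = 1.9, so the max player prefers the first.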

2 Game Iteration

The goal of this question is to extend value and policy iteration to games of chance, like backgammon and Monopoly.

(a) Formally define games of chance by extending the definition of MDPs.

(b) Extend the value iteration algorithm to take as input a game of chance.
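One possible reading of part (b) is sketched below. It assumes a two-player, zero-sum, turn-taking game in which each state is controlled by either the max or the min player, so the usual max in the Bellman backup becomes a min at the opponent's states. The function name, the dictionary layout, and the toy game are all hypothetical and serve only to illustrate the idea.

```python
def game_value_iteration(states, actions, controller, transitions,
                         gamma=0.9, tol=1e-9, max_sweeps=10_000):
    """Value iteration for a turn-taking zero-sum game of chance (a sketch).

    actions[s]           -> list of actions available in s (empty if terminal)
    controller[s]        -> "max" or "min", the player who moves in s
    transitions[(s, a)]  -> list of (probability, next_state, reward) triples
    """
    V = {s: 0.0 for s in states}
    for _ in range(max_sweeps):
        delta = 0.0
        for s in states:
            if not actions[s]:
                continue  # terminal state keeps value 0
            q_values = [
                sum(p * (r + gamma * V[s2]) for p, s2, r in transitions[(s, a)])
                for a in actions[s]
            ]
            # The only change from single-agent value iteration:
            # the min player minimizes the max player's return.
            new_v = max(q_values) if controller[s] == "max" else min(q_values)
            delta = max(delta, abs(new_v - V[s]))
            V[s] = new_v
        if delta < tol:
            break
    return V


# Toy game: max moves in "a", min moves in "b", "end" is terminal.
states = ["a", "b", "end"]
actions = {"a": ["safe", "risky"], "b": ["l", "r"], "end": []}
controller = {"a": "max", "b": "min"}
transitions = {
    ("a", "safe"): [(1.0, "end", 1.0)],
    ("a", "risky"): [(0.5, "b", 0.0), (0.5, "end", 4.0)],
    ("b", "l"): [(1.0, "end", 2.0)],
    ("b", "r"): [(1.0, "end", 6.0)],
}
V = game_value_iteration(states, actions, controller, transitions, gamma=0.5)
print(V)
```

With γ = 0.5 the min player takes the reward-2 move in "b", so V(b) = 2, and the risky move in "a" is worth 0.5 · (0 + 0.5 · 2) + 0.5 · 4 = 2.5, which beats the safe payoff of 1.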

Problems

3 Interpreting Discounting

(a) Show that for all Markov reward processes, the value function V is well-defined (i.e., finite-valued) if 0 ≤ γ < 1.

(b) Show that there exists a Markov reward process s.t. the value function V is not necessarily well-defined if γ = 1.

(c) Prove that any Markov reward process M_1, which for discount factor 0 ≤ γ < 1 yields value function V_1, can be transformed into another Markov reward process M_2, s.t. the undiscounted value function V_2 = V_1. [Hint: Introduce a zero-reward, absorbing state end in M_2 s.t. all state-action pairs in M_1 transition to end with probability 1 − γ.]
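The facts these three parts rest on can be written out as follows. This is a sketch rather than a full proof, and it makes assumptions the handout does not state: rewards are bounded in absolute value by a constant R_max (a symbol introduced here), and the reward r_t depends only on the state occupied at step t.

```latex
% (a) Geometric-series bound for 0 \le \gamma < 1, assuming |r_t| \le R_{\max}:
|V(s)| = \left| \mathbb{E}\!\left[ \sum_{t=0}^{\infty} \gamma^{t} r_t \,\middle|\, s_0 = s \right] \right|
       \le \sum_{t=0}^{\infty} \gamma^{t} R_{\max}
       = \frac{R_{\max}}{1 - \gamma} < \infty .

% (b) With \gamma = 1, a single state that loops to itself with reward 1 gives
V(s) = \sum_{t=0}^{\infty} 1 = \infty .

% (c) In M_2, each step moves to the state end with probability 1 - \gamma, so
\Pr[\text{not yet in end at step } t] = \gamma^{t}
\quad\Longrightarrow\quad
V_2(s) = \sum_{t=0}^{\infty} \gamma^{t}\, \mathbb{E}_{M_1}\!\left[ r_t \mid s_0 = s \right] = V_1(s) .
```

Read this way, the discount factor γ acts as a per-step survival probability, which is the sense in which discounting relates to dying.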

4 Gambler’s Ruin, Revisited

Consider the following controlled version of Gambler’s Ruin in which the gambler places bets on the outcome of a biased coin flip. Assume the gambler’s worth is between 0 and N (i.e., S = {0, 1, ..., N} ∪ {end}). At each state s, the gambler stakes some amount n in the range A = {1, ..., min{s, N − s}}. The coin turns up heads with probability p and tails with probability 1 − p. If the coin turns up heads, the gambler wins the amount he stakes (i.e., he transitions to state s + n); otherwise, the gambler loses the amount he stakes (i.e., he transitions to state s − n). A reward of 1 is received upon transitioning to state N; all other rewards are 0. The gambler transitions from states 0 and N to the absorbing state end, deterministically. The constraints on the gambler’s range of actions ensure that (i) he stakes at least $1 but no more than his worth; (ii) he stakes no more than N − s, preventing his worth from ever exceeding N.

(a) Draw the corresponding MDP, assuming the gambler’s worth is limited to $5 (i.e., N = 5).

(b) What is the optimal policy if p < 0.5, N = 100, and γ = 1? Why? (Solve this problem by implementing value iteration or policy iteration.)
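A minimal value-iteration sketch for this MDP is given below, assuming p = 0.4 as one example of p < 0.5. The function names, the in-place sweep order, and the stopping tolerance are choices made here, not part of the handout. States 0 and N are given value 0 because the reward of 1 is collected on the transition into N and nothing is earned afterwards.

```python
def q_value(V, s, n, N, p):
    """Expected undiscounted return of staking n in state s."""
    win = 1.0 if s + n == N else V[s + n]   # reward 1 on reaching N
    lose = V[s - n]                         # V[0] stays 0
    return p * win + (1 - p) * lose


def gamblers_ruin_values(N=100, p=0.4, tol=1e-12, max_sweeps=100_000):
    """In-place value iteration with gamma = 1 (a sketch)."""
    V = [0.0] * (N + 1)
    for _ in range(max_sweeps):
        delta = 0.0
        for s in range(1, N):
            best = max(q_value(V, s, n, N, p)
                       for n in range(1, min(s, N - s) + 1))
            delta = max(delta, abs(best - V[s]))
            V[s] = best
        if delta < tol:
            break
    return V


def greedy_stakes(V, N, p, eps=1e-9):
    """For each state, list every stake whose Q-value is within eps of the best.

    Ties are common in this problem, so the optimal policy is not unique.
    """
    policy = {}
    for s in range(1, N):
        qs = {n: q_value(V, s, n, N, p) for n in range(1, min(s, N - s) + 1)}
        best = max(qs.values())
        policy[s] = [n for n, q in qs.items() if q >= best - eps]
    return policy


V = gamblers_ruin_values(N=100, p=0.4)
policy = greedy_stakes(V, N=100, p=0.4)
print(V[25], V[50], V[75])
```

Since V(s) here is the probability of reaching N from s under the best policy, staking everything at s = 50 already gives V(50) = p. Inspecting `policy` shows which stakes tie for optimal at each state.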
