
CS181 Lecture 16 — Markov Decision Processes

Avi Pfeffer; Revised by David Parkes

March 30, 2011

We now turn to a new component of the course in which agents take actions and become active in their

environments. Up until this point agents have been passive learners: hanging out and analyzing data.

We are heading towards the interesting concept of reinforcement learning (RL), in which an agent learns

to act in an uncertain environment from training data consisting of sequences of state, action, reward, state, action,

reward, ... The fun part of RL is that the training data depends on the actions of the agent, and this will

lead us to a discussion of exploitation vs. exploration.

But for now we will consider planning problems in which the agent is given a probabilistic model of its

environment and should decide which actions to take. We begin with a brief introduction to utility theory

and the maximum expected utility principle (MEU). This provides the foundation for what it means to be

a rational agent: we will take a rational agent to be one that tries to maximize its expected utility, given its

uncertainty about the world.

Our focus in Markov Decision Processes will be on the more realistic situation of repeated interactions.

The agent gets some information about the world, chooses an action, receives a reward, gets some more

information, chooses another action, receives another reward, and so on. This type of agent needs to consider

not only its immediate reward when making its decision, but also its utility in the long run.

Optional readings: Russell & Norvig 16, 17-17.3
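
The repeated interaction described above (observe, act, receive a reward, observe again) can be sketched as a simple loop. This is only an illustrative sketch: the two-state world, its probabilities and rewards, and the names `TRANSITIONS`, `step`, `run_episode`, `always_work` and `cautious` are all invented here and do not come from the lecture.

```python
import random

# Hypothetical two-state world, invented for illustration only.
# TRANSITIONS[state][action] is a list of (probability, next_state, reward).
TRANSITIONS = {
    "cool": {
        "work": [(0.7, "cool", 2.0), (0.3, "hot", 2.0)],
        "rest": [(1.0, "cool", 1.0)],
    },
    "hot": {
        "work": [(1.0, "hot", -5.0)],
        "rest": [(0.6, "cool", 0.0), (0.4, "hot", 0.0)],
    },
}


def step(state, action, rng):
    """Sample (next_state, reward) for one interaction with the world."""
    outcomes = TRANSITIONS[state][action]
    weights = [p for p, _, _ in outcomes]
    _, next_state, reward = rng.choices(outcomes, weights=weights, k=1)[0]
    return next_state, reward


def run_episode(policy, start_state, horizon, rng):
    """Run the observe/act/reward loop; return the trajectory and total reward."""
    state, total, trajectory = start_state, 0.0, []
    for _ in range(horizon):
        action = policy(state)                      # choose an action
        next_state, reward = step(state, action, rng)
        trajectory.append((state, action, reward))  # state, action, reward, ...
        total += reward
        state = next_state                          # new information arrives
    return trajectory, total


def always_work(state):
    """Ignores the state: takes the larger immediate reward available in 'cool'."""
    return "work"


def cautious(state):
    """Works while 'cool', rests to recover when 'hot'."""
    return "work" if state == "cool" else "rest"


if __name__ == "__main__":
    for policy in (always_work, cautious):
        _, total = run_episode(policy, "cool", 20, random.Random(0))
        print(policy.__name__, total)
```

In this toy world, `always_work` collects the larger immediate reward but can get stuck in the costly "hot" state, which is the kind of long-run consequence the agent has to weigh against its immediate reward.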

1 Decision Theoretic Framework

Here is a capsule summary of the decision theoretic framework:

• We use probability to model uncertainty about the domain.

• We use utility to model an agent’s objectives.

• The goal is to design a decision policy for the agent, describing how it will act in all possible states,

that maximizes its expected utility.

Decision theory can be viewed as a definition of rationality. The maximum expected utility principle

states that a rational agent is one that chooses its policy so as to maximize its expected utility. The decision

theoretic framework provides a precise and concrete formulation of the problem we wish to solve and a

method for designing an intelligent agent.
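
As a minimal sketch of the maximum expected utility principle, consider a single one-shot decision rather than a full policy. The umbrella scenario, its probabilities and utilities, and the function names `expected_utility` and `meu_action` are assumptions made up for this illustration; they are not part of the lecture.

```python
# Hypothetical one-shot decision, invented for illustration only.
# OUTCOME_PROBS[action][outcome] = P(outcome | action), assuming P(rain) = 0.3.
OUTCOME_PROBS = {
    "take_umbrella": {"dry_but_burdened": 1.0},
    "leave_umbrella": {"dry_and_free": 0.7, "wet": 0.3},
}

# UTILITY[outcome] = how much the agent values each outcome (assumed numbers).
UTILITY = {"dry_but_burdened": 0.8, "dry_and_free": 1.0, "wet": 0.0}


def expected_utility(action, outcome_probs, utility):
    """Sum over outcomes of P(outcome | action) * U(outcome)."""
    return sum(p * utility[o] for o, p in outcome_probs[action].items())


def meu_action(outcome_probs, utility):
    """Return the action with the highest expected utility."""
    return max(
        outcome_probs,
        key=lambda a: expected_utility(a, outcome_probs, utility),
    )


if __name__ == "__main__":
    for action in OUTCOME_PROBS:
        print(action, expected_utility(action, OUTCOME_PROBS, UTILITY))
    print("MEU choice:", meu_action(OUTCOME_PROBS, UTILITY))
```

With these assumed numbers, taking the umbrella has expected utility 0.8 and leaving it has 0.7, so the rational choice under this model is to take it. Changing either the probabilities or the utilities can change the choice, which is why the framework needs both.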

1.1 Problem Formulation

There are various sources of uncertainty that an agent faces when trying to achieve good performance in

complex environments:

• The agent may not know the current state of the world. There can be a number of reasons for this:

– Its sensors only give it partial information about the state of the world. For example, an agent

using a video camera cannot see through walls.

