Docsity
OPTIMAL CONTROL EXERCISE LIST (LISTA DE EXERCICIOS CONTROLE OTIMO), Electrical Engineering Exercises


Type: Exercises

2020

Shared on 26/07/2020

felipe-gomes-barbosa-1



12.2 MARKOV DECISION PROCESS

Consider an agent or decision maker that interacts with its environment in the manner illustrated in Fig. 12.1. The agent operates in accordance with a finite-discrete-time Markovian decision process that is characterized as follows:

• The environment evolves probabilistically, occupying a finite set of discrete states. However, the state does not contain past statistics, even though these statistics could be useful to the agent.
• For each environmental state, there is a finite set of possible actions that may be taken by the agent.
• Every time the agent takes an action, a certain cost is incurred.
• States are observed, actions are taken, and costs are incurred at discrete times.

In the context of our present discussion, we introduce the following definition:

The state of the environment is a summary of the entire past experience of an agent gained from its interaction with the environment, such that the information necessary for the agent to predict the future behavior of the environment is contained in that summary.

The state at time-step $n$ is denoted by the random variable $X_n$, and the actual state at time-step $n$ is denoted by $i_n$. The finite set of states is denoted by $\mathcal{X}$. A surprising aspect of dynamic programming is that its applicability depends very little on the nature of the state. We may therefore proceed without any assumption on the structure of the state space. Note also that the complexity of the dynamic-programming algorithm is quadratic in the dimension of the state space and linear in the dimension of the action space.

For state $i$, for example, the available set of actions (i.e., inputs applied to the environment by the agent) is denoted by $\mathcal{A}_i = \{a_{ik}\}$. The second subscript $k$ in action $a_{ik}$ merely indicates that more than one action may be available when the environment is in state $i$.
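The following minimal Python sketch makes this notation concrete. It assumes states are labeled $0, \dots, N-1$ and that every state offers the same number $M$ of actions. That uniform size is a simplification, since the section allows $\mathcal{A}_i$ to vary with $i$. The pair `(i, k)` stands for $a_{ik}$.

The hypothetical helper `backup_operation_count` counts the inner terms of one sweep of a generic dynamic-programming backup. For every state $i$ and every action, that backup sums over all successor states $j$. This shows where the quadratic dependence on the number of states and the linear dependence on the number of actions come from. The exact form of the backup is an assumption here, because this section has not yet defined the dynamic-programming algorithm itself.

```python
from typing import Dict, List, Tuple


def make_action_sets(num_states: int, actions_per_state: int) -> Dict[int, List[Tuple[int, int]]]:
    """Build A_i = {a_ik} for every state i; the pair (i, k) stands for action a_ik.

    The second index k only distinguishes the actions available in state i.
    Assumption (hypothetical simplification): every state offers the same number of actions.
    """
    return {i: [(i, k) for k in range(actions_per_state)] for i in range(num_states)}


def backup_operation_count(num_states: int, actions_per_state: int) -> int:
    """Count the inner terms of one sweep of a generic DP backup (hypothetical helper).

    For every state i and every action a in A_i, the backup sums over all
    successor states j, so the total is N * M * N = N^2 * M.
    """
    count = 0
    for _i in range(num_states):              # every current state i
        for _k in range(actions_per_state):   # every action a_ik in A_i
            for _j in range(num_states):      # every possible next state j
                count += 1                    # one term of the sum over j
    return count


print(backup_operation_count(10, 3))  # 300
print(backup_operation_count(20, 3))  # 1200: twice the states, four times the work
print(backup_operation_count(10, 6))  # 600: twice the actions, twice the work
```

The nested loops are written out explicitly, rather than as the closed form `N * N * M`, so that each loop level can be matched to the index it ranges over.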
The transition of the environment from state $i$ to a new state $j$, for example due to action $a_{ik}$, is probabilistic in nature. Most importantly, however, the transition probability from state $i$ to state $j$ depends entirely on the current state $i$ and the corresponding action $a_{ik}$. This is the Markov property, which was discussed in Chapter 11. This property is crucial because it means that the current state of the environment provides the necessary information for the agent to decide what action to take.

The random variable denoting the action taken by the agent at time-step $n$ is denoted by $A_n$. Let $p_{ij}(a)$ denote the transition probability from state $i$ to state $j$ due to action $a$.

[Figure 12.1: Block diagram of an agent interacting with its environment. Labels: Agent, Environment, State, Action.]
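The transition law and the Markov property can be sketched as follows. The example assumes a hypothetical three-state environment with two actions named `"stay"` and `"move"`. The probabilities are illustrative and are not taken from the text.

In the sketch, `P[a][i][j]` plays the role of $p_{ij}(a)$. For every action, each row must be a probability distribution over next states. The sampling function `step` receives only the current state and the action, so no earlier history can influence the next state.

```python
import random
from typing import Dict, List

# Hypothetical example: P[a][i][j] plays the role of p_ij(a), the probability
# of moving from state i to state j when action a is taken.
P: Dict[str, List[List[float]]] = {
    "stay": [[0.9, 0.1, 0.0],
             [0.1, 0.8, 0.1],
             [0.0, 0.1, 0.9]],
    "move": [[0.2, 0.8, 0.0],
             [0.0, 0.2, 0.8],
             [0.8, 0.0, 0.2]],
}


def is_stochastic(transitions: Dict[str, List[List[float]]], tol: float = 1e-9) -> bool:
    """Check that, for every action, each row p_i.(a) is a probability distribution over j."""
    return all(
        all(p >= 0.0 for p in row) and abs(sum(row) - 1.0) <= tol
        for matrix in transitions.values()
        for row in matrix
    )


def step(i: int, a: str, rng: random.Random) -> int:
    """Sample the next state j with probability p_ij(a).

    Only the current state i and the action a are inputs; earlier states and
    actions cannot affect the outcome, which is exactly the Markov property.
    """
    row = P[a][i]
    return rng.choices(range(len(row)), weights=row, k=1)[0]


rng = random.Random(0)  # seeded so the simulated trajectory is reproducible
assert is_stochastic(P)
trajectory = [0]
for a in ["move", "move", "stay", "move"]:
    trajectory.append(step(trajectory[-1], a, rng))
print(trajectory)
```

Passing an explicit, seeded `random.Random` instance into `step`, rather than using the module-level generator, keeps simulations reproducible. It also keeps the dependence of each transition on $(i, a)$ alone visible in the function signature.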