










Notes on multiagent models for partially observable environments, focusing on decentralized partially observable Markov decision processes (Dec-POMDPs) and communicative models. Topics include the Dec-Tiger problem, multiagent planning frameworks, partially observable stochastic games, Dec-POMDPs, interactive POMDPs, communication in Dec-POMDPs, extensive-form games, and complexity results, along with some algorithms for solving Dec-POMDPs.











Matthijs Spaan
Institute for Systems and Robotics
Instituto Superior Técnico
Lisbon, Portugal
Reading group meeting, March 26, 2007
Multiagent models for partially observable environments:
Non-communicative models.
Communicative models.
Game-theoretic models.
Some algorithms.
Talk based on survey by Frans Oliehoek (2006).
Aspects:
communication
on-line vs. off-line
centralized vs. distributed
cooperative vs. self-interested
observability
factored reward
Partially observable stochastic games (POSGs) (Hansen et al., 2004):
Extension of stochastic games (Shapley, 1953).
Hence self-interested.
Agents do not observe each other’s observations or actions.
Decentralized partially observable Markov decision processes (Dec-POMDPs) (Bernstein et al., 2002):
Cooperative version of POSGs.
Only one reward, i.e., reward functions are identical for each agent.
Reward function $R(s, a_1, \ldots, a_n, s')$.
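The Dec-POMDP ingredients listed above can be collected in a small container. This is only a minimal sketch: the class name `DecPOMDP`, its field names, and the helper methods are hypothetical and chosen for illustration, and the reward is assumed to take the form $R(s, a, s')$ over a joint action $a$.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, List, Sequence, Tuple

JointAction = Tuple[str, ...]
JointObservation = Tuple[str, ...]


@dataclass(frozen=True)
class DecPOMDP:
    """Hypothetical container for a Dec-POMDP (names are illustrative)."""
    states: Sequence[str]
    actions: Sequence[Sequence[str]]        # actions[i] is agent i's action set
    observations: Sequence[Sequence[str]]   # observations[i] is agent i's observation set
    transition: Callable[[str, JointAction, str], float]              # P(s' | s, a)
    observe: Callable[[JointAction, str, JointObservation], float]    # P(o | a, s')
    reward: Callable[[str, JointAction, str], float]                  # one shared R(s, a, s')

    def joint_actions(self) -> List[JointAction]:
        # Every combination of one individual action per agent.
        return list(product(*self.actions))

    def expected_reward(self, s: str, a: JointAction) -> float:
        # Expectation of the shared reward over the next state s'.
        return sum(self.transition(s, a, s2) * self.reward(s, a, s2)
                   for s2 in self.states)
```

Because there is a single `reward` field rather than one per agent, the sketch is cooperative by construction; a POSG would need one reward function per agent.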
Dec-MDPs:
Jointly observable Dec-POMDP: the joint observation $o = (o_1, \ldots, o_n)$ identifies the state.
But each agent only observes $o_i$.
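Joint observability can be illustrated with a toy check. The sketch below assumes a simplified observation model `obs_prob(o, s)` that ignores actions; the function name `identifies_state` and the 2x2 grid world are hypothetical and not taken from the talk.

```python
from itertools import product


def identifies_state(states, observations, obs_prob):
    """Return True if every observation is compatible with at most one
    state, i.e. seeing it reveals the state."""
    for o in observations:
        compatible = [s for s in states if obs_prob(o, s) > 0.0]
        if len(compatible) > 1:
            return False
    return True


# Hypothetical toy world: the state is a pair (x, y) on a 2x2 grid.
states = list(product(range(2), range(2)))

# Joint observation o = (o_1, o_2): agent 1 sees x, agent 2 sees y.
joint_observations = list(product(range(2), range(2)))


def joint_obs_prob(o, s):
    return 1.0 if o == s else 0.0


# Agent 1 on its own only sees o_1 = x.
def agent1_obs_prob(o1, s):
    return 1.0 if o1 == s[0] else 0.0


jointly = identifies_state(states, joint_observations, joint_obs_prob)
alone = identifies_state(states, [0, 1], agent1_obs_prob)
```

Here `jointly` is true while `alone` is false: the pair of observations pins down the state, but neither agent can do so from its own observation.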
MTDP (Pynadath and Tambe, 2002): essentially identical to Dec-POMDP.
Interactive POMDPs (Gmytrasiewicz and Doshi, 2005):
For self-interested agents.
Each agent keeps a belief over world states and other agents’ models.
An agent’s model: local observation history, policy, observation function.
Leads to infinite hierarchy of beliefs.
Dec-POMDP-Com (Goldman and Zilberstein, 2004)
Dec-POMDP plus:
$\Sigma$ is the alphabet of all possible messages.
$\sigma_i$ is a message sent by agent $i$.
$C_\Sigma$ is the cost of sending a message.
Reward depends on message sent: $R(s, a_1, \sigma_1, \ldots, a_n, \sigma_n, s')$.
Instantaneous broadcast communication.
Fixed semantics.
Two policies: for domain-level actions, and for communicating.
Closely related model: Com-MTDP (Pynadath and Tambe, 2002).
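One way to picture a reward that depends on the messages sent is to subtract a per-message cost from a domain-level reward. This additive split is an assumption made for illustration only, as are the names `NULL_MESSAGE`, `message_cost`, and `com_reward` and the cost value; the model itself only states that $R$ takes the messages as arguments.

```python
NULL_MESSAGE = ""  # hypothetical marker for "agent sends nothing"


def message_cost(sigma, cost_per_message=0.5):
    """Assumed cost C_Sigma: silence is free, any other message has a fixed cost."""
    return 0.0 if sigma == NULL_MESSAGE else cost_per_message


def com_reward(domain_reward, s, actions, messages, s_next, cost=message_cost):
    """Sketch of R(s, a_1, sigma_1, ..., a_n, sigma_n, s') under the additive
    assumption: domain-level reward minus the cost of every message sent."""
    return domain_reward(s, actions, s_next) - sum(cost(m) for m in messages)


# Usage: two agents, only the first one communicates.
r = com_reward(lambda s, a, s2: 1.0, "s0", ("listen", "listen"),
               ("heard-left", NULL_MESSAGE), "s0")
```

With these assumed numbers, `r` is the domain reward 1.0 minus one message cost of 0.5.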
8-card poker:
[Table: models classified by observability (fully, jointly, partial, none) and by communication (none, general, free and instantaneous).]
Dynamic programming for POSGs (Hansen et al., 2004).
Uncertainty over state and the other agent’s future conditional plans.
Define value function $V_t$ over state and other agent’s depth-$t$ policy trees: a vector for each pair of policy trees.
Computing the $t+1$ value function requires backing up all combinations of all agents’ depth-$t$ policy trees.
Prune (very weakly) dominated strategies.
Optimal for cooperative settings (DEC-POMDP).
Still infeasible for all but the smallest problems.
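The infeasibility comes from the backup step, which can be sketched for a single agent as follows. This shows only the exhaustive generation of policy trees, without the pruning of dominated strategies; the tree encoding `(action, {observation: subtree})` and the function names are assumptions made for this illustration.

```python
from itertools import product


def exhaustive_backup(actions, observations, trees):
    """Build all depth-(t+1) policy trees of one agent from its depth-t trees.

    A tree is (action, {observation: subtree}); a depth-1 tree has no subtrees.
    """
    if not trees:
        return [(a, {}) for a in actions]
    new_trees = []
    for a in actions:
        # Choose one depth-t subtree for every possible observation.
        for subtrees in product(trees, repeat=len(observations)):
            new_trees.append((a, dict(zip(observations, subtrees))))
    return new_trees


def count_trees(n_actions, n_observations, depth):
    """Number of depth-`depth` policy trees if nothing is pruned."""
    n = n_actions
    for _ in range(depth - 1):
        n = n_actions * n ** n_observations
    return n


# Hypothetical agent with 3 actions and 2 observations.
actions = ["a0", "a1", "a2"]
observations = ["o0", "o1"]
depth1 = exhaustive_backup(actions, observations, [])
depth2 = exhaustive_backup(actions, observations, depth1)
depth3 = exhaustive_backup(actions, observations, depth2)
```

The counts grow as 3, 27, 2187 for depths 1 to 3, and `count_trees(3, 2, 4)` already gives 14348907 trees per agent, before even considering combinations across agents.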
Joint Equilibrium based Search for Policies (Nair et al., 2003)
Use alternating maximization.
Converges to Nash equilibrium, which is a local optimum.
Keeps belief over state and other agents’ observation histories.
Fixing the other agents’ policies leaves a POMDP for the remaining agent; this POMDP is transformed to an MDP over the belief states and solved using value iteration.
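Alternating maximization and its local optima can be seen in a one-shot cooperative matrix game. The sketch below is not JESP itself, which alternates over full policies in a Dec-POMDP; it only shows the same idea on single actions, and the function name and payoff matrix are hypothetical.

```python
def alternating_maximization(payoff, a1=0, a2=0):
    """Alternate best responses on a shared payoff matrix payoff[a1][a2].

    Returns a pair (a1, a2) from which neither agent can improve the shared
    payoff by changing only its own action (a Nash equilibrium).
    """
    while True:
        # Agent 1 improves its action while agent 2's action stays fixed.
        new_a1 = max(range(len(payoff)), key=lambda a: payoff[a][a2])
        if payoff[new_a1][a2] <= payoff[a1][a2]:
            new_a1 = a1  # keep the current action unless strictly better
        # Agent 2 improves its action while agent 1's new action stays fixed.
        new_a2 = max(range(len(payoff[0])), key=lambda b: payoff[new_a1][b])
        if payoff[new_a1][new_a2] <= payoff[new_a1][a2]:
            new_a2 = a2
        if (new_a1, new_a2) == (a1, a2):
            return a1, a2
        a1, a2 = new_a1, new_a2


# Hypothetical coordination game with two equilibria of different value.
payoff = [[5, 0],
          [0, 10]]
from_00 = alternating_maximization(payoff, 0, 0)
from_01 = alternating_maximization(payoff, 0, 1)
```

Starting from `(0, 0)` the procedure stays at the equilibrium worth 5, while starting from `(0, 1)` it reaches the one worth 10, so the result depends on the initial point.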
Set-Coverage algorithm (Becker et al., 2004):
For transition-independent Dec-MDPs with a particular joint reward structure.
Bounded Policy Iteration for Dec-POMDPs (Bernstein et al., 2005):
Optimize a finite-state controller with a bounded size.
Alternating maximization.
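A finite-state controller of bounded size can be represented as a small lookup structure. The sketch below is a deterministic simplification for one agent; the controllers optimized in the cited work may be stochastic, no optimization step is shown, and the class name, node layout, and action and observation labels are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass
class FiniteStateController:
    """Deterministic controller: each node picks an action, and the local
    observation selects the next node."""
    action_of: Dict[int, str]               # node -> action
    next_node: Dict[Tuple[int, str], int]   # (node, observation) -> next node
    start: int = 0

    def run(self, observations: List[str]) -> List[str]:
        node = self.start
        actions = []
        for o in observations:
            actions.append(self.action_of[node])
            node = self.next_node[(node, o)]
        return actions


# Hypothetical two-node controller: listen until "hear-left", then open right.
controller = FiniteStateController(
    action_of={0: "listen", 1: "open-right"},
    next_node={
        (0, "hear-left"): 1, (0, "hear-right"): 0,
        (1, "hear-left"): 0, (1, "hear-right"): 0,
    },
)
trace = controller.run(["hear-right", "hear-left", "hear-left"])
```

The number of nodes is fixed in advance, so the policy stays the same size however long the horizon is, which is the point of bounding the controller.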