






























These notes cover regret minimization and utility maximization in game-theoretic learning. The authors, Amy Greenwald, David Gondek, Amir Jafari, and Casey Marks, compare the two approaches in the context of game theory, single-agent learning, and multiagent learning. They also give examples of Nash equilibrium, correlated equilibrium, and minimax equilibrium in one-shot games.































with David Gondek, Amir Jafari, and Casey Marks
Background
No-external-regret learning converges to the set of minimax equilibria in zero-sum games [e.g., Freund and Schapire 1996].
No-internal-regret learning converges to the set of correlated equilibria in general-sum games [e.g., Foster and Vohra 1997].
Outline
◦ Game Theory
◦ Single Agent Learning Model
◦ Multiagent Learning & Game-Theoretic Equilibria
Game Theory: A Crash Course
General-Sum Games
◦ Nash Equilibrium
◦ Correlated Equilibrium
Zero-Sum Games
◦ Minimax Equilibrium
One-Shot Games
A one-shot game is a 3-tuple $\Gamma = (I, (A_i, r_i)_{i \in I})$, where
◦ $I$ is a set of players
◦ for all players $i \in I$,
  ◦ a set of pure actions $A_i$ with $a_i \in A_i$
  ◦ a reward function $r_i : A \to \mathbb{R}$, where $A = \prod_{i \in I} A_i$ with $a \in A$
The players can employ randomized, or mixed, actions:
for all players $i \in I$,
◦ a set of mixed actions $Q_i = \{ q_i \in \mathbb{R}^{|A_i|} \mid \sum_j q_{ij} = 1 \text{ and } q_{ij} \ge 0, \ \forall j \}$, with $q_i \in Q_i$
◦ an expected reward function $r_i : Q \to \mathbb{R}$, where $Q = \prod_{i \in I} Q_i$ with $q \in Q$, s.t. $r_i(q) = \sum_{a \in A} q(a) \, r_i(a)$
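The expected reward $r_i(q) = \sum_{a \in A} q(a) \, r_i(a)$ can be computed by enumerating joint pure actions. The following is a minimal sketch, assuming that players mix independently, so that $q(a)$ is the product of each player's probability on their own component of $a$. The function name `expected_reward` and the 2x2 reward table are hypothetical and chosen only for illustration.

```python
from itertools import product


def expected_reward(reward, mixed):
    """Return sum over joint actions a of q(a) * reward[a].

    reward: dict mapping a joint pure action (tuple of indices) to a number.
    mixed:  list of mixed actions, one probability vector per player.
    Assumes independent mixing: q(a) is the product of the marginals.
    """
    action_sets = [range(len(q_i)) for q_i in mixed]
    total = 0.0
    for a in product(*action_sets):
        prob = 1.0
        for q_i, a_i in zip(mixed, a):
            prob *= q_i[a_i]
        total += prob * reward[a]
    return total


# Hypothetical reward table for one player in a 2x2 game.
reward_row = {(0, 0): 6, (0, 1): 2, (1, 0): 7, (1, 1): 0}

# Both players mix uniformly over their two pure actions.
uniform = [[0.5, 0.5], [0.5, 0.5]]
value = expected_reward(reward_row, uniform)

# A pure action is the special case of a point-mass mixed action.
pure_value = expected_reward(reward_row, [[1.0, 0.0], [0.0, 1.0]])
```

Representing a pure action as a point mass keeps a single code path for both pure and mixed actions, which mirrors how the expected reward function extends $r_i$ from $A$ to $Q$.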
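Correlated Equilibrium
Chicken
\[
\begin{aligned}
\max \quad & 12\,\pi_{TL} + 9\,\pi_{TR} + 9\,\pi_{BL} + 0\,\pi_{BR} \\
\text{subject to} \quad & \pi_{TL} + \pi_{TR} + \pi_{BL} + \pi_{BR} = 1 \\
& \pi_{TL}, \pi_{TR}, \pi_{BL}, \pi_{BR} \ge 0
\end{aligned}
\]
together with the equilibrium constraints, stated in terms of conditional probabilities:
\[
\begin{aligned}
6\,\pi_{L|T} + 2\,\pi_{R|T} &\ge 7\,\pi_{L|T} + 0\,\pi_{R|T} \\
7\,\pi_{L|B} + 0\,\pi_{R|B} &\ge 6\,\pi_{L|B} + 2\,\pi_{R|B} \\
6\,\pi_{T|L} + 2\,\pi_{B|L} &\ge 7\,\pi_{T|L} + 0\,\pi_{B|L} \\
7\,\pi_{T|R} + 0\,\pi_{B|R} &\ge 6\,\pi_{T|R} + 2\,\pi_{B|R}
\end{aligned}
\]
or, equivalently, in terms of joint probabilities:
\[
\begin{aligned}
6\,\pi_{TL} + 2\,\pi_{TR} &\ge 7\,\pi_{TL} + 0\,\pi_{TR} \\
7\,\pi_{BL} + 0\,\pi_{BR} &\ge 6\,\pi_{BL} + 2\,\pi_{BR} \\
6\,\pi_{TL} + 2\,\pi_{BL} &\ge 7\,\pi_{TL} + 0\,\pi_{BL} \\
7\,\pi_{TR} + 0\,\pi_{BR} &\ge 6\,\pi_{TR} + 2\,\pi_{BR}
\end{aligned}
\]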
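The correlated equilibrium constraints for Chicken can be checked mechanically for any candidate joint distribution. The sketch below assumes the payoff table (T,L) = (6,6), (T,R) = (2,7), (B,L) = (7,2), (B,R) = (0,0), which is consistent with the objective coefficients 12 and 9 and the constraint coefficients 2 and 0 in the linear program, but is an assumption rather than something the slides state explicitly. The function names are hypothetical, and exact fractions are used so the comparisons involve no rounding.

```python
from fractions import Fraction as F

# Assumed Chicken payoffs, keyed by (row action, column action).
ROW = {("T", "L"): 6, ("T", "R"): 2, ("B", "L"): 7, ("B", "R"): 0}
COL = {("T", "L"): 6, ("T", "R"): 7, ("B", "L"): 2, ("B", "R"): 0}
ROWS, COLS = ("T", "B"), ("L", "R")


def is_correlated_equilibrium(pi):
    """Check the joint-probability form of the equilibrium constraints."""
    if sum(pi.values()) != 1 or any(p < 0 for p in pi.values()):
        return False
    # Row player: deviating from the recommended row must not pay.
    for rec in ROWS:
        for dev in ROWS:
            gain = sum(pi[(rec, c)] * (ROW[(dev, c)] - ROW[(rec, c)]) for c in COLS)
            if gain > 0:
                return False
    # Column player: deviating from the recommended column must not pay.
    for rec in COLS:
        for dev in COLS:
            gain = sum(pi[(r, rec)] * (COL[(r, dev)] - COL[(r, rec)]) for r in ROWS)
            if gain > 0:
                return False
    return True


def total_reward(pi):
    """Objective of the linear program: the sum of both players' rewards."""
    return sum(p * (ROW[a] + COL[a]) for a, p in pi.items())


# Candidate distribution: half the mass on (T,L), a quarter each on (T,R) and (B,L).
candidate = {("T", "L"): F(1, 2), ("T", "R"): F(1, 4),
             ("B", "L"): F(1, 4), ("B", "R"): F(0)}
candidate_ok = is_correlated_equilibrium(candidate)
candidate_value = total_reward(candidate)
```

Under the assumed payoffs, the candidate satisfies every constraint, two of them with equality, and its objective value is 21/2. A general-purpose LP solver could be used to search for the maximizer instead of checking a fixed candidate.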
Zero-Sum Games
◦ zero-sum: $\sum_{i \in I} r_i(a) = 0$, for all $a \in A$
◦ constant-sum: $\sum_{i \in I} r_i(a) = c$, for all $a \in A$, for some $c \in \mathbb{R}$
Example: Matching Pennies
Minimax Equilibrium
$(q_1^*, q_2^*) \in Q$ is a minimax equilibrium in a two-player, zero-sum game iff
◦ $r_1(q_1^*, q_2^*) \ge r_1(j, q_2^*)$, $\forall j \in A_1$
◦ $l_2(q_1^*, q_2^*) \le l_2(q_1^*, k)$, $\forall k \in A_2$
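The two minimax conditions can be tested directly against every pure deviation. The sketch below uses Matching Pennies with the standard +1/-1 payoffs, which is an assumption since the payoff matrix does not appear in the text. It also assumes that $l_2$ is player 2's loss and that, in a zero-sum game, $l_2 = r_1$. The function names are hypothetical.

```python
# Assumed Matching Pennies rewards for player 1; player 2 receives the negation.
R1 = [[1, -1],
      [-1, 1]]


def r1(q1, q2):
    """Expected reward to player 1 when the players mix with q1 and q2."""
    return sum(q1[j] * q2[k] * R1[j][k] for j in range(2) for k in range(2))


def pure(index, n=2):
    """Point-mass mixed action on the pure action `index`."""
    return [1.0 if m == index else 0.0 for m in range(n)]


def is_minimax_equilibrium(q1, q2, tol=1e-12):
    value = r1(q1, q2)
    # Player 1 cannot gain by switching to any pure action j.
    row_ok = all(value >= r1(pure(j), q2) - tol for j in range(2))
    # Player 2's loss (assumed equal to r1) cannot shrink by switching to any pure action k.
    col_ok = all(value <= r1(q1, pure(k)) + tol for k in range(2))
    return row_ok and col_ok


uniform_is_equilibrium = is_minimax_equilibrium([0.5, 0.5], [0.5, 0.5])
pure_is_equilibrium = is_minimax_equilibrium(pure(0), pure(0))
```

Checking only pure deviations is enough because the expected reward is linear in each player's own mixed action, so no mixed deviation can do better than the best pure one.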
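Transformations
$\Phi_{\text{LINEAR}}$, $\Phi_{\text{SWAP}}$
Mixed Transformations
$\Phi_{\text{LINEAR}}$, $\Phi_{\text{SWAP}}$
Isomorphism
The operation of elements of [statement and equation not recoverable; surviving symbols: SWAP, $ij$, $i$, $j$, $k$]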
Internal Regret Matrices
[Matrix definitions not recoverable; surviving symbols: $F$, INT, SWAP, $ij$, $j$, $k$]
Regret Vector
$\rho \in \mathbb{R}^{\Phi}$
Observed Regret Vector
$\tilde{\rho}_\phi(r, a) = r \cdot a\phi - r \cdot a$
Expected Regret Vector
$\hat{\rho}_\phi(r, q) = E[\tilde{\rho}_\phi(r, a) \mid a \sim q] = \tilde{\rho}_\phi(r, E[a \mid a \sim q]) = r \cdot q\phi - r \cdot q$
No Observed Φ-Regret
\[
\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \tilde{\rho}_\phi(r_\tau, a_\tau) \le 0, \quad \text{for all } \phi \in \Phi, \text{ a.s.}
\]
No Expected Φ-Regret
\[
\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \hat{\rho}_\phi(r_\tau, q_\tau) \le 0, \quad \text{for all } \phi \in \Phi
\]
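To make the time-averaged expected regret concrete, the sketch below tracks $\frac{1}{t} \sum_{\tau=1}^{t} \hat{\rho}_\phi(r_\tau, q_\tau)$ for a simple learner. The text does not specify an algorithm at this point, so the exponential-weights (Hedge) update is used purely as an illustrative choice. The sketch also assumes that Φ is the set of constant transformations (external regret), that rewards lie in [0, 1], and that the learning rate is tuned to a known horizon. The reward sequence and function name are hypothetical.

```python
import math


def hedge_average_regret(rewards, eta):
    """Run exponential weights and return the average expected regret per action.

    For the constant transformation that maps every action to j, the expected
    regret in one round is r[j] - r . q, where q is the learner's mixed action.
    """
    n = len(rewards[0])
    weights = [1.0] * n
    cumulative = [0.0] * n
    for r in rewards:
        total = sum(weights)
        q = [w / total for w in weights]
        earned = sum(q_j * r_j for q_j, r_j in zip(q, r))
        for j in range(n):
            cumulative[j] += r[j] - earned
        # Illustrative update rule: reweight each action by exp(eta * reward).
        weights = [w * math.exp(eta * r_j) for w, r_j in zip(weights, r)]
    t = len(rewards)
    return [c / t for c in cumulative]


# Hypothetical reward sequence: action 0 pays in two rounds out of three.
horizon = 2000
rewards = [[1.0, 0.0] if tau % 3 else [0.0, 1.0] for tau in range(horizon)]

# Learning rate tuned to the horizon (an assumption of this sketch).
eta = math.sqrt(8 * math.log(2) / horizon)
average_regret = hedge_average_regret(rewards, eta)
worst = max(average_regret)
```

Because the learner's mixed actions are used directly, the run is deterministic and measures expected rather than observed regret. Sampling a pure action from each mixed action would give the observed counterpart, which holds only almost surely.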