Game-Theoretic Learning: Regret Minimization vs. Utility Maximization, Study notes of Computer Science

Study notes on regret minimization and utility maximization in game-theoretic learning. The authors, Amy Greenwald, David Gondek, Amir Jafari, and Casey Marks, compare the two approaches in the context of game theory, single-agent learning, and multiagent learning, with examples of Nash equilibrium, correlated equilibrium, and minimax equilibrium in one-shot games.

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010

Game-Theoretic Learning:
Regret Minimization vs. Utility Maximization
Amy Greenwald
with David Gondek, Amir Jafari, and Casey Marks
Brown University
University of Pennsylvania
November 17, 2004

Background

No-external-regret learning converges to the set of minimax equilibria in zero-sum games [e.g., Freund and Schapire 1996].

No-internal-regret learning converges to the set of correlated equilibria in general-sum games [e.g., Foster and Vohra 1997].

Outline

Game Theory

Single Agent Learning Model

Multiagent Learning & Game-Theoretic Equilibria

Game Theory: A Crash Course

General-Sum Games
◦ Nash Equilibrium
◦ Correlated Equilibrium

Zero-Sum Games
◦ Minimax Equilibrium

One-Shot Games

A one-shot game is a 3-tuple $\Gamma = (I, (A_i, r_i)_{i \in I})$, where

$I$ is a set of players

for all players $i \in I$,
◦ a set of pure actions $A_i$, with $a_i \in A_i$
◦ a reward function $r_i : A \to \mathbb{R}$, where $A = \prod_{i \in I} A_i$, with $a \in A$

The players can employ randomized or mixed actions:

for all players $i \in I$,
◦ a set of mixed actions $Q_i = \{ q_i \in \mathbb{R}^{|A_i|} \mid \sum_j q_{ij} = 1 \text{ and } q_{ij} \ge 0, \ \forall j \}$, with $q_i \in Q_i$
◦ an expected reward function $r_i : Q \to \mathbb{R}$, where $Q = \prod_{i \in I} Q_i$, with $q \in Q$, s.t.

$$r_i(q) = \sum_{a \in A} q(a) \, r_i(a)$$
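The expected reward formula can be illustrated with a short sketch. It assumes that players randomize independently, so that $q(a)$ is the product of the individual probabilities $q_i(a_i)$; the slides do not spell this out. The game `r1` and the function name `expected_reward` are hypothetical, chosen only for illustration.

```python
from itertools import product

def expected_reward(reward, mixed_profile):
    """Compute r_i(q) = sum over joint actions a of q(a) * r_i(a).

    Assumption: players randomize independently, so q(a) is the
    product of the individual probabilities q_i(a_i).
    """
    total = 0.0
    action_sets = [range(len(q_i)) for q_i in mixed_profile]
    for a in product(*action_sets):
        prob = 1.0
        for q_i, a_i in zip(mixed_profile, a):
            prob *= q_i[a_i]
        total += prob * reward[a]
    return total

# Hypothetical two-player game: player 1 earns 1 on a match, -1 otherwise.
r1 = {(0, 0): 1.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.0}
uniform = [[0.5, 0.5], [0.5, 0.5]]      # both players mix evenly
pure_match = [[1.0, 0.0], [1.0, 0.0]]   # both play their first action

print(expected_reward(r1, uniform))
print(expected_reward(r1, pure_match))
```

A pure action is the special case of a mixed action that puts probability 1 on one entry, which is why `pure_match` recovers the pure reward $r_1(a)$.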

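Correlated Equilibrium

Chicken

        L       R
T      6,6     2,7
B      7,2     0,0

CE

        L            R
T      $\pi_{TL}$   $\pi_{TR}$
B      $\pi_{BL}$   $\pi_{BR}$

$$
\begin{aligned}
\max \quad & 12\pi_{TL} + 9\pi_{TR} + 9\pi_{BL} + 0\pi_{BR} \\
\text{subject to} \quad & \pi_{TL} + \pi_{TR} + \pi_{BL} + \pi_{BR} = 1 \\
& \pi_{TL}, \pi_{TR}, \pi_{BL}, \pi_{BR} \ge 0 \\
& 6\pi_{L|T} + 2\pi_{R|T} \ge 7\pi_{L|T} + 0\pi_{R|T} \\
& 7\pi_{L|B} + 0\pi_{R|B} \ge 6\pi_{L|B} + 2\pi_{R|B} \\
& 6\pi_{T|L} + 2\pi_{B|L} \ge 7\pi_{T|L} + 0\pi_{B|L} \\
& 7\pi_{T|R} + 0\pi_{B|R} \ge 6\pi_{T|R} + 2\pi_{B|R}
\end{aligned}
$$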

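Correlated Equilibrium

The same linear program for Chicken, with the incentive constraints written in terms of joint probabilities:

$$
\begin{aligned}
\max \quad & 12\pi_{TL} + 9\pi_{TR} + 9\pi_{BL} + 0\pi_{BR} \\
\text{subject to} \quad & \pi_{TL} + \pi_{TR} + \pi_{BL} + \pi_{BR} = 1 \\
& \pi_{TL}, \pi_{TR}, \pi_{BL}, \pi_{BR} \ge 0 \\
& 6\pi_{TL} + 2\pi_{TR} \ge 7\pi_{TL} + 0\pi_{TR} \\
& 7\pi_{BL} + 0\pi_{BR} \ge 6\pi_{BL} + 2\pi_{BR} \\
& 6\pi_{TL} + 2\pi_{BL} \ge 7\pi_{TL} + 0\pi_{BL} \\
& 7\pi_{TR} + 0\pi_{BR} \ge 6\pi_{TR} + 2\pi_{BR}
\end{aligned}
$$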
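The incentive constraints in joint-probability form can be checked mechanically for any candidate distribution. The sketch below is illustrative only. It assumes the Chicken payoffs (6,6), (2,7), (7,2), (0,0), which agree with the coefficients 12, 9, 2, and 0 that survive in the linear program but are otherwise an assumption. The function names and the candidate distributions are hypothetical, and the code checks feasibility rather than solving the maximization.

```python
from fractions import Fraction as Fr

# Assumed payoffs (row player, column player); see the note above.
PAYOFF = {
    ("T", "L"): (6, 6), ("T", "R"): (2, 7),
    ("B", "L"): (7, 2), ("B", "R"): (0, 0),
}
ROWS, COLS = ("T", "B"), ("L", "R")

def is_correlated_equilibrium(pi):
    """Check every joint-probability incentive constraint for both players."""
    for rec in ROWS:            # row player is recommended `rec`
        for dev in ROWS:        # and considers deviating to `dev`
            follow = sum(pi[rec, c] * PAYOFF[rec, c][0] for c in COLS)
            deviate = sum(pi[rec, c] * PAYOFF[dev, c][0] for c in COLS)
            if follow < deviate:
                return False
    for rec in COLS:            # column player is recommended `rec`
        for dev in COLS:
            follow = sum(pi[r, rec] * PAYOFF[r, rec][1] for r in ROWS)
            deviate = sum(pi[r, rec] * PAYOFF[r, dev][1] for r in ROWS)
            if follow < deviate:
                return False
    return True

def total_reward(pi):
    """Objective of the linear program: expected sum of both rewards."""
    return sum(pi[a] * sum(PAYOFF[a]) for a in PAYOFF)

# Hypothetical candidates: a mixture, and all mass on (T, L).
candidate = {("T", "L"): Fr(1, 2), ("T", "R"): Fr(1, 4),
             ("B", "L"): Fr(1, 4), ("B", "R"): Fr(0)}
all_top_left = {("T", "L"): Fr(1), ("T", "R"): Fr(0),
                ("B", "L"): Fr(0), ("B", "R"): Fr(0)}

print(is_correlated_equilibrium(candidate), total_reward(candidate))
print(is_correlated_equilibrium(all_top_left))
```

Exact fractions are used because the first constraint holds with equality for `candidate`, and floating-point rounding could otherwise flip the comparison.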

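Zero-Sum Games

Matching Pennies (each player chooses H or T; payoff table not preserved)

Rock-Paper-Scissors (each player chooses R, P, or S; payoff table not preserved)

Zero-sum: $\sum_{i \in I} r_i(a) = 0$, for all $a \in A$

Constant-sum: $\sum_{i \in I} r_i(a) = c$, for all $a \in A$, for some $c \in \mathbb{R}$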

Minimax Equilibrium

Example (two-player game with rows T, B and columns L, R; payoff table not preserved)

Definition

A mixed action profile $(q_1^*, q_2^*) \in Q$ is a minimax equilibrium in a two-player, zero-sum game iff

$$r_1(q_1^*, q_2^*) \ge r_1(j, q_2^*), \quad \forall j \in A_1$$

$$l_2(q_1^*, q_2^*) \le l_2(q_1^*, k), \quad \forall k \in A_2$$
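The two conditions in the definition can be tested directly for a small game. This is a minimal sketch under stated assumptions: player 2's loss $l_2$ is taken to equal player 1's reward $r_1$ (the usual zero-sum convention, not stated explicitly in the slides), and the matrix `MP` uses the standard Matching Pennies payoffs of +1 and -1, which are not preserved in this extract. All function names are hypothetical.

```python
def r1(R, q1, q2):
    """Expected reward of player 1 when both players mix independently."""
    return sum(q1[j] * q2[k] * R[j][k]
               for j in range(len(q1)) for k in range(len(q2)))

def pure(n, j):
    """Pure action j written as a mixed action over n actions."""
    return [1.0 if m == j else 0.0 for m in range(n)]

def is_minimax_equilibrium(R, q1, q2, tol=1e-12):
    n1, n2 = len(q1), len(q2)
    value = r1(R, q1, q2)
    # Player 1 cannot gain by switching to any pure action j.
    row_ok = all(value >= r1(R, pure(n1, j), q2) - tol for j in range(n1))
    # Player 2's loss (assumed equal to r1) cannot be lowered by any pure k.
    col_ok = all(value <= r1(R, q1, pure(n2, k)) + tol for k in range(n2))
    return row_ok and col_ok

# Assumed Matching Pennies rewards for player 1 (rows H, T; columns H, T).
MP = [[1.0, -1.0], [-1.0, 1.0]]

print(is_minimax_equilibrium(MP, [0.5, 0.5], [0.5, 0.5]))
print(is_minimax_equilibrium(MP, [1.0, 0.0], [0.5, 0.5]))
```

Checking only pure deviations suffices because the expected reward is linear in each player's mixed action, so any profitable mixed deviation implies a profitable pure one.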

Transformations

Mixed Transformations

$\Phi_{\text{LINEAR}} = \{ \phi : Q \to Q \}$ = the set of all linear transformations = the set of all row stochastic matrices

$\Phi_{\text{SWAP}} = \{ \phi : Q \to Q \mid \phi \text{ deterministic} \} \subseteq \Phi_{\text{LINEAR}}$

Pure Transformations

$F_{\text{SWAP}} = \{ F : N \to N \}$ = the set of all pure transformations

Isomorphism

The operation of elements of $F_{\text{SWAP}}$ on $N$ is isomorphic to the operation of elements of $\Phi_{\text{SWAP}}$ on $Q$:

$$\phi_{ij} = \delta_{F(i) = j}$$

$$\forall k, \quad e_k \phi = e_{F(k)}$$

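Example

If $n = 4$ and $F$ is a given pure transformation, then $\phi$ is the corresponding $4 \times 4$ row stochastic matrix (the specific map and matrix are not preserved in this extract).

Thus, $\langle q_1, q_2, q_3, q_4 \rangle \phi = \langle q_4, \ldots \rangle$, for all $\langle q_1, q_2, q_3, q_4 \rangle \in Q$.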
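The correspondence between a pure transformation $F$ and its matrix $\phi$ can be sketched in a few lines. The cyclic map used here is a hypothetical choice for illustration and is not necessarily the one shown on the slide. Actions are indexed from 0 in the code, whereas the slides index from 1.

```python
def phi_from_map(F, n):
    """Row-stochastic 0/1 matrix with phi[i][j] = 1 exactly when F(i) = j."""
    return [[1 if F[i] == j else 0 for j in range(n)] for i in range(n)]

def apply(q, phi):
    """Row vector times matrix: (q phi)_j = sum_i q_i * phi[i][j]."""
    n = len(q)
    return [sum(q[i] * phi[i][j] for i in range(n)) for j in range(n)]

# Hypothetical pure transformation: a cyclic shift of four actions.
F = {0: 1, 1: 2, 2: 3, 3: 0}
phi = phi_from_map(F, 4)

q = [0.1, 0.2, 0.3, 0.4]
print(apply(q, phi))                 # weight on action i moves to F(i)

# Unit vectors satisfy e_k phi = e_{F(k)}.
e0 = [1, 0, 0, 0]
print(apply(e0, phi))
```

Each row of `phi` contains a single 1, which is what makes the matrix both row stochastic and deterministic.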

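Internal Regret Matrices

$F_{\text{INT}} = \{ F^{ij} \in F_{\text{SWAP}} \mid i \ne j \in N \}$, where

$$F^{ij}(k) = \begin{cases} j & \text{if } k = i \\ k & \text{otherwise} \end{cases}$$

$\Phi_{\text{INT}} = \{ \phi^{ij} \in \Phi_{\text{SWAP}} \mid i \ne j \in N \}$, where

$$e_k \phi^{ij} = \begin{cases} e_j & \text{if } k = i \\ e_k & \text{otherwise} \end{cases}$$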

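Example

If $n = 4$, then $\phi^{ij}$ is a $4 \times 4$ matrix that agrees with the identity except in row $i$, which equals $e_j$ (the specific indices and matrix are not preserved in this extract).

Thus, $\langle q_1, q_2, q_3, q_4 \rangle \phi = \langle q_1, \ldots \rangle$, for all $\langle q_1, q_2, q_3, q_4 \rangle \in Q$.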
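An internal transformation moves all the weight on one action $i$ to another action $j$ and leaves the rest untouched. The sketch below builds $\phi^{ij}$ from the definition of $F^{ij}$. The indices `i = 1`, `j = 2` (0-based) and the vector `q` are hypothetical choices, not necessarily those on the slide.

```python
from fractions import Fraction as Fr

def internal_map(n, i, j):
    """F^{ij}: send action i to j and leave every other action fixed."""
    return [j if k == i else k for k in range(n)]

def internal_matrix(n, i, j):
    """phi^{ij}: the identity matrix with row i replaced by e_j."""
    F = internal_map(n, i, j)
    return [[1 if F[k] == col else 0 for col in range(n)] for k in range(n)]

def apply(q, phi):
    """Row vector times matrix: (q phi)_c = sum_k q_k * phi[k][c]."""
    n = len(q)
    return [sum(q[k] * phi[k][c] for k in range(n)) for c in range(n)]

# Hypothetical example: n = 4, move the weight of action 1 onto action 2.
phi = internal_matrix(4, 1, 2)
q = [Fr(1, 10), Fr(2, 10), Fr(3, 10), Fr(4, 10)]

print(phi)
print(apply(q, phi))   # entry 1 becomes 0, entry 2 becomes q[1] + q[2]
```

The result is still a probability vector, since the transformation only relocates mass.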

Regret Vector

$\rho \in \mathbb{R}^{\Phi}$

Observed Regret Vector

$$\tilde\rho_\phi(r, a) = r \cdot a\phi - r \cdot a$$

Expected Regret Vector

$$\hat\rho_\phi(r, q) = E[\tilde\rho_\phi(r, a) \mid a \sim q] = \tilde\rho_\phi(r, E[a \mid a \sim q]) = r \cdot q\phi - r \cdot q$$
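Because regret is linear in the action vector, the expected regret under $q$ equals the $q$-weighted average of the observed regrets of the pure actions. The sketch below checks this on a small case. The reward vector `r`, the mixed action `q`, and the transformation `phi` (which sends action 0 to action 2) are hypothetical values chosen for illustration.

```python
from fractions import Fraction as Fr

def dot(u, v):
    return sum(x * y for x, y in zip(u, v))

def apply(q, phi):
    """Row vector times matrix: (q phi)_j = sum_i q_i * phi[i][j]."""
    n = len(q)
    return [sum(q[i] * phi[i][j] for i in range(n)) for j in range(n)]

def regret(r, q, phi):
    """r . (q phi) - r . q; pass a unit vector for the observed regret."""
    return dot(r, apply(q, phi)) - dot(r, q)

def unit(n, k):
    return [1 if m == k else 0 for m in range(n)]

# Hypothetical data: three actions, phi moves action 0 onto action 2.
r = [1, 0, 3]
phi = [[0, 0, 1], [0, 1, 0], [0, 0, 1]]
q = [Fr(1, 2), Fr(1, 2), Fr(0)]

observed = [regret(r, unit(3, k), phi) for k in range(3)]
expected = regret(r, q, phi)

print(observed)   # regret for having played each pure action
print(expected)   # equals the q-weighted average of `observed`
```

Playing action 0 leaves a regret of 2 relative to action 2, while the other actions are unchanged by `phi` and so carry zero regret.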

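No Observed Φ-Regret

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau = 1}^{t} \tilde\rho_\phi(r^\tau, a^\tau) \le 0, \quad \text{for all } \phi \in \Phi, \text{ a.s.}$$

No Expected Φ-Regret

$$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau = 1}^{t} \hat\rho_\phi(r^\tau, q^\tau) \le 0, \quad \text{for all } \phi \in \Phi$$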
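The quantity inside the limit is a time average of per-round regrets, which can be computed for any finite history. The sketch below does so for the internal transformations $\phi^{ij}$. It illustrates the averaged sum only: a finite history cannot establish the asymptotic no-regret property. The reward sequence and the two action sequences are hypothetical.

```python
def average_internal_regret(rewards, actions, i, j):
    """(1/t) * sum over rounds of the observed regret for phi^{ij}."""
    total = 0
    for r, a in zip(rewards, actions):
        if a == i:               # phi^{ij} only changes rounds where i was played
            total += r[j] - r[i]
    return total / len(actions)

def max_average_internal_regret(rewards, actions, n):
    """Largest time-averaged regret over all pairs i != j."""
    return max(average_internal_regret(rewards, actions, i, j)
               for i in range(n) for j in range(n) if i != j)

# Hypothetical history: two actions, the better one alternates each round.
rewards = [[1, 0], [0, 1], [1, 0], [0, 1]]
good_play = [0, 1, 0, 1]   # always picks the rewarded action
bad_play = [1, 0, 1, 0]    # always picks the unrewarded action

print(max_average_internal_regret(rewards, good_play, 2))
print(max_average_internal_regret(rewards, bad_play, 2))
```

A non-positive maximum, as for `good_play`, means no swap of one action for another would have improved the average reward on this history.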