Game-Theoretic Learning: Regret Minimization vs. Utility Maximization, Study notes of Computer Science

University of Pennsylvania (UPenn), Computer Science

These notes cover regret minimization and utility maximization in game-theoretic learning. The authors, Amy Greenwald, David Gondek, Amir Jafari, and Casey Marks, compare these two approaches in the context of game theory, single-agent learning, and multiagent learning. They also provide examples of Nash equilibrium, correlated equilibrium, and minimax equilibrium in one-shot games.

Typology: Study notes

Pre 2010

Uploaded on 03/28/2010


Game-Theoretic Learning:

Regret Minimization vs. Utility Maximization

Amy Greenwald

with David Gondek, Amir Jafari, and Casey Marks

Brown University

University of Pennsylvania

November 17, 2004


Background

No-external-regret learning converges to the set of minimax equilibria in zero-sum games.

[e.g., Freund and Schapire 1996]

No-internal-regret learning converges to the set of correlated equilibria in general-sum games.

[e.g., Foster and Vohra 1997]

Outline

◦ Game Theory

◦ Single Agent Learning Model

◦ Multiagent Learning & Game-Theoretic Equilibria

Game Theory: A Crash Course

General-Sum Games

◦ Nash Equilibrium

◦ Correlated Equilibrium

Zero-Sum Games

◦ Minimax Equilibrium

One-Shot Games

A one-shot game is a 3-tuple $\Gamma = (I, (A_i, r_i)_{i \in I})$, where

◦ $I$ is a set of players

◦ for all players $i \in I$,

  a set of pure actions $A_i$ with $a_i \in A_i$

  a reward function $r_i : A \to \mathbb{R}$, where $A = \prod_{i \in I} A_i$ with $a \in A$

The players can employ randomized or mixed actions:

◦ for all players $i \in I$,

  a set of mixed actions $Q_i = \{ q_i \in \mathbb{R}^{|A_i|} \mid \sum_j q_{ij} = 1 \ \&\ q_{ij} \ge 0, \ \forall j \}$, with $q_i \in Q_i$

  an expected reward function $r_i : Q \to \mathbb{R}$, where $Q = \prod_{i \in I} Q_i$ with $q \in Q$, s.t. $r_i(q) = \sum_{a \in A} q(a) \, r_i(a)$
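The expected reward formula above can be sketched in a few lines of Python. This is only an illustration: it assumes that players randomize independently, so that $q(a) = \prod_i q_i(a_i)$ (the slide does not spell this out), and the 2x2 reward table `r1` is a hypothetical example, not one of the games from the slides.

```python
from itertools import product


def expected_reward(reward, mixed_profile):
    """Compute r_i(q) = sum over pure profiles a of q(a) * r_i(a).

    Assumes independent randomization: q(a) is the product of q_k(a_k).
    `reward` maps a pure action profile (a tuple of indices) to a number.
    """
    action_sets = [range(len(q_k)) for q_k in mixed_profile]
    total = 0.0
    for a in product(*action_sets):
        prob = 1.0
        for q_k, a_k in zip(mixed_profile, a):
            prob *= q_k[a_k]
        total += prob * reward[a]
    return total


# Hypothetical rewards for player 1, keyed by pure profile (a_1, a_2).
r1 = {(0, 0): 1.0, (0, 1): -1.0, (1, 0): -1.0, (1, 1): 1.0}
q = [[0.5, 0.5], [0.25, 0.75]]  # one mixed action per player
value = expected_reward(r1, q)
```

A pure action is the special case of a mixed action that puts probability 1 on one entry, so the same function also evaluates pure profiles.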

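Correlated Equilibrium

Chicken

        L      R
T      6,6    2,7
B      7,2    0,0

CE

        L            R
T   $\pi_{TL}$   $\pi_{TR}$
B   $\pi_{BL}$   $\pi_{BR}$

$\max \ 12\pi_{TL} + 9\pi_{TR} + 9\pi_{BL} + 0\pi_{BR}$

subject to

$\pi_{TL} + \pi_{TR} + \pi_{BL} + \pi_{BR} = 1$

$\pi_{TL}, \pi_{TR}, \pi_{BL}, \pi_{BR} \ge 0$

$6\pi_{L|T} + 2\pi_{R|T} \ge 7\pi_{L|T} + 0\pi_{R|T}$

$7\pi_{L|B} + 0\pi_{R|B} \ge 6\pi_{L|B} + 2\pi_{R|B}$

$6\pi_{T|L} + 2\pi_{B|L} \ge 7\pi_{T|L} + 0\pi_{B|L}$

$7\pi_{T|R} + 0\pi_{B|R} \ge 6\pi_{T|R} + 2\pi_{B|R}$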

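Correlated Equilibrium

Chicken, with the constraints written in terms of joint probabilities:

$\max \ 12\pi_{TL} + 9\pi_{TR} + 9\pi_{BL} + 0\pi_{BR}$

subject to

$\pi_{TL} + \pi_{TR} + \pi_{BL} + \pi_{BR} = 1$

$\pi_{TL}, \pi_{TR}, \pi_{BL}, \pi_{BR} \ge 0$

$6\pi_{TL} + 2\pi_{TR} \ge 7\pi_{TL} + 0\pi_{TR}$

$7\pi_{BL} + 0\pi_{BR} \ge 6\pi_{BL} + 2\pi_{BR}$

$6\pi_{TL} + 2\pi_{BL} \ge 7\pi_{TL} + 0\pi_{BL}$

$7\pi_{TR} + 0\pi_{BR} \ge 6\pi_{TR} + 2\pi_{BR}$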
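The linear program above can be explored with a small Python sketch. Two assumptions are made here and should be read as such: the Chicken payoffs are taken to be (6,6), (2,7), (7,2), (0,0), which is consistent with the objective coefficients 12 and 9 and the constraint coefficients 2 and 0 on the slide, and a brute-force search over a grid of rational distributions stands in for a real LP solver. The function names are hypothetical.

```python
from fractions import Fraction as Fr

# Assumed Chicken payoffs: (row player's reward, column player's reward).
PAYOFF = {("T", "L"): (6, 6), ("T", "R"): (2, 7),
          ("B", "L"): (7, 2), ("B", "R"): (0, 0)}
ROWS, COLS = ("T", "B"), ("L", "R")


def is_correlated_equilibrium(pi):
    """Check the joint-probability incentive constraints for both players."""
    for rec in ROWS:          # row player is recommended `rec`
        for dev in ROWS:      # and considers deviating to `dev`
            gain = sum(pi[(rec, c)] * (PAYOFF[(dev, c)][0] - PAYOFF[(rec, c)][0])
                       for c in COLS)
            if gain > 0:
                return False
    for rec in COLS:          # same check for the column player
        for dev in COLS:
            gain = sum(pi[(r, rec)] * (PAYOFF[(r, dev)][1] - PAYOFF[(r, rec)][1])
                       for r in ROWS)
            if gain > 0:
                return False
    return True


def total_reward(pi):
    """Objective of the LP: expected sum of both players' rewards."""
    return sum(pi[a] * sum(PAYOFF[a]) for a in PAYOFF)


def best_on_grid(denom=12):
    """Brute-force stand-in for an LP solver: scan distributions k/denom."""
    best, best_val = None, None
    for tl in range(denom + 1):
        for tr in range(denom + 1 - tl):
            for bl in range(denom + 1 - tl - tr):
                br = denom - tl - tr - bl
                pi = {("T", "L"): Fr(tl, denom), ("T", "R"): Fr(tr, denom),
                      ("B", "L"): Fr(bl, denom), ("B", "R"): Fr(br, denom)}
                if is_correlated_equilibrium(pi):
                    val = total_reward(pi)
                    if best_val is None or val > best_val:
                        best, best_val = pi, val
    return best, best_val


best_pi, best_value = best_on_grid()
```

Under these assumptions the search settles on $\pi_{TL} = 1/2$, $\pi_{TR} = \pi_{BL} = 1/4$, $\pi_{BR} = 0$ with objective value 21/2. Exact fractions are used so that the incentive constraints, two of which hold with equality at this point, are not broken by rounding.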

Zero-Sum Games

Matching Pennies (payoff matrix over actions H, T)

Rock-Paper-Scissors (payoff matrix over actions R, P, S)

$\sum_{i \in I} r_i(a) = 0$, for all $a \in A$

$\sum_{i \in I} r_i(a) = c$, for all $a \in A$, for some $c \in \mathbb{R}$
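The two conditions above (rewards summing to zero, or to some constant $c$) are easy to test mechanically. The sketch below is illustrative only: the Matching Pennies payoffs follow one common convention (player 1 wins on a match), which is an assumption since the slide's matrix entries did not survive, and `constant_sum` is a hypothetical helper name.

```python
def constant_sum(rewards):
    """Return c if sum_i r_i(a) = c for every pure profile a, else None."""
    totals = {sum(payoffs) for payoffs in rewards.values()}
    return totals.pop() if len(totals) == 1 else None


# Assumed convention: player 1 gets +1 on a match, -1 otherwise.
matching_pennies = {("H", "H"): (1, -1), ("H", "T"): (-1, 1),
                    ("T", "H"): (-1, 1), ("T", "T"): (1, -1)}

# Adding 1 to every reward gives a constant-sum game with c = 2.
shifted = {a: (x + 1, y + 1) for a, (x, y) in matching_pennies.items()}

# A general-sum game for contrast (assumed Chicken payoffs).
chicken = {("T", "L"): (6, 6), ("T", "R"): (2, 7),
           ("B", "L"): (7, 2), ("B", "R"): (0, 0)}
```

A game is zero-sum exactly when `constant_sum` returns 0, so compare the result against `None` rather than testing its truthiness.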

Minimax Equilibrium

Example (payoff matrix with rows T, B and columns L, R)

Definition. A mixed action profile $(q_1^*, q_2^*) \in Q$ is a minimax equilibrium in a two-player, zero-sum game iff

◦ $r_1(q_1^*, q_2^*) \ge r_1(j, q_2^*)$, $\forall j \in A_1$

◦ $l_2(q_1^*, q_2^*) \le l_2(q_1^*, k)$, $\forall k \in A_2$

where $l_2 = -r_2$ is player 2's loss.
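The definition can be checked directly for a given profile. The following is a minimal sketch, assuming that player 2's loss equals player 1's reward ($l_2 = r_1$, as holds in a zero-sum game) and that the game is given as player 1's reward matrix `R1`. The Matching Pennies and Rock-Paper-Scissors entries use common conventions and are assumptions, not values recovered from the slides.

```python
def reward(R1, q1, q2):
    """Expected reward of player 1 under independent mixed actions q1, q2."""
    return sum(q1[j] * q2[k] * R1[j][k]
               for j in range(len(q1)) for k in range(len(q2)))


def unit(n, j):
    """Pure action j written as a mixed action (unit vector of length n)."""
    return [1.0 if m == j else 0.0 for m in range(n)]


def is_minimax_equilibrium(R1, q1, q2, tol=1e-9):
    """Check both conditions of the definition, assuming l_2 = r_1."""
    value = reward(R1, q1, q2)
    n1, n2 = len(q1), len(q2)
    # Player 1 cannot gain by switching to any pure action j.
    ok1 = all(value >= reward(R1, unit(n1, j), q2) - tol for j in range(n1))
    # Player 2 cannot lower its loss by switching to any pure action k.
    ok2 = all(value <= reward(R1, q1, unit(n2, k)) + tol for k in range(n2))
    return ok1 and ok2


# Assumed payoffs for player 1.
matching_pennies = [[1, -1], [-1, 1]]
rock_paper_scissors = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
```

For example, `is_minimax_equilibrium(matching_pennies, [0.5, 0.5], [0.5, 0.5])` holds, while replacing player 1's action with the pure action `[1.0, 0.0]` fails the second condition, because player 2 could then lower its loss by always playing the mismatching side.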

Transformations

Mixed Transformations

$\Phi_{\mathrm{LINEAR}} = \{ \phi : Q \to Q \}$

= the set of all row stochastic matrices = the set of all linear transformations

$\Phi_{\mathrm{SWAP}} = \{ \phi : Q \to Q \mid \phi \text{ deterministic} \} \subseteq \Phi_{\mathrm{LINEAR}}$

Pure Transformations

$F_{\mathrm{SWAP}} = \{ F : N \to N \}$

= the set of all pure transformations

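Isomorphism

The operation of elements of $F_{\mathrm{SWAP}}$ on $N$ $\cong$ the operation of elements of $\Phi_{\mathrm{SWAP}}$ on $Q$:

$F \mapsto \phi$, where $\forall k$, $e_k \phi = e_{F(k)}$

Example

If $n = 4$ and $F = \ldots$, then $\phi = \ldots$

Thus, $\langle q_1, q_2, q_3, q_4 \rangle \phi = \langle q_4, \ldots \rangle$, for all $\langle q_1, q_2, q_3, q_4 \rangle \in Q$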
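The correspondence $e_k \phi = e_{F(k)}$ can be made concrete with a short Python sketch. Note the assumptions: actions are 0-indexed here, mixed actions are row vectors, and the map `F` below is a hypothetical cyclic shift chosen for illustration, since the $F$ used on the slide could not be recovered. `swap_matrix` and `apply` are hypothetical helper names.

```python
def swap_matrix(F, n):
    """Deterministic row-stochastic matrix phi with e_k phi = e_{F(k)}."""
    return [[1 if F[k] == col else 0 for col in range(n)] for k in range(n)]


def apply(q, phi):
    """Row vector times matrix: (q phi)[col] = sum_k q[k] * phi[k][col]."""
    n = len(q)
    return [sum(q[k] * phi[k][col] for k in range(n)) for col in range(n)]


# Hypothetical pure transformation on n = 4 actions: 0->1, 1->2, 2->3, 3->0.
F = [1, 2, 3, 0]
phi = swap_matrix(F, 4)

q = [0.1, 0.2, 0.3, 0.4]
moved = apply(q, phi)  # mass on action k is moved to action F(k)
```

Each row of `phi` is a unit vector, which is what "deterministic" means for a row stochastic matrix: every pure action is sent to exactly one pure action.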

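Internal Regret Matrices

$F_{\mathrm{INT}} = \{ F^{ij} \in F_{\mathrm{SWAP}} \mid i, j \in N \}$, where

$F^{ij}(k) = j$ if $k = i$, and $F^{ij}(k) = k$ otherwise

$\Phi_{\mathrm{INT}} = \{ \phi^{ij} \in \Phi_{\mathrm{SWAP}} \mid i, j \in N \}$, where

$e_k \phi^{ij} = e_j$ if $k = i$, and $e_k \phi^{ij} = e_k$ otherwise

Example

If $n = 4$, then

$\phi^{23} = \begin{pmatrix} 1 & 0 & 0 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 1 & 0 \\ 0 & 0 & 0 & 1 \end{pmatrix}$

Thus, $\langle q_1, q_2, q_3, q_4 \rangle \phi^{23} = \langle q_1, 0, q_2 + q_3, q_4 \rangle$, for all $\langle q_1, q_2, q_3, q_4 \rangle \in Q$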
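An internal transformation is the identity except that one action $i$ is redirected to another action $j$. The sketch below builds such a matrix and applies it to a mixed action. It assumes 0-indexed actions (so the slide's $i = 2$, $j = 3$ become 1 and 2) and uses exact fractions for a hypothetical mixed action `q`. The helper names are illustrative.

```python
from fractions import Fraction as Fr


def internal_matrix(i, j, n):
    """phi^{ij}: identity matrix whose row i is replaced by the unit vector e_j."""
    phi = [[1 if row == col else 0 for col in range(n)] for row in range(n)]
    phi[i] = [1 if col == j else 0 for col in range(n)]
    return phi


def apply(q, phi):
    """Row vector times matrix: (q phi)[col] = sum_k q[k] * phi[k][col]."""
    n = len(q)
    return [sum(q[k] * phi[k][col] for k in range(n)) for col in range(n)]


# The slide's phi^{23} for n = 4, written with 0-indexed actions.
phi_23 = internal_matrix(1, 2, 4)

q = [Fr(1, 10), Fr(2, 10), Fr(3, 10), Fr(4, 10)]
result = apply(q, phi_23)  # all mass on action i is moved onto action j
```

The result has the form $\langle q_1, 0, q_2 + q_3, q_4 \rangle$: the probability of the replaced action drops to zero and is added to the action that replaces it.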

Regret Vector

$\rho \in \mathbb{R}^{\Phi}$

Observed Regret Vector

$\tilde{\rho}_\phi(r, a) = r \cdot a\phi - r \cdot a$

Expected Regret Vector

$\hat{\rho}_\phi(r, q) = E[\tilde{\rho}_\phi(r, a) \mid a \sim q] = \tilde{\rho}_\phi(r, E[a \mid a \sim q]) = r \cdot q\phi - r \cdot q$

No Observed Φ-Regret

$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \tilde{\rho}_\phi(r^\tau, a^\tau) \le 0$, for all $\phi \in \Phi$, a.s.

No Expected Φ-Regret

$\limsup_{t \to \infty} \frac{1}{t} \sum_{\tau=1}^{t} \hat{\rho}_\phi(r^\tau, q^\tau) \le 0$, for all $\phi \in \Phi$
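To see the no-expected-regret condition at work, the sketch below runs an exponential-weights (Hedge-style) learner, in the spirit of the Freund and Schapire work cited in the background. This algorithm is swapped in for illustration and is not taken from the slides. It assumes rewards in [0, 1], a horizon `T` known in advance, and that external regret corresponds to the constant transformations (every row of $\phi$ equal to $e_j$), for which $r \cdot q\phi - r \cdot q = r_j - r \cdot q$. The reward sequence is a hypothetical fixed pattern.

```python
import math


def hedge_average_external_regret(reward_seq, n):
    """Run exponential weights and return the largest time-averaged
    expected regret over the n constant transformations."""
    T = len(reward_seq)
    eta = math.sqrt(8 * math.log(n) / T)  # standard tuning for a known horizon
    cum = [0.0] * n   # cumulative reward of each pure action
    earned = 0.0      # cumulative expected reward, sum of r^tau . q^tau
    for r in reward_seq:
        top = max(cum)  # subtract the max before exponentiating, for stability
        weights = [math.exp(eta * (c - top)) for c in cum]
        z = sum(weights)
        q = [w / z for w in weights]
        earned += sum(ri * qi for ri, qi in zip(r, q))
        cum = [c + ri for c, ri in zip(cum, r)]
    return (max(cum) - earned) / T


# Hypothetical rewards: action 0 pays on two rounds out of three.
T = 3000
rewards = [[0.0, 1.0] if t % 3 == 0 else [1.0, 0.0] for t in range(T)]
avg_regret = hedge_average_external_regret(rewards, 2)
```

With this tuning the average regret is bounded by $\sqrt{\ln n / (2T)}$, which shrinks to zero as $T$ grows, so the learner satisfies the condition above for the constant transformations only. Internal or swap regret would need a different algorithm.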
