Game-Theoretic Artificial Intelligence^1, Spring 2007
Professor Amy Greenwald, Topic #2

Matrix Games and Nash Equilibrium

This lecture is concerned with the Nobel Prize winning work of John Nash. In particular, we define the notion of mixed strategies in matrix games, and we present Nash’s argument on the existence of mixed strategy (Nash) equilibrium.

1 Examples of Games

The best-known game-theoretic scenario is the paradoxical situation known as the Prisoners’ Dilemma, which was popularized by Axelrod [1] in his popular science book. The following is one (uncommon) variant of the story.^2

A crime has been committed for which two prisoners are held incommunicado. The district attorney is assigned to question the prisoners. He designs the following incentive structure to induce the prisoners to talk. If neither prisoner talks, both prisoners automatically receive mild sentences (payoff 4). But if exactly one prisoner squeals on the other, the squealer is let off scot free (payoff 5), while the “squealee” is subject to a severe sentence (payoff 0). Finally, if both prisoners squeal, they share the severe punishment (payoff 1).

The Prisoners’ Dilemma is a two-player matrix (or strategic, or normal form) game. Such games are easily described by payoff matrices, where the strategies of player 1 and player 2 serve as row and column labels, respectively, and the corresponding payoffs are listed as pairs in matrix cells such that the first (second) number is the payoff to player 1 (2). The payoff matrix which describes the Prisoners’ Dilemma is depicted in Figure 1, with C denoting “cooperate” (stay silent) and D denoting “defect” (squeal on the other prisoner).

This game is known as the Prisoners’ Dilemma because the only rational outcome is (D, D), which yields payoffs of (1, 1), even though both players would be better off at (C, C), which yields (4, 4). The reasoning is as follows. If player 1 plays C, then player 2 is better off playing D, since D yields a payoff of 5, whereas C yields only 4; but if player 1 plays D, then player 2 is again better off playing D, since D yields a payoff of 1, whereas C yields only 0. Hence, regardless of the strategy of player 1, a rational player 2 plays D. By a symmetric argument, a rational player 1 also plays D. Thus, the unique outcome of the game, assuming the players are rational, is (D, D).

(^1) Copyright © Amy Greenwald, 2007
(^2) The original anecdote due to A.W. Tucker appears in Rapoport [5]; the latter author is the two-time winner of the Prisoners’ Dilemma computer tournament organized by Axelrod.

1 \ 2 | C    | D
C     | 4, 4 | 0, 5
D     | 5, 0 | 1, 1

Figure 1: The Prisoners’ Dilemma
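The dominance argument for the Prisoners’ Dilemma can be checked mechanically. The following is a minimal sketch, assuming the payoffs given in the story (4, 5, 0, 1); the names `PAYOFFS`, `ACTIONS`, and `strictly_dominates` are illustrative and do not come from the lecture.

```python
# Payoffs from the story: PAYOFFS[(a1, a2)] = (payoff to player 1, payoff to player 2).
PAYOFFS = {
    ("C", "C"): (4, 4),
    ("C", "D"): (0, 5),
    ("D", "C"): (5, 0),
    ("D", "D"): (1, 1),
}
ACTIONS = ("C", "D")


def strictly_dominates(player, a, b):
    """Return True if action `a` gives `player` (0 or 1) a strictly higher
    payoff than action `b` against every action of the opponent."""
    for other in ACTIONS:
        profile_a = (a, other) if player == 0 else (other, a)
        profile_b = (b, other) if player == 0 else (other, b)
        if PAYOFFS[profile_a][player] <= PAYOFFS[profile_b][player]:
            return False
    return True


for player in (0, 1):
    print(f"player {player + 1}: D strictly dominates C ->",
          strictly_dominates(player, "D", "C"))
```

Because D strictly dominates C for both players, a rational player never needs to know what the opponent will do, which is exactly the "regardless of the strategy of player 1" step in the argument.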

Another popular two-player game is called the Battle of the Sexes. A man and a woman would like to spend an evening out together; however, the man prefers to go to a football game (strategy F), while the woman prefers to go to the ballet (strategy B). Both the man and the woman prefer to be together, even at the event that is not to their liking, rather than go out alone. The payoffs of this coordination game are shown in Figure 2; the woman is player 1 and the man is player 2. In this game, there are two coordination equilibria, one which is preferred by the woman, and another which is preferred by the man.

1 \ 2 | B    | F
B     | 2, 1 | 0, 0
F     | 0, 0 | 1, 2

Figure 2: Battle of the Sexes

The stag hunt game (see Figure 3) is a prototypical social contract. Rousseau tells an early version of the story in A Discourse on Inequality:

If it was a matter of hunting a deer, everyone realized that he must remain well faithful to his post; but if a hare happened to pass within reach of one of them, we cannot doubt that he would have gone off in pursuit of it without scruple...

This game has two equilibria, one of which Pareto dominates the other: i.e., all players are simultaneously better off at one equilibrium than the other. But action H (hunting hare) risk-dominates action D (hunting deer), since action H is safer for each player, given his/her uncertainty about the other player’s action.

2 Nash Equilibrium

A Nash equilibrium is a strategy profile from which none of the players has any incentive to deviate. In particular, no player can achieve strictly greater payoffs by choosing any strategy other than the one prescribed by the profile, given that all other players choose their prescribed strategies. In this sense, a Nash equilibrium specifies optimal strategic choices for all players.

Let us examine the Nash equilibria in the aforementioned examples:

  • In the Prisoners’ Dilemma, (D, D) is a Nash equilibrium: given that player 1 plays D, the best response of player 2 is to play D; given that player 2 plays D, the best response of player 1 is to play D.
  • The Battle of the Sexes has two pure strategy Nash equilibria, namely (B, B) and (F, F): if the woman plays B, then the best response of the man is B; if the man plays B, then the best response of the woman is B. Analogously, if the woman plays F, the best response of the man is F; if the man plays F, the best response of the woman is F.
  • The game of Hawks and Doves also has two pure strategy Nash equilibria, only this time, the players seek to miscoordinate their actions, rather than coordinate. If one player plays the hawk, then the other player prefers to play the dove, achieving a payoff of 0, rather than (v − c)/2 < 0. On the other hand, if one player plays the dove, then the other player prefers to play the hawk, achieving a payoff of v, rather than v/2.
  • Finally, in the Stag Hunt, both (D, D) and (H, H) are Nash equilibria: hunting for deer is Pareto-optimal, but hunting for hare is risk-dominant.
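The best-response checks in the list above amount to testing every cell of the payoff matrix for profitable unilateral deviations. Here is a minimal sketch of that test. The function name `pure_nash_equilibria` is illustrative; the Battle of the Sexes table and the Hawks and Doves values v = 2, c = 4 are assumptions chosen to match the descriptions in the text, not values confirmed by the lecture.

```python
from itertools import product


def pure_nash_equilibria(payoffs, actions1, actions2):
    """Return all pure strategy profiles (a1, a2) from which neither player
    can gain strictly by deviating unilaterally."""
    equilibria = []
    for a1, a2 in product(actions1, actions2):
        u1, u2 = payoffs[(a1, a2)]
        best1 = all(payoffs[(d, a2)][0] <= u1 for d in actions1)
        best2 = all(payoffs[(a1, d)][1] <= u2 for d in actions2)
        if best1 and best2:
            equilibria.append((a1, a2))
    return equilibria


prisoners_dilemma = {
    ("C", "C"): (4, 4), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}
# Assumed Battle of the Sexes payoffs (woman is player 1, man is player 2).
battle_of_sexes = {
    ("B", "B"): (2, 1), ("B", "F"): (0, 0),
    ("F", "B"): (0, 0), ("F", "F"): (1, 2),
}
# Assumed Hawks and Doves payoffs with v = 2 and c = 4, so (v - c) / 2 = -1 < 0.
v, c = 2, 4
hawks_and_doves = {
    ("H", "H"): ((v - c) / 2, (v - c) / 2), ("H", "D"): (v, 0),
    ("D", "H"): (0, v), ("D", "D"): (v / 2, v / 2),
}

print(pure_nash_equilibria(prisoners_dilemma, "CD", "CD"))
print(pure_nash_equilibria(battle_of_sexes, "BF", "BF"))
print(pure_nash_equilibria(hawks_and_doves, "HD", "HD"))
```

The check uses weak inequalities, matching the definition above: a profile is an equilibrium when no deviation is strictly better, so ties do not disqualify it.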

Matching Pennies is another well-known example of a two-player, zero-sum game. In this game, each of the players, the matcher and the mismatcher,^3 flips a coin, and the payoffs are determined as follows. If the coins come up matching (i.e., both heads or both tails), then the matcher wins, so the mismatcher pays the matcher $1. If the coins do not match (i.e., one head and one tail), then the mismatcher wins, so the matcher pays the mismatcher $1. In Figure 5, player 1 is the matcher and player 2 is the mismatcher. This game is called zero-sum because the payoffs in each cell of the matrix sum to zero.

(^3) The mismatcher is often affectionately referred to as Miss Matcher.

Figure 5: Matching Pennies (rows: player 1’s strategies H, T; columns: player 2’s strategies H, T)

In the game of Matching Pennies, there is no pure strategy Nash equilibrium. If player 1 plays H, then the best response of player 2 is T; but if player 2 plays T, the best response of player 1 is not H, but T. Moreover, if player 1 plays T, then the best response of player 2 is H; but if player 2 plays H, then the best response of player 1 is not T, but H. This game, however, does have a mixed strategy Nash equilibrium. A mixed strategy is a randomization over a set of pure strategies. In particular, the probabilistic strategy profile in which both players choose H with probability 1/2 and T with probability 1/2 is the unique (mixed strategy) Nash equilibrium in the game of Matching Pennies.
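To see why the half-half profile is an equilibrium in Matching Pennies, it helps to compute each player's expected payoff from each pure action against an opponent who randomizes uniformly. The sketch below does this with exact fractions. The payoff table is an assumption: it uses $1 stakes and is chosen to agree with the best responses described above (player 2's best response to H is T); the helper name `expected_payoff` is illustrative.

```python
from fractions import Fraction

# Assumed payoffs, consistent with the best responses described above:
# PAYOFFS[(a1, a2)] = (payoff to player 1, payoff to player 2).
PAYOFFS = {
    ("H", "H"): (1, -1), ("H", "T"): (-1, 1),
    ("T", "H"): (-1, 1), ("T", "T"): (1, -1),
}
ACTIONS = ("H", "T")


def expected_payoff(player, own_action, opponent_mix):
    """Expected payoff to `player` (0 or 1) from a pure action when the
    opponent randomizes according to `opponent_mix` (action -> probability)."""
    total = Fraction(0)
    for other, prob in opponent_mix.items():
        profile = (own_action, other) if player == 0 else (other, own_action)
        total += prob * PAYOFFS[profile][player]
    return total


half = {"H": Fraction(1, 2), "T": Fraction(1, 2)}
for player in (0, 1):
    values = {a: expected_payoff(player, a, half) for a in ACTIONS}
    print(f"player {player + 1} against a uniform opponent:", values)

# Against a biased opponent the indifference disappears.
biased = {"H": Fraction(3, 4), "T": Fraction(1, 4)}
print({a: expected_payoff(0, a, biased) for a in ACTIONS})
```

Against a uniform opponent both pure actions earn the same expected payoff, so every mixture, including the uniform one, is a best response. Against any biased opponent one pure action is strictly better, which is why no other profile can be an equilibrium.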

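A matrix game is a tuple $\Gamma = (N, (A_i, R_i)_{1 \le i \le n})$, where

  • $N$ is a set of $n$ players
  • $A_i$ is a finite strategy (or action) set for player $i$, with typical element $a_i \in A_i$
  • $R_i : A \to \mathbb{R}$ is the payoff function of player $i$, where $A = A_1 \times \cdots \times A_n$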

Matrix games are also sometimes called games in strategic, or normal, form.

In this formalism, the Prisoners’ Dilemma consists of a set of players $N = \{1, 2\}$, with strategy (action) sets $A_1 = A_2 = \{C, D\}$, and payoffs as follows:

$$
\begin{aligned}
R_1(C, C) &= R_2(C, C) = 4 \\
R_1(C, D) &= R_2(D, C) = 0 \\
R_1(D, D) &= R_2(D, D) = 1 \\
R_1(D, C) &= R_2(C, D) = 5
\end{aligned}
$$
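The formal definition translates fairly directly into a small data structure. The following is one possible sketch, not a representation prescribed by the lecture: the class `MatrixGame`, its field names, and the zero-based player indices are all assumptions made for illustration.

```python
from dataclasses import dataclass
from itertools import product
from typing import Callable, Iterator, Tuple

Profile = Tuple[str, ...]


@dataclass(frozen=True)
class MatrixGame:
    """Hypothetical container mirroring (N, (A_i, R_i)); not from the notes."""
    players: Tuple[int, ...]                  # N, indexed from 0 here
    action_sets: Tuple[Tuple[str, ...], ...]  # A_i for each player i
    payoff: Callable[[int, Profile], float]   # R_i(a), called as payoff(i, a)

    def profiles(self) -> Iterator[Profile]:
        """Enumerate the joint action space A = A_1 x ... x A_n."""
        return product(*self.action_sets)


# Prisoners' Dilemma payoffs as listed above: PD_TABLE[a] = (R_1(a), R_2(a)).
PD_TABLE = {
    ("C", "C"): (4, 4), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}


def pd_payoff(i: int, a: Profile) -> float:
    """Payoff R_i(a) to player i (0 or 1) at the joint action a."""
    return PD_TABLE[a][i]


prisoners_dilemma = MatrixGame(
    players=(0, 1),
    action_sets=(("C", "D"), ("C", "D")),
    payoff=pd_payoff,
)

for a in prisoners_dilemma.profiles():
    print(a, [prisoners_dilemma.payoff(i, a) for i in prisoners_dilemma.players])
```

Storing the payoff as a function of the whole profile, rather than as a two-dimensional array, keeps the sketch valid for any number of players, as in the definition.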

A mixed strategy set for player $i$ is the set of probability distributions over the action set $A_i$, which can be described by the simplex operator $\Delta$:

$$
\Delta(A_i) = \left\{ q_i : A_i \to [0, 1] \;\middle|\; \sum_{a_i \in A_i} q_i(a_i) = 1 \right\}
$$

For convenience, let $Q_i \equiv \Delta(A_i)$. The usual notational conventions extend to mixed strategies: e.g., $Q = \prod_i Q_i$ and $q = (q_i, q_{-i}) \in Q$. In the context of mixed strategies, the expected payoffs to player $i$ from strategy profile $q$ are:

$$
\mathbb{E}_{a \sim q}[R_i(a)] = \sum_{a \in A} q(a) R_i(a)
$$

where

$$
q(a) = \prod_{j=1}^{n} q_j(a_j)
$$
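The expected payoff formula can be evaluated by enumerating joint actions and weighting each payoff by the product of the players' independent probabilities. A minimal sketch follows, assuming the Prisoners’ Dilemma payoffs from above and an arbitrary illustrative mixed profile; the function names are hypothetical.

```python
from fractions import Fraction
from itertools import product
from math import prod

# Prisoners' Dilemma payoffs: PD_TABLE[a] = (R_1(a), R_2(a)).
PD_TABLE = {
    ("C", "C"): (4, 4), ("C", "D"): (0, 5),
    ("D", "C"): (5, 0), ("D", "D"): (1, 1),
}


def profile_probability(q, a):
    """q(a): product over players j of q_j(a_j), where q is a tuple of
    dicts mapping each action to its probability."""
    return prod(q_j[a_j] for q_j, a_j in zip(q, a))


def expected_payoff(i, q, table):
    """Sum over joint actions a of q(a) * R_i(a)."""
    action_sets = [list(q_j) for q_j in q]
    return sum(profile_probability(q, a) * table[a][i]
               for a in product(*action_sets))


# An arbitrary mixed profile chosen for illustration.
q = (
    {"C": Fraction(1, 2), "D": Fraction(1, 2)},   # player 1
    {"C": Fraction(1, 4), "D": Fraction(3, 4)},   # player 2
)
print(expected_payoff(0, q, PD_TABLE))  # 3/2
print(expected_payoff(1, q, PD_TABLE))  # 11/4
```

For example, the joint action (C, C) has probability 1/2 · 1/4 = 1/8 and contributes 4/8 to each player's expectation; summing the four such terms gives 3/2 for player 1 and 11/4 for player 2.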

Brouwer’s Fixed Point Theorem. Let $X \subset \mathbb{R}^n$ be nonempty, compact, and convex. If $f : X \to X$ is a continuous function, then $f$ has a fixed point: i.e., there exists $x^* \in X$ s.t. $x^* = f(x^*)$.
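In one dimension the theorem can be illustrated numerically. If $f$ is continuous and maps $[0, 1]$ into itself, then $g(x) = f(x) - x$ satisfies $g(0) \ge 0$ and $g(1) \le 0$, so bisection locates a point where $g$ vanishes. The sketch below uses $f = \cos$ as an assumed example; note that bisection is specific to the one-dimensional case, and Brouwer's theorem itself only asserts existence.

```python
import math


def fixed_point_1d(f, lo=0.0, hi=1.0, tol=1e-10):
    """Approximate a fixed point of a continuous f mapping [lo, hi] into
    itself, by bisection on g(x) = f(x) - x.

    Invariant: g(lo) >= 0 and g(hi) <= 0, which holds initially because
    f(lo) >= lo and f(hi) <= hi.
    """
    while hi - lo > tol:
        mid = (lo + hi) / 2
        if f(mid) - mid >= 0:
            lo = mid
        else:
            hi = mid
    return (lo + hi) / 2


# cos maps [0, 1] into [cos(1), 1], a subset of [0, 1].
x_star = fixed_point_1d(math.cos)
print(x_star, math.cos(x_star))
```

The two printed values agree to many decimal places, which is the numerical counterpart of $x^* = f(x^*)$.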

2.2 Proof of Existence

The proof of existence of Nash equilibrium is a direct application of Kakutani’s fixed point theorem. It suffices to show that the set of mixed strategies Q is nonempty, compact, and convex, and that the best-response correspondence (i.e., br : Q ⇒ Q) is nonempty and convex-valued, with a closed graph.

Lemma The set of mixed strategies Q is nonempty, compact, and convex.

Proof Recall that $Q = \prod_i Q_i$, where $Q_i$ is the set of probability distributions over player $i$’s action set $A_i$. The set $Q$ is nonempty (assuming $A_i$ is nonempty, for all players $i$).

Consider a sequence $\{(q_1^m, \ldots, q_n^m)\}$ of mixed strategy profiles that converges to $(q_1^*, \ldots, q_n^*)$. This limit point is indeed a mixed strategy profile: i.e., for each player $i$, $q_i^*(a_i) \ge 0$ for all $a_i \in A_i$, and $\sum_{a_i \in A_i} q_i^*(a_i) = 1$. The former claim follows from the fact that the limit of a sequence of non-negative numbers is itself non-negative. The latter claim follows from the fact that the sum of the limits equals the limit of the sum. Thus, $Q$ is closed. Moreover, $Q$ is bounded in each component by 0 and 1. Therefore, being a closed and bounded subset of a Euclidean space, $Q$ is compact.

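The set of mixed strategies $Q_i$ for each player $i$ is convex: i.e., for all $q_i, p_i \in Q_i$, for all $\lambda \in [0, 1]$, the convex combination $\lambda q_i + (1 - \lambda) p_i \in Q_i$, since its components are non-negative and sum to $\lambda + (1 - \lambda) = 1$. Thus, given two elements $(q_1, \ldots, q_n), (p_1, \ldots, p_n) \in Q$, the convex combination $\lambda (q_1, \ldots, q_n) + (1 - \lambda)(p_1, \ldots, p_n) = (\lambda q_1 + (1 - \lambda) p_1, \ldots, \lambda q_n + (1 - \lambda) p_n) \in Q$, for all $\lambda \in [0, 1]$.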

Thus, the set of mixed strategies Q is nonempty, compact, and convex.
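The convexity step of the lemma can be spot-checked on a concrete pair of mixed strategies. This is only a sanity check under assumed example values, not a substitute for the proof; the helper names are illustrative.

```python
from fractions import Fraction


def is_mixed_strategy(q):
    """True if q (action -> probability) is non-negative and sums to 1."""
    return all(p >= 0 for p in q.values()) and sum(q.values()) == 1


def convex_combination(lam, q, p):
    """Pointwise mixture lam * q + (1 - lam) * p over a common action set."""
    return {a: lam * q[a] + (1 - lam) * p[a] for a in q}


q = {"C": Fraction(1, 3), "D": Fraction(2, 3)}
p = {"C": Fraction(1), "D": Fraction(0)}
mix = convex_combination(Fraction(1, 4), q, p)
print(mix, is_mixed_strategy(mix))
```

With weight 1/4 on q, the mixture places probability 1/12 + 3/4 = 5/6 on C and 1/6 on D, which is again a probability distribution.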

Lemma The best-response correspondence is nonempty.

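Proof By Weierstrass’ theorem, any real-valued continuous function on a compact set attains a maximum. Recall that the set $Q_i$ is compact. For any fixed $q_{-i}$, the expected payoff $R_i(q_i, q_{-i})$ is a linear function of $q_i$, and is therefore continuous on $Q_i$. Thus, the set of maximizers $\mathrm{br}_i(q_{-i}) \subseteq Q_i$ is nonempty, for all $q_{-i}$ and all players $i$, from which it follows that $\mathrm{br}(q)$ is nonempty for all $q \in Q$.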

Lemma The best-response correspondence is convex-valued.

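Proof Suppose $q_i^*, p_i^* \in \mathrm{br}_i(q_{-i})$ are best replies of player $i$ to $q_{-i}$, and let $\lambda \in [0, 1]$. Since both attain the maximal payoff against $q_{-i}$, $R_i(q_i^*, q_{-i}) = R_i(p_i^*, q_{-i}) = \lambda R_i(q_i^*, q_{-i}) + (1 - \lambda) R_i(p_i^*, q_{-i})$. Now, by the linearity of $R_i$ in player $i$’s own strategy, $\lambda R_i(q_i^*, q_{-i}) + (1 - \lambda) R_i(p_i^*, q_{-i}) = R_i(\lambda q_i^* + (1 - \lambda) p_i^*, q_{-i})$. Thus, the convex combination attains the same maximal payoff, i.e., $\lambda q_i^* + (1 - \lambda) p_i^* \in \mathrm{br}_i(q_{-i})$. Since $q_{-i}$ was arbitrary, $\mathrm{br}_i$ is convex-valued. Since $i$ was arbitrary, $\mathrm{br}$ is convex-valued.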
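A concrete instance of a convex best-response set arises whenever a player is indifferent between two pure strategies: every mixture of them then earns the same payoff. The sketch below illustrates this in the Battle of the Sexes. The payoff table for player 1 and the opponent probability 1/3 are assumptions chosen for illustration; `payoff_player1` is a hypothetical helper.

```python
from fractions import Fraction

# Assumed Battle of the Sexes payoffs to player 1 (the woman): R1[(a1, a2)].
R1 = {("B", "B"): 2, ("B", "F"): 0, ("F", "B"): 0, ("F", "F"): 1}


def payoff_player1(q1_B, q2_B):
    """Expected payoff to player 1 when player 1 plays B with probability
    q1_B and player 2 plays B with probability q2_B."""
    q1 = {"B": q1_B, "F": 1 - q1_B}
    q2 = {"B": q2_B, "F": 1 - q2_B}
    return sum(q1[a1] * q2[a2] * R1[(a1, a2)] for a1 in q1 for a2 in q2)


# If player 2 plays B with probability 1/3, player 1 is indifferent between
# B and F, so every mixture of the two is a best response.
q2_B = Fraction(1, 3)
payoffs = [payoff_player1(Fraction(k, 4), q2_B) for k in range(5)]
print(payoffs)

# Against q2_B = 1/2 the payoff is strictly increasing in q1_B instead.
print(payoff_player1(Fraction(0), Fraction(1, 2)),
      payoff_player1(Fraction(1), Fraction(1, 2)))
```

In the indifferent case the best-response set is the whole interval of mixtures between B and F, which is convex; in the second case it is the single point B, which is trivially convex.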

Lemma The graph of the best-response correspondence is closed.

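Proof We must show that $p \in \mathrm{br}(q)$, given sequences $q^m, p^m \in Q$ s.t. $q^m \to q$ and $p^m \to p$, with $p^m \in \mathrm{br}(q^m)$ for all $m$. Suppose not: i.e., suppose there exists a player $i$ s.t. $p_i \notin \mathrm{br}_i(q_{-i})$. It follows that there exists $q_i \in Q_i$ s.t. $R_i(q_i, q_{-i}) > R_i(p_i, q_{-i})$. Now let $\delta \equiv R_i(q_i, q_{-i}) - R_i(p_i, q_{-i}) > 0$. Since $R_i$ is linear in each player’s mixed strategy, and therefore continuous, for all $\epsilon > 0$, there exists $M_\epsilon \in \mathbb{N}$ s.t. for all $m \ge M_\epsilon$, $|R_i(p_i^m, q_{-i}^m) - R_i(p_i, q_{-i})| < \epsilon$ and $|R_i(q_i, q_{-i}^m) - R_i(q_i, q_{-i})| < \epsilon$. Now

$$
\begin{aligned}
R_i(q_i, q_{-i}^m) &> R_i(q_i, q_{-i}) - \epsilon \\
&= R_i(p_i, q_{-i}) + R_i(q_i, q_{-i}) - R_i(p_i, q_{-i}) - \epsilon \\
&= R_i(p_i, q_{-i}) + \delta - \epsilon \\
&> R_i(p_i^m, q_{-i}^m) + \delta - 2\epsilon
\end{aligned}
$$

If $\epsilon = \delta/2$, then $R_i(q_i, q_{-i}^m) > R_i(p_i^m, q_{-i}^m)$, for all $m \ge M_{\delta/2}$. But then $p_i^m \notin \mathrm{br}_i(q_{-i}^m)$ for all $m \ge M_{\delta/2}$, which contradicts the assumption that $p^m \in \mathrm{br}(q^m)$ for all $m$. Therefore, the graph of $\mathrm{br}$ is closed.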

Exercise Compute the best-response correspondences for the game depicted in Figure 6, a version of Hawks and Doves. Plot these correspondences, and compute all Nash equilibria.

1 \ 2 | H      | D
H     | −1, −1 | 2, 0
D     | 0, 2   | 1, 1

Figure 6: Hawks and Doves
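As a starting point for the exercise, a best-response correspondence in a 2×2 game can be computed by comparing the expected payoffs of the two pure actions. The sketch below is one possible approach. The payoff table is an assumption (Hawks and Doves with v = 2 and c = 4), and the function names are hypothetical; the correspondence is returned as an interval of optimal probabilities of playing H.

```python
from fractions import Fraction

# Assumed Hawks and Doves payoffs (v = 2, c = 4): PAYOFFS[(a1, a2)] = (R1, R2).
PAYOFFS = {
    ("H", "H"): (-1, -1), ("H", "D"): (2, 0),
    ("D", "H"): (0, 2), ("D", "D"): (1, 1),
}


def best_response_player1(p2_hawk):
    """Player 1's optimal probabilities of playing H, as an interval
    (low, high), when player 2 plays H with probability p2_hawk."""
    u_hawk = (p2_hawk * PAYOFFS[("H", "H")][0]
              + (1 - p2_hawk) * PAYOFFS[("H", "D")][0])
    u_dove = (p2_hawk * PAYOFFS[("D", "H")][0]
              + (1 - p2_hawk) * PAYOFFS[("D", "D")][0])
    if u_hawk > u_dove:
        return (Fraction(1), Fraction(1))   # H is the unique best response
    if u_hawk < u_dove:
        return (Fraction(0), Fraction(0))   # D is the unique best response
    return (Fraction(0), Fraction(1))       # indifferent: any mixture


def is_equilibrium(p1_hawk, p2_hawk):
    """Check mutual best responses. The assumed game is symmetric, so
    player 2's correspondence has the same form as player 1's."""
    lo1, hi1 = best_response_player1(p2_hawk)
    lo2, hi2 = best_response_player1(p1_hawk)
    return lo1 <= p1_hawk <= hi1 and lo2 <= p2_hawk <= hi2


for p in (Fraction(1, 4), Fraction(1, 2), Fraction(3, 4)):
    print(p, best_response_player1(p))
```

Plotting the interval returned for each opponent probability gives the correspondence asked for in the exercise, and the Nash equilibria are the points where the two players' correspondences intersect, which `is_equilibrium` tests for a candidate profile.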

3 Summary

In this lecture, we defined Nash equilibrium and re-proved Nash’s theorem guaranteeing its existence in all (finite) matrix games.

Although Nash equilibrium is the generally accepted solution concept in the deductive analysis of matrix games, the Nash equilibria in our examples are somewhat peculiar. In the Prisoners’ Dilemma, the Nash equilibrium payoffs are sub-optimal. In the game of Matching Pennies, there is no pure strategy Nash equilibrium; the unique Nash equilibrium is probabilistic. Finally, in the coordination and miscoordination games, the Nash equilibrium is not unique.

In future lectures, alternative notions of equilibria are discussed.