Game Theory: Minimax Theorem and Nash Equilibrium, Study notes of Ethics

Rice University Ethics

The minimax theorem in game theory, which guarantees the existence of optimal strategies in two-person zero-sum games. It also introduces the concept of Nash equilibrium in strategic form games, where players choose strategies based on their opponents' moves. proofs of lemmas and theorems related to these concepts.

Typology: Study notes

2021/2022

Uploaded on 03/31/2022

anarghya 🇺🇸


Advanced Game Theory

Herve Moulin Baker Hall 263; moulin@rice.edu

Rice University ECO 440 Spring 2007



1 Two person zero sum games

1.1 Introduction: strategic interdependency

In this section we study games with only two players. We also restrict attention to the case where the interests of the players are completely antagonistic: at the end of the game, one player gains some amount, while the other loses the same amount. These games are called “two person zero sum games”. While in most economic situations the interests of the players are neither in strong conflict nor in complete identity, this specific class of games provides important insights into the notion of "optimal play". In some 2-person zero-sum games, each player has a well defined “optimal” strategy, which does not depend on her adversary's decision (strategy choice). In some other games, no such optimal strategy exists. Finally, the founding result of Game Theory, known as the minimax theorem, says that optimal strategies exist when our players can randomize over a finite set of deterministic strategies.

1.2 Two-person zero-sum games in strategic form

A two-person zero-sum game in strategic form is a triple $G = (S, T, u)$, where $S$ is a set of strategies available to player 1, $T$ is a set of strategies available to player 2, and $u : S \times T \to \mathbb{R}$ is the payoff function of the game $G$; i.e., $u(s, t)$ is the resulting gain for player 1 and the resulting loss for player 2, if they choose to play $s$ and $t$ respectively. Thus, player 1 tries to maximize $u$, while player 2 tries to minimize it. We call any strategy choice $(s, t)$ an outcome of the game $G$.

When the strategy sets $S$ and $T$ are finite, the game $G$ can be represented by an $n$ by $m$ matrix $A$, where $n = |S|$, $m = |T|$, and $a_{ij} = u(s_i, t_j)$.

The secure utility level for player 1 (the minimal gain he can guarantee himself, no matter what player 2 does) is given by

$$\underline{m} = \max_{s \in S} \min_{t \in T} u(s, t) = \max_i \min_j a_{ij}.$$

A strategy $s^*$ for player 1 is called prudent if it realizes this secure max-min gain, i.e., if $\min_{t \in T} u(s^*, t) = \underline{m}$.

The secure utility level for player 2 (the maximal loss she can guarantee herself, no matter what player 1 does) is given by

$$\overline{m} = \min_{t \in T} \max_{s \in S} u(s, t) = \min_j \max_i a_{ij}.$$

A strategy $t^*$ for player 2 is called prudent if it realizes this secure min-max loss, i.e., if $\max_{s \in S} u(s, t^*) = \overline{m}$.

The secure utility level is what a player can get for sure, even if the other player behaves in the worst possible way. For each strategy of a player we calculate his or her worst possible payoff from using this strategy (depending on the strategy choice of the other player). A prudent strategy is one for which this worst possible result is the best. Thus, by a prudent choice of strategies, player 1 can guarantee that he will gain at least $\underline{m}$, while player 2 can guarantee that she will lose at most $\overline{m}$. Given this, we should expect that $\underline{m} \le \overline{m}$. Indeed:

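Lemma 1 For all two-person zero-sum games, $\underline{m} \le \overline{m}$.

Proof:

$$\underline{m} = \max_{s \in S} \min_{t \in T} u(s, t) = \min_{t \in T} u(s^*, t) \le u(s^*, t^*) \le \max_{s \in S} u(s, t^*) = \min_{t \in T} \max_{s \in S} u(s, t) = \overline{m}.$$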

Definition 2 If $\underline{m} = \overline{m}$, then $m = \underline{m} = \overline{m}$ is called the value of the game $G$. If $\underline{m} < \overline{m}$, we say that $G$ has no value. An outcome $(s^*, t^*) \in S \times T$ is called a saddle point of the payoff function $u$ if $u(s, t^*) \le u(s^*, t^*) \le u(s^*, t)$ for all $s \in S$ and for all $t \in T$.

Remark 3 Equivalently, we can write that $(s^*, t^*) \in S \times T$ is a saddle point if $\max_{s \in S} u(s, t^*) \le u(s^*, t^*) \le \min_{t \in T} u(s^*, t)$.

When the game is represented by a matrix $A$, $(s^*, t^*)$ will be a saddle point if and only if $a_{s^* t^*}$ is the largest entry in its column and the smallest entry in its row. A game has a value if and only if it has a saddle point:

Theorem 4 If the game $G$ has a value $m$, then an outcome $(s^*, t^*)$ is a saddle point if and only if $s^*$ and $t^*$ are prudent. In this case, $u(s^*, t^*) = m$. If $G$ has no value, then it has no saddle point either.

Proof: Suppose that $\underline{m} = \overline{m} = m$, and $s^*$ and $t^*$ are prudent strategies of players 1 and 2 respectively. Then by the definition of prudent strategies

$$\max_{s \in S} u(s, t^*) = \overline{m} = m = \underline{m} = \min_{t \in T} u(s^*, t).$$

In particular, $u(s^*, t^*) \le m \le u(s^*, t^*)$; hence, $u(s^*, t^*) = m$. Thus, $\max_{s \in S} u(s, t^*) = u(s^*, t^*) = \min_{t \in T} u(s^*, t)$, and so $(s^*, t^*)$ is a saddle point.
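For a finite game the secure utility levels and the saddle points can be read directly off the matrix. The following is a minimal Python sketch of the definitions above; the function names and the second example matrix are illustrative assumptions, not taken from the notes.

```python
def maximin(A):
    """Secure level of player 1: the largest of the row minima."""
    return max(min(row) for row in A)


def minimax(A):
    """Secure level of player 2: the smallest of the column maxima."""
    return min(max(col) for col in zip(*A))


def saddle_points(A):
    """All (i, j) such that A[i][j] is largest in its column and smallest in its row."""
    cols = list(zip(*A))
    return [
        (i, j)
        for i, row in enumerate(A)
        for j, a in enumerate(row)
        if a == min(row) and a == max(cols[j])
    ]


# Matching pennies: the two secure levels differ, so there is no value.
pennies = [[1, -1], [-1, 1]]
print(maximin(pennies), minimax(pennies), saddle_points(pennies))  # -1 1 []

# A hypothetical 3 x 3 game with value 2 and a single saddle point.
example = [[3, 1, 4], [2, 0, 1], [5, 2, 6]]
print(maximin(example), minimax(example), saddle_points(example))  # 2 2 [(2, 1)]
```

Indices are 0-based, so `(2, 1)` is the entry in the third row and second column.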

Lemma 6 (rectangularity property) A two-person zero-sum game in normal form has at most one value, but it can have several saddle points, and each player can have several prudent (and even several optimal) strategies. Moreover, if $(s_1, t_1)$ and $(s_2, t_2)$ are saddle points of the game, then $(s_1, t_2)$ and $(s_2, t_1)$ are also saddle points.

A two-person zero-sum game in normal form is called symmetric if $S = T$ and $u(s, t) = -u(t, s)$ for all $s, t$. When $S, T$ are finite, symmetric games are those which can be represented by a square matrix $A$ for which $a_{ij} = -a_{ji}$ for all $i, j$ (in particular, $a_{ii} = 0$ for all $i$).

Lemma 7 If a symmetric game has a value then this value is zero. Moreover, if $s$ is an optimal strategy for one player, then it is also optimal for the other one.

Proof: Say the game $(S, T, u)$ has a value $v$; then we have

$$v = \max_s \min_t u(s, t) = \max_s \{-\max_t u(t, s)\} = -\min_s \max_t u(t, s) = -v,$$

so $v = 0$. The proof of the second statement is equally easy.

1.3 Two-person zero-sum games in extensive form

A game in extensive form models a situation where the outcome depends on the consecutive actions of several involved agents (“players”). There is a precise sequence of individual moves, at each of which one of the players chooses an action from a set of potential possibilities. Among those, there could be chance, or random moves, where the choice is made by some mechanical random device rather than a player (sometimes referred to as “nature” moves).

When a player is to make the move, she is often unaware of the actual choices of other players (including nature), even if they were made earlier. Thus, a player has to choose an action, keeping in mind that she is at one of several possible actual positions in the game, and she cannot distinguish which one is realized: an example is bridge, or any other card game.

At the end of the game, all players get some payoffs (which we will measure in monetary terms). The payoff to each player depends on the whole vector of individual choices made by all game participants.

The most convenient representation of such a situation is by a game tree, where the name of the player who has the move is attached to each non-terminal node, and payoffs for each player are attached to each terminal node. We must also specify what information is available to a player at each node of the tree where she has to move.

A strategy is a full plan to play a game (for a particular player), prepared in advance. It is a complete specification of what move to choose in any potential situation which could arise in the game. One could think about a strategy as a set of instructions that a player who cannot physically participate in the game (but who still wants to be the one who makes all the decisions) gives to her "agent". When the game is actually played, each time the agent is to choose a move, he looks at the instruction and chooses according to it. The agent, thus, does not make any decision himself! Note that the reduction operator just described does not work equally well for n-player games with multiple stages of decisions.

Each player only cares about her final payoff in the game. When the set of all available strategies for each player is well defined, the only relevant information is the profile of final payoffs for each profile of strategies chosen by the players. Thus to each game in extensive form is attached a reduced game in strategic form. In two-person zero-sum games, this reduction is not conceptually problematic; however, for more general n-person games, it does not capture the dynamic character of a game in extensive form, and for this we need to develop new equilibrium concepts: see Chapter 5.

In this section we discuss games in extensive form with perfect information. Examples:

Gale’s chomp game: the players take turns to destroy an $n \times m$ rectangular grid, with the convention that if player $i$ kills entry $(p, q)$, all entries $(p', q')$ such that $(p', q') \ge (p, q)$ are destroyed as well. When a player moves, he must destroy one of the remaining entries. The player who kills entry $(1, 1)$ loses. In this game player 1, who moves first, has an optimal strategy that guarantees he wins. This strategy is easy to compute if $n = m$, not so if $n \ne m$.
Chess and Zermelo’s theorem: the game of Chess has three payoffs, $+1$, $-1$, $0$. Although we do not know which one, one of these 3 numbers is the value of the game, i.e., either White can guarantee a win, or Black can, or both can secure a draw.

Definition 8 A finite game in extensive form with perfect information is given by

a tree, with a particular node taken as the origin;
for each non-terminal node, a specification of who has the move;
for each terminal node, a payoff attached to it.

Formally, a tree is a pair $\Gamma = (N, \sigma)$ where $N$ is the finite set of nodes, and $\sigma : N \to N \cup \{\emptyset\}$ associates to each node its predecessor. A (unique) node $n_0$ with no predecessor (i.e., $\sigma(n_0) = \emptyset$) is the origin of the tree. Terminal nodes are those which are not predecessors of any node. Denote by $T(N)$ the set of terminal nodes. For any non-terminal node $r$, the set $\{n \in N : \sigma(n) = r\}$ is the set of successors of $r$. The maximal possible number of edges in a path from the origin to some terminal node is called the length of the tree $\Gamma$.

Given a tree $\Gamma$, a two-person zero-sum game with perfect information is defined by a partition of $N$ as $N = T(N) \cup N_1 \cup N_2$ into three disjoint sets and a payoff function defined over the set of terminal nodes, $u : T(N) \to \mathbb{R}$. For each non-terminal node $n$, $n \in N_i$ ($i = 1, 2$) means that player $i$ has the move at this node. A move consists of picking a successor to this node.

the node $n$ becomes a terminal node of the so-reduced game tree. After a finite number of such steps, the original game will reduce to one node $n_0$, and the payoff assigned to it will be the value of the initial game. The optimal strategies of the players are given by their optimal moves at each node, which we wrote down when reducing the game.

Remark 10 Consider the simple case where all payoffs are either $+1$ or $-1$ (a player either “wins” or “loses”), and where whenever a player has a move at some node, his/her opponent is the one who has a move at all its successors. An example is Gale’s chomp game above. When we solve this game backward, all payoffs which we attach to non-terminal nodes in this process are $+1$ or $-1$ (we can simply write “+” or “−”). Now look at the original game tree with “+” or “−” attached to each of its nodes according to this procedure. A “+” sign at a node $n$ means that this node (or “this position”) is “winning” <for player 1>, in the sense that if player 1 had the move at this node he would surely win, provided he played optimally. A “−” sign at a node $n$ means that this node (or “this position”) is “losing” <for player 1>, in the sense that if player 1 had the move at this node he would surely lose, provided his opponent played optimally. It is easy to see that “winning” nodes are those which have at least one “losing” successor, while “losing” nodes are those all of whose successors are “winning”. A number of the problems below are about computing the set of winning and losing positions.
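The characterization of winning and losing nodes in Remark 10 can be turned into a short recursive computation. The sketch below applies it to Gale's chomp game. The encoding of a position as a tuple of remaining row lengths and the names `moves` and `is_winning` are illustrative choices, not part of the notes. A position is labeled from the point of view of the player who has the move there, and the empty board counts as winning for the mover because the opponent has just killed entry (1, 1).

```python
from functools import lru_cache


def moves(rows):
    """Yield every position reachable in one move.

    rows[p] is the number of surviving entries in row p (a non-increasing tuple).
    """
    for p, length in enumerate(rows):
        for q in range(length):
            # Killing entry (p, q) destroys every entry (p', q') with p' >= p and q' >= q.
            yield rows[:p] + tuple(min(r, q) for r in rows[p:])


@lru_cache(maxsize=None)
def is_winning(rows):
    """True if the player who has the move at this position can force a win."""
    if sum(rows) == 0:
        # The previous mover killed entry (1, 1) and has therefore lost.
        return True
    # Winning nodes are those with at least one losing successor.
    return any(not is_winning(nxt) for nxt in moves(rows))


print(is_winning((1,)))       # False: the mover must kill the last entry
print(is_winning((2, 1)))     # False: an L-shaped position with equal arms
print(is_winning((2, 2)))     # True: kill the corner entry to reach (2, 1)
print(is_winning((4, 4, 4)))  # True: player 1 wins on the full 3 x 4 grid
```

The number of positions grows quickly with the grid size, so this brute-force labeling is only practical for small boards.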

1.4 Mixed strategies

Motivating examples:

Matching pennies: the matrix $\begin{pmatrix} 1 & -1 \\ -1 & 1 \end{pmatrix}$ has no saddle point. Moreover, for this game $\underline{m} = -1$ and $\overline{m} = 1$ (the worst possible outcomes), i.e., a prudent strategy does not provide either of the two players with any minimal guarantee. Here a player’s payoff depends completely on how well he or she can predict the choice of the other player. Thus, the best way to play is to be unpredictable, i.e., to choose a strategy (one of the two available) completely at random. It is easy to see that if each player chooses either strategy with probability $1/2$ according to the realization of some random device (and so without any predictable pattern), then “on average” (after playing this game many times) they both will get zero. In other words, under such a strategy choice the “expected payoff” for each player will be zero. Moreover, we show below that this randomized strategy is also optimal in the mixed extension of the deterministic game.

Bluffing in Poker: When optimal play involves some bluffing, the bluffing behavior needs to be unpredictable. This can be guaranteed by delegating the choice of when to bluff to some (carefully chosen!) random device. Then even the player herself would not be able to predict in advance when she will be bluffing. So the opponents will certainly not be able to guess whether she is bluffing. See the bluffing game (problem 17) below.

Schelling’s toy safe. Ann has 2 safes, one at her office which is hard to crack, another "toy" fake at home which any thief can open with a coat-hanger (as in the movies). She must keep her necklace, worth $10,000, either at home or at the office. Bob must decide which safe to visit (he has only one visit at only one safe). If he chooses to visit the office, he has a 20% chance of opening the safe. If he goes to Ann’s home, he is sure to be able to open the safe. The point of this example is that the presence of the toy safe helps Ann, who should actually use it to hide the necklace with a positive probability.

Even when using mixed strategies is clearly warranted, it remains to determine which mixed strategy to choose (how often to bluff, and on what hands?). The player should choose the probabilities of each deterministic choice (i.e., how she would like to program the random device she uses). Since the player herself cannot predict the actual move she will make during the game, the payoff she will get is uncertain. For example, a player may decide that she will use one strategy with probability $1/3$, another one with probability $1/6$, and yet another one with probability $1/2$. When the time to make her move in the game comes, this player would need some random device to determine her final strategy choice, according to the pre-selected probabilities. In our example, such a device should have three outcomes, corresponding to the three potential choices, the relative chances of these outcomes being $2 : 1 : 3$. If this game is played many times, the player should expect that she will play the 1st strategy roughly $1/3$ of the time, the 2nd one roughly $1/6$ of the time, and the 3rd one roughly $1/2$ of the time. She will then get “on average” $1/3$ (of the payoff if using the 1st strategy) $+\ 1/6$ (of the payoff if using the 2nd strategy) $+\ 1/2$ (of the payoff if using the 3rd strategy).

Note that, though this player’s opponent cannot predict what her actual move would be, he can still evaluate the relative chances of each choice, and this will affect his decision. Thus a rational opponent will, in general, react differently to different mixed strategies.

What is the rational behavior of our players when payoffs become uncertain? The simplest and most common hypothesis is that they try to maximize their expected (or average) payoff in the game, i.e., they evaluate random payoffs simply by their expected value. Thus the cardinal values of the deterministic payoffs now matter very much, unlike in the previous sections where the ordinal ranking of the outcomes is all that matters to the equilibrium analysis. We give in Chapter 2 some axiomatic justifications for this crucial assumption.

The expected payoff is defined as the weighted sum of all possible payoffs in the game, each payoff being multiplied by the probability that this payoff is realized. In matching pennies, when each player chooses the “mixed strategy” $(0.5, 0.5)$ (meaning that the 1st strategy is chosen with probability 0.5, and the 2nd strategy is chosen with probability 0.5), the chances that the game will end up in each particular square $(i, j)$, i.e., the chances that the 1st player will play his $i$-th strategy and the 2nd player will play her $j$-th strategy, are $0.5 \times 0.5 = 0.25$. So the expected payoff for this game under such strategies is $1 \times 0.25 + (-1) \times 0.25 + 1 \times 0.25 + (-1) \times 0.25 = 0$. Consider a general finite game $G = (S, T, u)$, represented by an $n$ by $m$

Now, $s^T A e^j$ is the expected payoff to player 1 when he uses the (mixed) strategy $s$ and player 2 uses the (pure) strategy $e^j$. Hence, $s^T A y = \sum_{j=1}^{m} y_j \left[ s^T A e^j \right]$ is a weighted average of player 1’s payoffs against pure strategies of player 2 (when player 1 uses strategy $s$). In this weighted sum, the weights $y_j$ are equal to the probabilities that player 2 would choose these pure strategies $e^j$.

Given this claim, $v_1(s) = \min_{y \in Y} s^T A y$, the minimum of $s^T A y$, will be attained at some pure strategy $j$ (i.e., at some $e^j \in Y$). Indeed, if $s^T A e^j > v_1(s)$ for all $j$, then we would have $s^T A y = \sum_j y_j \left[ s^T A e^j \right] > v_1(s)$ for all $y \in Y$. Hence, $v_1(s) = \min_j s^T A_{\cdot j}$, and $v_1 = \max_{s \in \mathcal{S}} \min_j s^T A_{\cdot j}$. Similarly, $v_2(y) = \max_i A_{i \cdot}\, y$, where $A_{i \cdot}$ is the $i$-th row of the matrix $A$, and $v_2 = \min_{y \in Y} \max_i A_{i \cdot}\, y$.

As with pure strategies, the secure utility level player 1 can guarantee himself (the minimal amount he could gain) cannot exceed the secure utility level player 2 can guarantee herself (the maximal amount she could lose): $v_1 \le v_2$. This follows from Lemma 1. Such prudent mixed strategies $s$ and $y$ are called the maximin strategy (for player 1) and the minimax strategy (for player 2) respectively.
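Because the minimum over mixed strategies of the opponent is attained at a pure strategy, the guarantees $v_1(s)$ and $v_2(y)$ reduce to a minimum over columns and a maximum over rows. A minimal Python sketch, with function names chosen here for illustration and exact arithmetic via `fractions`:

```python
from fractions import Fraction


def v1(A, s):
    """Guaranteed expected gain of player 1 when he plays the mixed strategy s.

    Only the pure replies e^j of player 2 need to be checked.
    """
    return min(sum(s[i] * A[i][j] for i in range(len(A))) for j in range(len(A[0])))


def v2(A, y):
    """Guaranteed expected loss of player 2 when she plays the mixed strategy y."""
    return max(sum(A[i][j] * y[j] for j in range(len(y))) for i in range(len(A)))


pennies = [[1, -1], [-1, 1]]
half = [Fraction(1, 2), Fraction(1, 2)]

print(v1(pennies, half), v2(pennies, half))  # 0 0
print(v1(pennies, [1, 0]))                   # -1
```

Since $v_1(s) \le v_1 \le v_2 \le v_2(y)$ for every $s$ and $y$, finding a pair with $v_1(s) = v_2(y)$, as for matching pennies here, certifies that both strategies are optimal and that the common number is the value.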

Theorem 13 (The Minimax Theorem) $v_1 = v_2 = v$. Thus, if players can use mixed strategies, any game with finite strategy sets has a value.

Proof. Let the $n \times m$ matrix $A$ be the matrix of a two person zero sum game. The set of all mixed strategies for player 1 is $\mathcal{S} = \{(s_1, \dots, s_n) : \sum_{i=1}^{n} s_i = 1,\ s_i \ge 0\}$, while for player 2 it is $Y = \{(y_1, \dots, y_m) : \sum_{j=1}^{m} y_j = 1,\ y_j \ge 0\}$. Let $v_1(s) = \min_{y \in Y} s \cdot Ay$ be the smallest payoff player 1 can get if he chooses to play $s$. Then $v_1 = \max_{s \in \mathcal{S}} v_1(s) = \max_{s \in \mathcal{S}} \min_{y \in Y} s \cdot Ay$ is the secure utility level for player 1. Similarly, we define $v_2(y) = \max_{s \in \mathcal{S}} s \cdot Ay$, and $v_2 = \min_{y \in Y} v_2(y) = \min_{y \in Y} \max_{s \in \mathcal{S}} s \cdot Ay$ is the secure utility level for player 2. We know that $v_1 \le v_2$. Consider the following closed convex sets in $\mathbb{R}^n$:

$S = \{z \in \mathbb{R}^n : z = Ay \text{ for some } y \in Y\}$ is a convex set, since $Ay = y_1 A_{\cdot 1} + \dots + y_m A_{\cdot m}$, where $A_{\cdot j}$ is the $j$-th column of the matrix $A$, and hence $S$ is the set of all convex combinations of columns of $A$, i.e., the convex hull of the columns of $A$. Moreover, since it is the convex hull of $m$ points, $S$ is a convex polytope in $\mathbb{R}^n$ with at most $m$ vertices (extreme points), and thus it is also closed and bounded.
Cones $K_v = \{z \in \mathbb{R}^n : z_i \le v \text{ for all } i = 1, \dots, n\}$ are obviously convex and closed for any $v \in \mathbb{R}$. Further, it is easy to see that $K_v = \{z \in \mathbb{R}^n : s \cdot z \le v \text{ for all } s \in \mathcal{S}\}$.

Geometrically, when $v$ is very small, the cone $K_v$ lies far from the bounded set $S$, and they do not intersect. Thus, they can be separated by a hyperplane. When $v$ increases, the cone $K_v$ enlarges in the direction $(1, \dots, 1)$, being “below” the set $S$, until the moment when $K_v$ “touches” the set $S$ for the first time. Hence, $\bar{v}$, the maximal value of $v$ for which $K_v$ can still be separated from $S$, is reached when the cone $K_{\bar{v}}$ first “touches” the set $S$. Moreover, $K_{\bar{v}}$ and $S$ have at least one common point $\bar{z}$, at which they “touch”. Let $\bar{y} \in Y$ be such that $A\bar{y} = \bar{z} \in S \cap K_{\bar{v}}$.

Assume that $K_{\bar{v}}$ and $S$ are separated by a hyperplane $H = \{z \in \mathbb{R}^n : \bar{s} \cdot z = c\}$, where $\sum_{i=1}^{n} \bar{s}_i = 1$. It means that $\bar{s} \cdot z \le c$ for all $z \in K_{\bar{v}}$, $\bar{s} \cdot z \ge c$ for all $z \in S$, and hence $\bar{s} \cdot \bar{z} = c$. Geometrically, since $K_{\bar{v}}$ lies “below” the hyperplane $H$, all coordinates $\bar{s}_i$ of the vector $\bar{s}$ must be nonnegative, and thus $\bar{s} \in \mathcal{S}$. Moreover, since $K_{\bar{v}} = \{z \in \mathbb{R}^n : s \cdot z \le \bar{v} \text{ for all } s \in \mathcal{S}\}$, $\bar{s} \in \mathcal{S}$ and $\bar{z} \in K_{\bar{v}}$, we obtain that $c = \bar{s} \cdot \bar{z} \le \bar{v}$. But since the vector $(\bar{v}, \dots, \bar{v}) \in K_{\bar{v}}$, we also obtain that $c \ge \bar{s} \cdot (\bar{v}, \dots, \bar{v}) = \bar{v} \sum_{i=1}^{n} \bar{s}_i = \bar{v}$. It follows that $c = \bar{v}$.

Now, $v_1 = \max_{s \in \mathcal{S}} \min_{y \in Y} s \cdot Ay \ge \min_{y \in Y} \bar{s} \cdot Ay \ge \bar{v}$ (since $\bar{s} \cdot z \ge c = \bar{v}$ for all $z \in S$, i.e., for all $z = Ay$, where $y \in Y$). Next, $v_2 = \min_{y \in Y} \max_{s \in \mathcal{S}} s \cdot Ay \le \max_{s \in \mathcal{S}} s \cdot A\bar{y} = \max_{s \in \mathcal{S}} s \cdot \bar{z} = \max_{s \in \mathcal{S}} \bar{z} \cdot s \le \bar{v}$ (since $\bar{z} = A\bar{y} \in K_{\bar{v}}$, and since $z \cdot s \le \bar{v}$ for all $s \in \mathcal{S}$ and all $z \in K_{\bar{v}}$, in particular, $\bar{z} \cdot s \le \bar{v}$ for all $s \in \mathcal{S}$). We obtain that $v_2 \le \bar{v} \le v_1$. Together with the fact that $v_1 \le v_2$, it gives us $v_2 = \bar{v} = v_1$, the desired statement.

Note also that the maximal value of $v_1(s)$ is reached at $\bar{s}$, while the minimal value of $v_2(y)$ is reached at $\bar{y}$. Thus, $\bar{s}$ and $\bar{y}$ constructed in the proof are optimal strategies for players 1 and 2 respectively.

Remark 14 When the sets of pure strategies are infinite, mixed strategies can still be defined as probability distributions over these sets, but the existence of a value for the game in mixed strategies is no longer guaranteed. One needs to check for instance that the assumptions of Von Neumann’s Theorem below are satisfied.

1.5 Computation of optimal strategies

How can we find the maximin (mixed) strategy $s$, the minimax (mixed) strategy $y$, and the value $v$ of a given game? If the game with deterministic strategies (the original game) has a saddle point, then $v = m$, and the maximin and minimax strategies are deterministic. Finding them amounts to finding an entry $a_{ij}$ of the matrix $A$ which is both the maximum entry in its column and the minimum entry in its row.

When the original game has no value, the key to computing optimal mixed strategies is to know their supports, namely the set of strategies used with strictly positive probability. Let $s, y$ be a pair of optimal strategies, and $v = s^T A y$. Since for all $j$ we have that $s^T A e^j \ge \min_{y \in Y} s^T A y = v_1(s) = v_1 = v$, it follows that

$$v = s^T A y = y_1 \left[ s^T A e^1 \right] + \dots + y_m \left[ s^T A e^m \right] \ge y_1 v + \dots + y_m v = v\,(y_1 + \dots + y_m) = v,$$

and the equality implies $s^T A_{\cdot j} = s^T A e^j = v$ for all $j$ such that $y_j \ne 0$. Thus, player 2 receives her minimax value $v_2 = v$ by playing

follows that optimal strategies $(s_1, s_2)$ and $(y_1, y_2)$ must have all components positive. Let us repeat the argument above for the $2 \times 2$ case. We have $v = s^T A y = a_{11} s_1 y_1 + a_{12} s_1 y_2 + a_{21} s_2 y_1 + a_{22} s_2 y_2$, or

$$s_1 (a_{11} y_1 + a_{12} y_2) + s_2 (a_{21} y_1 + a_{22} y_2) = v.$$

But $a_{11} y_1 + a_{12} y_2 \le v$ and $a_{21} y_1 + a_{22} y_2 \le v$ (these are the losses of player 2 against the 1st and 2nd pure strategies of player 1; since $y$ is player 2’s optimal strategy, she cannot lose more than $v$ in any case). Hence, $s_1 (a_{11} y_1 + a_{12} y_2) + s_2 (a_{21} y_1 + a_{22} y_2) \le s_1 v + s_2 v = v$. Since $s_1 > 0$ and $s_2 > 0$, the equality is only possible when $a_{11} y_1 + a_{12} y_2 = v$ and $a_{21} y_1 + a_{22} y_2 = v$. Similarly, it can be seen that $a_{11} s_1 + a_{21} s_2 = v$ and $a_{12} s_1 + a_{22} s_2 = v$. We also know that $s_1 + s_2 = 1$ and $y_1 + y_2 = 1$. We thus have a linear system with 6 equations and 5 variables $s_1, s_2, y_1, y_2$ and $v$. The minimax theorem guarantees that this system has a solution with $s_1, s_2, y_1, y_2 \ge 0$. One of these 6 equations is actually redundant. The system has a unique solution provided the original game has no saddle point. In particular

$$v = \frac{a_{11} a_{22} - a_{12} a_{21}}{a_{11} + a_{22} - a_{12} - a_{21}}.$$
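Solving the same linear system for the probabilities gives $s_1 = (a_{22} - a_{21})/D$, $s_2 = (a_{11} - a_{12})/D$, $y_1 = (a_{22} - a_{12})/D$, $y_2 = (a_{11} - a_{21})/D$ with $D = a_{11} + a_{22} - a_{12} - a_{21}$. The Python sketch below evaluates these expressions exactly. The function name is an illustrative choice, and the payoff matrix used for Schelling's toy safe is an assumed model that the notes do not spell out: Bob is the row player choosing office or home, Ann is the column player choosing where to hide, and the payoff is the probability that Bob gets the necklace.

```python
from fractions import Fraction


def solve_2x2(a11, a12, a21, a22):
    """Value and optimal mixed strategies of a 2 x 2 game WITHOUT a saddle point.

    The formulas are not valid when the game has a saddle point.
    """
    d = Fraction(a11) + a22 - a12 - a21
    v = (Fraction(a11) * a22 - Fraction(a12) * a21) / d
    s = ((a22 - a21) / d, (a11 - a12) / d)  # player 1 (rows)
    y = ((a22 - a12) / d, (a11 - a21) / d)  # player 2 (columns)
    return v, s, y


# Matching pennies: value 0, both players mix (1/2, 1/2).
print(solve_2x2(1, -1, -1, 1))

# Assumed model of the toy safe: rows = Bob visits (office, home),
# columns = Ann hides at (office, home), entries = Bob's success probability.
v, bob, ann = solve_2x2(Fraction(1, 5), 0, 0, 1)
print(v, bob, ann)
```

Under this assumed model the value is $1/6$, below the $1/5$ chance Bob would have if the office safe were the only hiding place, and Ann puts the necklace in the toy safe with probability $1/6$.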

1.5.2 Symmetric games

The game with matrix $A$ is symmetric if $A = -A^T$ (Exercise: check this). As for symmetric games in pure strategies (Lemma 7), the value of a symmetric game in mixed strategies is zero. Moreover, if $s$ is an optimal strategy for player 1, then it is also optimal for player 2.

1.6 Von Neumann’s Theorem

Von Neumann’s theorem generalizes the minimax theorem. It follows from the more general Nash Theorem in Chapter 4.

Theorem 17 The game $(S, T, u)$ has a value and optimal strategies if $S, T$ are convex compact subsets of some Euclidean spaces, the payoff function $u$ is continuous on $S \times T$, and for all $s \in S$, all $t \in T$

$t' \mapsto u(s, t')$ is quasi-convex in $t'$; $s' \mapsto u(s', t)$ is quasi-concave in $s'$.

Example: Borel’s model of poker. Each player bids $1, then receives a hand $m_i \in [0, 1]$. Hands are independently and uniformly distributed on $[0, 1]$. Each player observes only his hand. Player 1 moves first, by either folding or bidding an additional $5. If 1 folds, the game is over and player 2 collects the pot. If 1 bids, player 2 can either fold (in which case 1 collects the pot) or bid $5 more to see: then the hands are revealed and the highest one wins the pot.

A strategy of player $i$ can be any mapping from $[0, 1]$ into $\{F, B\}$; however, it is enough to consider the following simple threshold strategies $s_i$: fold whenever $m_i \le s_i$, bid whenever $m_i > s_i$. Notice that for player 2, actual bidding only occurs if player 1 bids before him. Compute the probability $\pi(s_1, s_2)$ that $m_1 > m_2$ given that $s_i \le m_i \le 1$:

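$$\pi(s_1, s_2) = \begin{cases} \dfrac{1 + s_1 - 2 s_2}{2(1 - s_2)} & \text{if } s_2 \le s_1 \\[2ex] \dfrac{1 - s_2}{2(1 - s_1)} & \text{if } s_1 \le s_2 \end{cases}$$

from which the payoff function is easily derived:

$$u(s_1, s_2) = \begin{cases} -6 s_1^2 + 5 s_1 s_2 + 5 s_1 - 5 s_2 & \text{if } s_2 \le s_1 \\ 6 s_2^2 - 7 s_1 s_2 + 5 s_1 - 5 s_2 & \text{if } s_1 \le s_2 \end{cases}$$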

The Von Neumann theorem applies, and the utility function is continuously differentiable. Thus the saddle point can be found by solving the system $\frac{\partial u}{\partial s_i}(s) = 0$, $i = 1, 2$. This leads to

$$s_1^* = \left(\tfrac{5}{7}\right)^2 \approx 0.51; \qquad s_2^* = \tfrac{5}{7} \approx 0.71$$

and the value $-0.51$: player 2 earns on average 51 cents. Other examples are in the problems below.
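The saddle point can be checked numerically from the payoff function alone. In the region $s_1 \le s_2$ the first-order conditions read $-7 s_2 + 5 = 0$ and $12 s_2 - 7 s_1 - 5 = 0$, which give $s_2 = 5/7$ and $s_1 = 25/49$. The Python sketch below, a verification written for this example rather than part of the notes, evaluates $u$ exactly and tests the saddle-point inequalities on a grid of thresholds; the grid resolution is an arbitrary choice.

```python
from fractions import Fraction


def u(s1, s2):
    """Expected payoff to player 1 under the threshold strategies s1 and s2."""
    if s2 <= s1:
        return -6 * s1**2 + 5 * s1 * s2 + 5 * s1 - 5 * s2
    return 6 * s2**2 - 7 * s1 * s2 + 5 * s1 - 5 * s2


s1_star, s2_star = Fraction(25, 49), Fraction(5, 7)
value = u(s1_star, s2_star)

# Thresholds k/49 for k = 0..49; both candidate thresholds lie on this grid.
grid = [Fraction(k, 49) for k in range(50)]

# Player 1 cannot gain by deviating, and player 2 cannot lower her loss.
no_gain_for_1 = all(u(s1, s2_star) <= value for s1 in grid)
no_gain_for_2 = all(u(s1_star, s2) >= value for s2 in grid)

print(value, round(float(value), 2))  # -25/49 -0.51
print(no_gain_for_1, no_gain_for_2)   # True True
```

A grid check is only evidence, not a proof, but here it agrees with the first-order conditions and with the value of about 51 cents in favor of player 2.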

1.7 Problems for two person zero-sum games

1.7.1 Pure strategies

Problem 1 Ten thousand students formed a square. In each row, the tallest student is chosen and Mary is the shortest one among those. In each column, the shortest student is chosen, and John is the tallest one among those. Who is taller–John or Mary?

Problem 2 Compute $\overline{m} = \min \max$ and $\underline{m} = \max \min$ values for the following matrices: 2 4 6 3 6 2 4 3 4 6 2 3

Find all saddle points.

Problem 3. Gale’s roulette a) Each wheel has an equal probability to stop on any of its numbers. Player 1 chooses a wheel and spins it. Player 2 chooses one of the 2 remaining wheels (while the wheel chosen by 1 is still spinning), and spins it. The winner is the player whose wheel stops on the higher score. He gets $1 from the loser. Numbers on wheel #1: 2,4,9; on wheel #2: 3,5,7; on wheel #3: 1,6,

In each question you must check that the game in deterministic strategies (given in the matrix form) has no value, then find the value and optimal mixed strategies. Results in section 1.5 will prove useful.

a) A = 2 3 1 5 4 1 6 0

b) A =

c) A =

d) A =

e) A =

f) A =

Problem 10 Hiding a number Fix an increasing sequence of positive numbers a 1 ≤ a 2 ≤ a 3 ≤ · · · ≤ ap ≤ · · ·. Each player chooses an integer, the choices being independent. If they both choose the same number p, player 1 receives $p from player 2. Otherwise, no money changes hand. a) Assume first X∞

and show that each player has a unique optimal mixed strategy. b) In the case where X∞

show that the value is zero, that every strategy of player 1 is optimal, whereas player 2 has only "ε-optimal" strategies, i.e., strategies guaranteeing a payoff not larger than ε, for arbitrarily small ε.

Problem 11 Picking an entry

a) Player 1 chooses either a row or a column of the matrix

. Player 2

chooses an entry of this matrix. If the entry chosen by 2 is in the row or column chosen by 1, player 1 receives the amount of this entry from player 2. Otherwise no money changes hands. Find the value and optimal strategies.

b) Same strategies but this time if player 2 chooses entry s and this entry is not in the row or column chosen by 1, player 2 gets $s from player 1; if it is in the row or column chosen by 1, player 1 gets $s from player 2 as before.

Problem 12 Guessing a number Player 2 chooses one of the three numbers 1, 2 or 5. Call $s_2$ that choice. One of the two numbers not selected by Player 2 is selected at random (equal probability 1/2 for each) and shown to Player 1. Player 1 now guesses Player 2’s choice: if his guess is correct, he receives $s_2$ dollars from Player 2, otherwise no money changes hands. Solve this game: value and optimal strategies. Hint: drawing the full normal form of this game is cumbersome; describe instead the strategy of player 1 by three numbers $q_1, q_2, q_5$. The number $q_1$ tells what player 1 does if he is shown number 1: he guesses 2 with probability $q_1$ and 5 with probability $1 - q_1$; and so on.

Problem 13 Assume that both players choose optimal (mixed) strategies $x$ and $y$ and thus the resulting payoff in the game is $v$. We know that player 1 would get $v$ if, against player 2’s choice $y$, he played any pure strategy used with positive probability in $x$ (i.e. any pure strategy $i$ such that $x_i > 0$), and he would get less than $v$ if he played any pure strategy $i$ such that $x_i = 0$. Explain why a rational player 1, who assumes that his opponent is also rational, should not choose a pure strategy $i$ such that $x_i > 0$ instead of $x$.

Problem 14 Bluffing game At the beginning, players 1 and 2 each put $1 in the pot. Next, player 1 draws a card from a shuffled deck with an equal number of black and red cards in it. Player 1 looks at his card (he does not show it to player 2) and decides whether to raise or fold. If he folds, the card is revealed to player 2, and the pot goes to player 1 if it is red, to player 2 if it is black. If player 1 raises, he must add $1 to the pot, then player 2 must meet or pass. If she passes, the game ends and player 1 takes the pot. If she meets, she puts $α in the pot. Then the card is revealed and, again, the pot goes to player 1 if it is red, to player 2 if it is black. Draw the matrix form of this game. Find its value and optimal strategies as a function of the parameter α. Is bluffing part of the equilibrium strategy of player 1?

2 Nash equilibrium

In a general n-person game in strategic form, interests of the players are neither identical nor completely opposed. Thus the information each player possesses about other participants in the game may influence her behavior. We discuss in this chapter the two most important scenarios within which the Nash equilibrium concept is often a compelling model of rational behavior:

the decentralized scenarios where mutual information is minimal, to the

Definition 21 A game in strategic form $G = (N, S_i, u_i, i \in N)$ is symmetrical if $S_i = S_j$ for all $i, j$, and the mapping $s \to u(s)$ from $S^{|N|}$ into $\mathbb{R}^{|N|}$ is symmetrical.

In a symmetrical game if two players exchange strategies, their payoffs are exchanged and those of other players remain unaffected.

2.1 Decentralized behavior and dynamic stability

In this section we interpret a Nash equilibrium as the resting point of a dynamical system. The players behave in a simple myopic fashion, and learn about the game by exploring their strategic options over time. Their behavior is compatible with total ignorance about the existence and characteristics of other players, and what their behavior could be.

Think of Adam Smith’s invisible hand paradigm: the price signal I receive from the market looks to me like an exogenous parameter on which my own behavior has no effect. I do not know how many other participants are involved in the market, or what they could be doing. I simply react to the price by maximizing my utility, without making assumptions about its origin.

The analog of the competitive behavior in the context of strategic games is the best reply behavior. Take the profile of strategies $s_{-i}$ chosen by other players as an exogenous parameter, then pick a strategy $s_i$ maximizing your own utility $u_i$, under the assumption that this choice will not affect the parameter $s_{-i}$.

The deep insight of the invisible hand paradigm is that decentralized price-taking behavior will result in an efficient allocation of resources (a Pareto efficient outcome of the economy). This holds true under some specific microeconomic assumptions in the Arrow-Debreu model, and consists of two statements. First, the invisible hand behavior will converge to a competitive equilibrium; second, this equilibrium is efficient. (The second statement is much more robust than the first.)

In the much more general strategic game model, the limit points of the best reply behavior are the Nash equilibrium outcomes. Both statements (the best reply behavior converges; the limit point is an efficient outcome) are problematic. The examples below show that the best reply behavior may not converge at all and that, if it does converge, the limit equilibrium outcome may well be inefficient (Pareto inferior).

Decentralized behavior may diverge, or it may converge toward a socially suboptimal outcome.

Definition 22 Given the game in strategic form $G = (N, S_i, u_i, i \in N)$, the best-reply correspondence of player $i$ is the (possibly multivalued) mapping $br_i$ from $S_{-i} = \prod_{j \in N \setminus \{i\}} S_j$ into $S_i$ defined as follows:

$$s_i \in br_i(s_{-i}) \Leftrightarrow u_i(s_i, s_{-i}) \ge u_i(s_i', s_{-i}) \text{ for all } s_i' \in S_i$$
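For a finite strategy set the best-reply correspondence can be computed by direct enumeration. The sketch below is only an illustration: the payoff function `u1` and the strategy set `S1` are hypothetical, chosen so that the correspondence is single-valued at one point and multivalued at another.

```python
def best_replies(payoff, own_strategies, others_profile):
    """All strategies s in own_strategies that maximize
    payoff(s, others_profile), in the order they are listed."""
    values = {s: payoff(s, others_profile) for s in own_strategies}
    best = max(values.values())
    return [s for s in own_strategies if values[s] == best]


# Hypothetical two-player example: player 1's payoff against strategy s2.
def u1(s1, s2):
    return s1 * s2 - s1 ** 2


S1 = [0, 1, 2, 3]

print(best_replies(u1, S1, 2))  # prints: [1]
print(best_replies(u1, S1, 3))  # prints: [1, 2]
```

Against `s2 = 3` two strategies tie for the maximal payoff, which is why the definition speaks of a possibly multivalued mapping.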

Definition 23 We say that the sequence $s^t \in S_N$, $t = 0, 1, 2, \cdots$, is a best reply dynamics if for all $t \ge 1$ and all $i$, we have

$$s_i^t \in \{s_i^{t-1}\} \cup br_i(s_{-i}^{t-1})$$

and

$$s_i^t \in br_i(s_{-i}^{t-1}) \text{ for infinitely many values of } t.$$

We say that $s^t$ is a sequential best reply dynamics, also called an improvement path, if in addition at each step at most one player is changing her strategy.

The best reply dynamics is very general, in that it does not require the successive adjustments of the players to be synchronized. If all players use a best reply at all times, we speak of myopic adjustment; if our players take turns to adjust, we speak of sequential adjustment. For instance with two players the latter dynamics is:

if $t$ is even: $s_1^t \in br_1(s_2^{t-1})$, $s_2^t = s_2^{t-1}$

if $t$ is odd: $s_2^t \in br_2(s_1^{t-1})$, $s_1^t = s_1^{t-1}$
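The two-player sequential adjustment can be simulated directly on a finite game. The sketch below is a minimal illustration under stated assumptions: ties between best replies are broken by taking the first maximizer in the list, and both games (a quantity-setting game with payoff q(12 - q1 - q2) and matching pennies) are hypothetical examples of our own, as are all function names.

```python
def sequential_best_reply(u1, u2, S1, S2, start, steps):
    """Path (s^0, ..., s^steps): player 1 switches to a best reply at even
    dates, player 2 at odd dates, the other player keeping her strategy."""
    s1, s2 = start
    path = [(s1, s2)]
    for t in range(1, steps + 1):
        if t % 2 == 0:
            s1 = max(S1, key=lambda a: u1(a, s2))
        else:
            s2 = max(S2, key=lambda b: u2(s1, b))
        path.append((s1, s2))
    return path


def is_nash(u1, u2, S1, S2, profile):
    """True if neither player gains by a unilateral deviation from profile."""
    s1, s2 = profile
    return (all(u1(s1, s2) >= u1(a, s2) for a in S1)
            and all(u2(s1, s2) >= u2(s1, b) for b in S2))


# Hypothetical quantity-setting game: the dynamics settles down.
quantities = list(range(13))


def profit_1(q1, q2):
    return q1 * (12 - q1 - q2)


def profit_2(q1, q2):
    return q2 * (12 - q1 - q2)


cournot_path = sequential_best_reply(
    profit_1, profit_2, quantities, quantities, (0, 0), 6)
print(cournot_path[-1], is_nash(profit_1, profit_2, quantities, quantities, cournot_path[-1]))
# prints: (4, 4) True


# Matching pennies: player 1 wants to match, player 2 wants to mismatch.
sides = ["H", "T"]


def match(a, b):
    return 1 if a == b else -1


def mismatch(a, b):
    return -match(a, b)


pennies_path = sequential_best_reply(match, mismatch, sides, sides, ("H", "H"), 4)
print(pennies_path)
# prints: [('H', 'H'), ('H', 'T'), ('T', 'T'), ('T', 'H'), ('H', 'H')]
```

In the first game the path stops moving at a profile that passes the Nash test; in the second it returns to its starting point after four steps and so cycles forever, an instance of best reply behavior that does not converge.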

But the definition allows much more complicated dynamics, where the timing of best reply adjustments varies across players. An important requirement is that at any date t, every player will be using his best reply adjustment some time in the future. The first observation is an elementary result.

Proposition 24 Assume the strategy sets $S_i$ of each player are compact and the payoff functions $u_i$ are continuous. If the best reply dynamics $(s^t)_{t \in \mathbb{N}}$ converges to $s^* \in S_N$, then $s^*$ is a Nash equilibrium.

Proof. Pick any $\varepsilon > 0$. As $u_i$ is uniformly continuous on $S_N$, there exists $T$ such that

$$\text{for all } i, j \in N \text{ and } t \ge T: \quad |u_i(s_j^t, s_{-j}) - u_i(s_j^*, s_{-j})| \le \frac{\varepsilon}{n} \quad \text{for all } s_{-j} \in S_{-j}$$

Fix an agent $i$. By definition of the b.r. dynamics, there is a date $t \ge T$ such that $s_i^{t+1} \in br_i(s_{-i}^t)$. This implies for any $s_i \in S_i$

$$u_i(s^*) + \varepsilon \ge u_i(s_i^{t+1}, s_{-i}^t) \ge u_i(s_i, s_{-i}^t) \ge u_i(s_i, s_{-i}^*) - \frac{n-1}{n}\varepsilon$$

where the left and right inequality follow by repeated application of uniform continuity. Letting $\varepsilon$ go to zero ends the proof.

Note that the topological assumptions in the Proposition hold true if the strategy sets are finite.

Definition 25 We call a Nash equilibrium $s$ strongly globally stable if any best reply dynamics (starting from any initial profile of strategies in $S_N$) converges to $s$. Such an equilibrium must be the unique equilibrium.


Advanced Game Theory

Herve Moulin, Baker Hall 263

Rice University ECO 440 Spring 2007

1 Two person zero sum games

1.1 Introduction: strategic interdependency

1.2 Two-person zero-sum games in strategic form

1.3 Two-person zero-sum games in extensive form

1.4 Mixed strategies


1.5 Computation of optimal strategies

1.6 Von Neumann’s Theorem

1.7 Problems for two person zero-sum games

2.1 Decentralized behavior and dynamic stability
