Two-player zero-sum games

It turns out that two-player zero-sum games are fairly easily solved, by which I mean one can readily calculate an optimal solution to a given game. The following is based partly on a very quick and easy introductory book by M. Davis [1]. Here a two-player game is a situation where two parties must simultaneously make a decision, resulting in a payoff to each party. Each possible pair of decisions may result in a different payoff. In the zero-sum case, a positive payoff for the first player results in an equal loss for the second player, and vice versa. In this case we can represent the payoff as a matrix, where the rows represent the possible decisions of the first player and the columns those of the second player. Additionally, we adopt the convention that a positive entry in the payoff matrix corresponds to a positive payoff for the row-player.

As a simple example, here is a payoff matrix where each player has two possible choices $$ P = \begin{bmatrix} 1 & 2 \cr -1 & 0 \end{bmatrix} $$

In considering an optimal strategy for the row-player, we might work out the best choice for each possible decision of the column-player. If the column-player chooses strategy 1, the possible payoffs for the row-player are 1 and \(-1\) for strategies 1 and 2, respectively, so the row-player should choose strategy 1. If the column-player chooses strategy 2, the possible payoffs for the row-player are 2 and \(0\) for strategies 1 and 2, respectively, so the row-player would again choose strategy 1. Therefore, the row-player should always play strategy 1.

Similarly, we may consider the strategies of the column-player, whose loss is the row-player's gain, so the column-player prefers the smaller entry in each row. If the row-player chooses strategy 1, the column-player loses 1 with strategy 1 and 2 with strategy 2; if the row-player chooses strategy 2, the column-player loses \(-1\) (that is, gains 1) with strategy 1 and 0 with strategy 2. Either way the column-player should choose strategy 1. So we should expect that if both players choose their optimal strategy, the row-player will gain 1 and the column-player will lose 1.
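As a quick sanity check, here is a minimal sketch (using numpy; the variable names are just for illustration) of the same reasoning: each player compares their worst cases, and because the two comparisons agree, a pure-strategy solution exists.

```python
import numpy as np

P = np.array([[1, 2],
              [-1, 0]])  # entries are payoffs to the row-player

# Row-player: choose the row whose worst-case payoff is largest (maximin).
row_worst = P.min(axis=1)            # worst payoff in each row: [1, -1]
best_row = row_worst.argmax()        # index 0 -> strategy 1

# Column-player: the row-player's gain is the column-player's loss, so choose
# the column whose largest entry is smallest (minimax).
col_worst = P.max(axis=0)            # worst loss in each column: [1, 2]
best_col = col_worst.argmin()        # index 0 -> strategy 1

# maximin == minimax here, so both players can safely commit to a pure
# strategy, and the value of the game is 1 (to the row-player).
print(best_row + 1, best_col + 1, row_worst[best_row])  # 1 1 1
```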

A strategy that always plays a single fixed decision, as both players do here, is called a pure strategy. However, there are games in which randomly choosing between the possible decisions gives a better guaranteed expected payoff than any single decision; such a randomized strategy is called a mixed strategy.

Consider the payoff matrix: $$ P = \begin{bmatrix} 0 & 2 \cr 3 & 0 \end{bmatrix} $$ Our previous method of analysis would suggest the row-player should choose strategy 2 if the column-player chooses strategy 1, and strategy 1 if the column-player chooses strategy 2. So how does either player decide what to play? In this case, we develop a mixed strategy that gives the same expected payoff for each possible choice of the opposing player. Assume as the row-player we choose strategy 1 with probability \(p\). Then if the column-player chooses strategy 1, we receive expected payoff \(0 p + 3 (1-p)\), and if the column-player chooses strategy 2, we receive expected payoff \( 2 p + 0 (1-p)\). Setting these equal and solving for \(p\) gives: $$ 2 p = 3 (1-p) \rightarrow p = 3/5 $$

So the mixed strategy for the row-player which chooses strategy 1 \(60\)% of the time and strategy 2 \(40\)% of the time yields an expected payoff of 1.2 no matter which strategy the column-player picks. Analyzing the column-player's strategy in the same way gives a mixed strategy of \(40\)% for strategy 1 and \(60\)% for strategy 2, which holds the column-player's expected loss to the same 1.2.
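The indifference calculation above can be written as a tiny linear solve. A minimal sketch, assuming the \(2 \times 2\) matrix from this example:

```python
import numpy as np

P = np.array([[0., 2.],
              [3., 0.]])  # payoffs to the row-player

# Indifference condition for the row-player: playing strategy 1 with
# probability p gives the same expected payoff v against either column:
#   P[0,0]*p + P[1,0]*(1-p) = v   (column-player plays strategy 1)
#   P[0,1]*p + P[1,1]*(1-p) = v   (column-player plays strategy 2)
# Rearranged as a 2x2 linear system in the unknowns (p, v):
A = np.array([[P[0, 0] - P[1, 0], -1.0],
              [P[0, 1] - P[1, 1], -1.0]])
b = np.array([-P[1, 0], -P[1, 1]])
p, v = np.linalg.solve(A, b)
print(p, v)  # p ≈ 0.6, v ≈ 1.2
```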

This is simple enough to do by hand when each player has only a couple of pure strategies, but we can also do it algorithmically for more complicated games. Consider the payoff matrix $$ P = \begin{bmatrix} 8 & 6 & 14 \cr 6 & 15 & 3\cr 18 & 6 & 6 \end{bmatrix}~. $$

We start by representing a mixed strategy as a vector of probabilities \(x\) [2]: $$ 0 \le x~ \textrm{ and } ~\textrm{e}^T x = 1 $$ where \(\textrm{e}\) is the all-ones vector.

Given a pure strategy from the row-player, we can calculate the expected payoff for the column-player; for the rest of this example it is convenient to read \(P_{ij}\) as the amount the row-player pays the column-player when the row-player plays \(i\) and the column-player plays \(j\). For example, if the row-player chooses strategy \(2\) and the column-player plays a mixed strategy \(x\), then the column-player receives the expected payoff $$ 6x_1 + 15x_2 + 3x_3 = \sum_{j=1}^{3} P_{2, j}x_j $$
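For instance, with an arbitrary, purely illustrative mixed strategy \(x = (0.2, 0.5, 0.3)\) for the column-player:

```python
import numpy as np

P = np.array([[8, 6, 14],
              [6, 15, 3],
              [18, 6, 6]], dtype=float)

x = np.array([0.2, 0.5, 0.3])  # an arbitrary mixed strategy, for illustration only
print(P[1] @ x)                # expected payoff against row strategy 2: 9.6
```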

Choosing an optimal mixed strategy is done by maximizing the column player’s minimum expected payoff over all of the row-player’s pure strategies. $$ \max_{0 \le x,~ \textrm{e}^T x = 1} \min_{i} \sum_{j=1}^{3} P_{i j}x_j ~.$$

Now here’s the cool part: this can be formulated as a linear program that finds the maximum, over mixed strategies \(x\), of the minimum expected payoff \(\gamma\): $$ \begin{aligned} \textrm{maximize }~~~~ & \gamma \cr \textrm{subject to }~~~~ & \gamma \textrm{e} \le Px, \cr & \textrm{e}^T x = 1, \cr & 0 \le x~. \end{aligned} $$

Now we have to tweak this a little to put it into the standard form shown below, so that we can just plug it into a linear programming solver. The scipy.optimize.linprog solver wants a program of the form $$ \begin{aligned} \textrm{minimize }~~~~ & c^T x \cr \textrm{subject to }~~~~ & A_{ub} x \le b_{ub}, \cr & A_{eq} x = b_{eq}, \cr & l \le x \le u ~. \end{aligned} $$ We stack the expected payoff \(\gamma\) and the mixed strategy \(x\) into a single vector of variables and minimize \(-\gamma\) instead of maximizing \(\gamma\), so the linear program becomes: $$ \begin{aligned} \textrm{minimize }~~~~ & [-1~~~~~ 0] {\gamma \brack x} \cr \textrm{subject to }~~~~ & [\textrm{e}~~ -P] {\gamma \brack x} \le 0, \cr & [0~~~~~~ \textrm{e}^T] {\gamma \brack x} = 1, \cr & 0 \le x~. \end{aligned} $$
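Here is a sketch of how this might be handed to scipy.optimize.linprog; the variable names are just for illustration, and note that \(\gamma\) has to be declared a free variable, since linprog's default bounds are nonnegative.

```python
import numpy as np
from scipy.optimize import linprog

P = np.array([[8, 6, 14],
              [6, 15, 3],
              [18, 6, 6]], dtype=float)
m, n = P.shape

# Variables are stacked as [gamma, x_1, ..., x_n].
c = np.concatenate(([-1.0], np.zeros(n)))            # minimize -gamma
A_ub = np.hstack((np.ones((m, 1)), -P))              # gamma*e - P x <= 0
b_ub = np.zeros(m)
A_eq = np.concatenate(([0.0], np.ones(n)))[None, :]  # e^T x = 1
b_eq = np.array([1.0])
bounds = [(None, None)] + [(0, None)] * n            # gamma free, x >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
gamma, x = res.x[0], res.x[1:]
print(gamma, x)  # for this P: gamma ≈ 9 with x ≈ [0.25, 0.4375, 0.3125]
```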

We could also have considered this from the point of view of the row-player. In this case, the row-player's payout to the column-player when the column-player chooses pure strategy \(j\) is $$ \sum_{i=1}^{3} P_{i j} y_i~. $$ To find an optimal mixed strategy for the row-player, we wish to minimize the maximum expected payout to the column-player, assuming the column-player uses a pure strategy: $$ \min_{0 \le y,~ \textrm{e}^T y = 1} \max_{j} \sum_{i=1}^{3} P_{i j}y_i ~.$$ This can be framed as the dual linear program $$ \begin{aligned} \textrm{minimize }~~~~ & \eta \cr \textrm{subject to }~~~~ & \eta \textrm{e} \ge P^T y, \cr & \textrm{e}^T y = 1, \cr & 0 \le y~. \end{aligned} $$ Again, we combine the payout bound \(\eta\) with the probabilities \(y\) to convert this linear program into the standard form: $$ \begin{aligned} \textrm{minimize }~~~~ & [1~~~~~~~~ 0] {\eta \brack y} \cr \textrm{subject to }~~~~ & [-\textrm{e}~~ P^T] {\eta \brack y} \le 0, \cr & [0~~~~~~ \textrm{e}^T] {\eta \brack y} = 1, \cr & 0 \le y~. \end{aligned} $$
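A sketch of the row-player's problem in the same style; by linear programming duality the optimal \(\eta\) matches the \(\gamma\) found above.

```python
import numpy as np
from scipy.optimize import linprog

P = np.array([[8, 6, 14],
              [6, 15, 3],
              [18, 6, 6]], dtype=float)
m, n = P.shape

# Variables are stacked as [eta, y_1, ..., y_m].
c = np.concatenate(([1.0], np.zeros(m)))             # minimize eta
A_ub = np.hstack((-np.ones((n, 1)), P.T))            # -eta*e + P^T y <= 0
b_ub = np.zeros(n)
A_eq = np.concatenate(([0.0], np.ones(m)))[None, :]  # e^T y = 1
b_eq = np.array([1.0])
bounds = [(None, None)] + [(0, None)] * m            # eta free, y >= 0

res = linprog(c, A_ub=A_ub, b_ub=b_ub, A_eq=A_eq, b_eq=b_eq, bounds=bounds)
eta, y = res.x[0], res.x[1:]
print(eta, y)  # for this P: eta ≈ 9 with y ≈ [0.5, 0.333, 0.167]
```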


  1. Game Theory: A Non-Technical Introduction, M. D. Davis

  2. Introduction to Game Theory: Matrix Games and Lagrangian Duality, J. Burke