Games & Equilibria — AI Encyclopedia

1.1

Games, players, strategies & payoffs

A game in the sense of this volume is not a pastime; it is a model of interaction between agents whose outcomes are intertwined. The minimal data is a triple. There is a set of players $N = \{1, 2, \ldots, n\}$. Each player $i$ has a set of strategies $S_i$ — the actions available to them. And each player has a payoff function $u_i$ that assigns a real number to every combination of choices, one from each player:

EQ G1.1 — NORMAL-FORM GAME $$ \Gamma = \big(N,\; \{S_i\}_{i \in N},\; \{u_i\}_{i \in N}\big), \qquad u_i : S_1 \times S_2 \times \cdots \times S_n \to \mathbb{R} $$

A choice of one strategy per player is a strategy profile $s = (s_1, \ldots, s_n)$. The crucial feature — the one that breaks ordinary optimization — is that $u_i$ depends on the whole profile, not just on $s_i$. Player $i$ controls only the $i$-th coordinate; the rest is chosen by others. We write $s = (s_i, s_{-i})$, splitting player $i$'s move from everyone else's $s_{-i}$.

For two players with finitely many actions, the game is a payoff matrix: rows are player 1's strategies, columns are player 2's, and each cell holds an ordered pair $(u_1, u_2)$. The canonical example is the Prisoner's Dilemma. Two suspects each choose to Cooperate (stay silent) or Defect (betray). Higher numbers are better; the cell entries are (row payoff, column payoff):

row ↓ / col →	Cooperate	Defect
Cooperate	(2, 2)	(0, 3)
Defect	(3, 0)	(1, 1)

A central modeling assumption runs through everything below: players are rational (each maximizes their own payoff) and this rationality is common knowledge (each knows the others are rational, knows that they know it, and so on). This is a strong, often unrealistic idealization — real humans deviate systematically, and behavioral game theory exists precisely to map those deviations. We will be honest about where the idealization bites. But it is the assumption that gives the predictions their teeth.

The tool that turns this raw data into prediction is the best response. Holding everyone else's choices $s_{-i}$ fixed, player $i$'s best responses are the strategies that maximize their payoff:

EQ G1.2 — BEST RESPONSE $$ \mathrm{BR}_i(s_{-i}) \;=\; \operatorname*{arg\,max}_{s_i \in S_i} \; u_i(s_i,\, s_{-i}) $$

A set, not a single point — ties are allowed. Almost every equilibrium concept in game theory is a fixed point of mutual best response: a profile in which everyone is simultaneously best-responding to everyone else. The entire chapter is, in one sentence, the study of when such fixed points exist and how to find them.

In the Prisoner's Dilemma above, suppose the column player has chosen Cooperate. Apply EQ G1.2: what is the row player's payoff $u_1$ when they play their best response to a cooperating opponent?

Holding the column at Cooperate, row compares Cooperate (payoff $2$) against Defect (payoff $3$). The best response is Defect, paying $u_1 = $ 3 — the "temptation" payoff. This is exactly why mutual cooperation is unstable: from $(2,2)$, a unilateral defection jumps you to $3$.

1.2

Dominant strategies & iterated elimination

Sometimes a strategy is good no matter what anyone else does. Strategy $s_i$ strictly dominates $s_i'$ when it pays more against every possible choice of the opponents:

EQ G1.3 — STRICT DOMINANCE $$ u_i(s_i,\, s_{-i}) \;>\; u_i(s_i',\, s_{-i}) \qquad \text{for every } s_{-i} \in S_{-i} $$

Replace the strict $>$ with $\ge$ (strict somewhere) and you get weak dominance. A rational player never plays a strictly dominated strategy — it is beaten regardless of what the world does, so reasoning about the opponent is unnecessary. This is the rare corner of game theory where a player's optimal move requires no belief about the others.

In the Prisoner's Dilemma, Defect strictly dominates Cooperate for both players: against a Cooperating opponent, $3 > 2$; against a Defecting opponent, $1 > 0$. Each player has a strictly dominant strategy — Defect — so the predicted outcome is mutual defection, paying $(1, 1)$. And here is the dilemma's sting: both would have preferred $(2, 2)$, but $(2, 2)$ is not stable, because each could unilaterally jump to $3$ by defecting. Individual rationality drives the pair to a jointly worse outcome. This single fact underwrites everything from arms races to tragedy-of-the-commons depletion to why multi-agent AI systems trained only on self-interest can converge on collectively destructive policies.

Most games have no dominant strategy. But dominance still buys leverage through iterated elimination of strictly dominated strategies (IESDS): delete a dominated strategy, and in the smaller game some other strategy may now be dominated, so delete that too, and repeat. With strict dominance the order of deletion does not change the result. If the process leaves exactly one strategy per player, the game is dominance-solvable and we have a prediction without ever invoking equilibrium.

INSTRUMENT G1.1 — PAYOFF-MATRIX EXPLOREREDIT A 2×2 GAME · FIND THE NASH EQUILIBRIA

EACH CELL = (ROW PAYOFF , COLUMN PAYOFF) · EDIT ANY NUMBER · UNDERLINED = A BEST RESPONSE · MINT CELL = PURE NASH

PRESET GAME

PURE NASH EQUILIBRIA

—

ROW DOMINANT STRATEGY

—

COLUMN DOMINANT STRATEGY

—

A pure Nash equilibrium is a cell where both numbers are underlined — row is best-responding in its column and column is best-responding in its row simultaneously. Prisoner's Dilemma has one (mutual defect); Coordination has two (both pure equilibria are stable but uncoordinated play is not); Matching Pennies has none — the underlines chase each other around the matrix, which is exactly why §1.5 needs randomization. Edit any payoff and watch the equilibria move.

PYTHON · RUNNABLE IN-BROWSER

# Pure Nash equilibria of a 2x2 game by best response (EQ G1.2)
import numpy as np
# Prisoner's Dilemma, "higher is better". A = row payoffs, B = column payoffs.
A = np.array([[2, 0],      # row plays Cooperate
              [3, 1]])     # row plays Defect
B = np.array([[2, 3],      # column payoffs, same cell layout
              [0, 1]])
acts = ["Cooperate", "Defect"]

# A cell (i,j) is Nash if i maximizes row's payoff in column j
# AND j maximizes column's payoff in row i.
row_br = (A == A.max(axis=0, keepdims=True))   # best row responses per column
col_br = (B == B.max(axis=1, keepdims=True))   # best col responses per row
nash   = row_br & col_br

print("row best-response mask (per column):\n", row_br.astype(int))
print("col best-response mask (per row):\n",    col_br.astype(int))
for i in range(2):
    for j in range(2):
        if nash[i, j]:
            print(f"PURE NASH -> (row={acts[i]}, col={acts[j]}), "
                  f"payoffs=({A[i,j]}, {B[i,j]})")

edits are live — break it on purpose

1.3

Nash equilibrium

Dominance handles only the easy games. The concept that handles all of them — John Nash's 1950 contribution, and the reason game theory became the lingua franca of economics, biology, and multi-agent AI — is a notion of mutual stability. A strategy profile $s^\star$ is a Nash equilibrium when no single player can improve their payoff by unilaterally changing their own strategy, holding everyone else's fixed:

EQ G1.4 — NASH EQUILIBRIUM $$ u_i\big(s_i^\star,\, s_{-i}^\star\big) \;\ge\; u_i\big(s_i,\, s_{-i}^\star\big) \qquad \text{for every player } i \text{ and every } s_i \in S_i $$

Equivalently, every player is best-responding to everyone else at once: $s_i^\star \in \mathrm{BR}_i(s_{-i}^\star)$ for all $i$. The word "unilaterally" is load-bearing — equilibrium says nothing about coordinated deviations by coalitions (that is cooperative game theory, next chapter). It is a statement about no profitable solo move, and that is exactly why it can be self-enforcing without any contract.

Read the definition as a test, not a recipe: given a candidate profile, you check it by asking each player in turn, "could you do better by switching, assuming the others stand pat?" If every answer is no, it is an equilibrium. The Prisoner's Dilemma's $(D, D)$ passes: from payoff $1$, unilaterally cooperating drops you to $0$. The cooperative $(C, C)$ fails: from $2$, unilaterally defecting jumps you to $3$.

Two cautions the textbooks insist on, and rightly. First, equilibria need not be unique — coordination games have several, and the theory alone does not say which one rational players will land on (this is the equilibrium selection problem, genuinely unsettled). Second, equilibrium is not optimality: the Prisoner's Dilemma equilibrium is Pareto-dominated by mutual cooperation — every player prefers the non-equilibrium outcome. Nash equilibrium predicts what self-interested rational agents will do, not what is collectively best. Conflating the two is the most common error in applying the concept.

Nash's theorem — proved with the Kakutani fixed-point theorem — guarantees that every finite game has at least one equilibrium, provided we allow mixed strategies (randomization over actions, §1.5). Existence is the deep result; the Prisoner's Dilemma happens to have a pure one, but games like Matching Pennies have an equilibrium only once randomization is permitted.

By the definition in EQ G1.4: at a Nash equilibrium, no player can increase their own payoff by deviating unilaterally (changing only their own strategy while everyone else holds fixed). Is this statement true or false? (Answer "true" or "false".)

This is the definition itself. EQ G1.4 states $u_i(s_i^\star, s_{-i}^\star) \ge u_i(s_i, s_{-i}^\star)$ for every player $i$ and every alternative $s_i$ — i.e., no unilateral deviation pays more. So the statement is true. (Note the equilibrium says nothing about coordinated multi-player deviations, only solo ones.)

1.4

Zero-sum games & minimax

A two-player zero-sum game is pure conflict: whatever one player wins, the other loses, so $u_1 + u_2 = 0$ in every cell. We can then describe the whole game by a single matrix $A$, where $A_{ij}$ is the row player's payoff (the column player's is $-A_{ij}$). The row player maximizes; the column player minimizes. This is the setting von Neumann solved in 1928, decades before the general Nash concept — and it is far better behaved.

The conservative move for the row player is to choose the strategy whose worst case is best — the maximin. The column player symmetrically picks the strategy whose worst case (from their side) is best — the minimax:

EQ G1.5 — MAXIMIN AND MINIMAX $$ \underline{v} = \max_{i} \min_{j} A_{ij} \;\le\; \overline{v} = \min_{j} \max_{i} A_{ij} $$

$\underline{v}$ is the most the row player can guarantee regardless of the opponent; $\overline{v}$ is the most the column player can be forced to concede. The inequality $\underline{v} \le \overline{v}$ always holds (the second mover never does worse). When the two coincide at a single cell, that cell is a saddle point — a pure-strategy equilibrium that is also the value of the game.

Von Neumann's Minimax Theorem is the crown jewel: in any finite two-player zero-sum game, once mixed strategies are allowed, the maximin and minimax are always equal. Their common value $v$ is the value of the game, and the optimal mixed strategies are interchangeable and worst-case optimal:

EQ G1.6 — THE MINIMAX THEOREM $$ \max_{p \in \Delta(S_1)} \min_{q \in \Delta(S_2)} \; p^{\top} A\, q \;=\; \min_{q \in \Delta(S_2)} \max_{p \in \Delta(S_1)} \; p^{\top} A\, q \;=\; v $$

$p$ and $q$ are probability distributions over the row and column actions ($\Delta$ is the simplex). The order of "max then min" no longer matters — a property that fails for general-sum games. This is why zero-sum is uniquely tractable: a unique value, no equilibrium-selection ambiguity, and the optimal strategy is computable by linear programming. It is the mathematical backbone of self-play in AlphaZero-style systems and of robust/adversarial training, where the "opponent" is a worst-case perturbation.

INSTRUMENT G1.2 — MINIMAX SOLVER2×2 ZERO-SUM · ROW MAXIMIZES, COLUMN MINIMIZES · EQ G1.5–G1.6

ROW-PAYOFF MATRIX A (COLUMN GETS −A) · EDIT ANY CELL

PRESET

MAXIMIN v (PURE)

—

MINIMAX v (PURE)

—

SADDLE POINT?

—

ROW MIX p* (TOP, BOTTOM)

—

COLUMN MIX q* (LEFT, RIGHT)

—

VALUE OF GAME v*

—

When the maximin and minimax agree on a single cell, that pure cell is a saddle point and you are done — no randomization needed ("HAS A SADDLE"). When they disagree (Matching Pennies: maximin $-1$, minimax $+1$), there is no pure equilibrium and the solver falls back to the closed-form mixed solution of EQ G1.7. The value sits between the pure maximin and minimax — exactly the gap that randomization closes.

PYTHON · RUNNABLE IN-BROWSER

# Solve a 2x2 zero-sum game's mixed strategy + value (EQ G1.6-G1.7)
import numpy as np
A = np.array([[ 1.0, -1.0],     # Matching Pennies, row payoffs
              [-1.0,  1.0]])

# First check for a pure saddle point.
maximin = A.min(axis=1).max()   # best worst-case row
minimax = A.max(axis=0).min()   # best worst-case column
print(f"pure maximin = {maximin:+.3f},  pure minimax = {minimax:+.3f}")

if np.isclose(maximin, minimax):
    print("saddle point exists -> pure value =", maximin)
else:
    a, b, c, d = A.ravel()
    denom = a - b - c + d        # EQ G1.7 denominator
    p = (d - c) / denom          # P(row plays top)
    q = (d - b) / denom          # P(col plays left)
    v = (a*d - b*c) / denom      # value of the game
    print(f"no saddle -> mix needed")
    print(f"row mix p* = ({p:.3f}, {1-p:.3f})")
    print(f"col mix q* = ({q:.3f}, {1-q:.3f})")
    print(f"value v*   = {v:+.3f}")
    # sanity: with these mixes the column is indifferent (both columns equal)
    col_payoffs = np.array([p, 1-p]) @ A
    print("row's mix makes columns equal:", np.round(col_payoffs, 6))

edits are live — break it on purpose

1.5

Mixed strategies

Matching Pennies has no pure equilibrium — any pure choice you make, the opponent can exploit. The escape is to randomize. A mixed strategy for player $i$ is a probability distribution $\sigma_i$ over their actions; a pure strategy is the degenerate case that puts all mass on one action. Payoffs become expected payoffs, and the strategy space is now the simplex $\Delta(S_i)$ — for two actions, just the line segment of probabilities $(p, 1-p)$.

The key to solving for a mixed equilibrium is the indifference principle: if a player is randomizing between several actions in equilibrium, every action they put positive weight on must yield the same expected payoff. Otherwise they would shift all their probability to the better one — so the opponent must mix precisely so as to leave them indifferent. This flips the intuition inside out: your mixing probabilities are pinned down by making the other player indifferent, not yourself.

For a 2×2 zero-sum game with row-payoff matrix $A = \left(\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\right)$ and no saddle point, the indifference conditions give closed forms. The row player mixes with probability $p$ on the top row so that the column player's two columns pay equally; the column player mixes with $q$ on the left column likewise:

EQ G1.7 — 2×2 ZERO-SUM MIXED SOLUTION $$ p^\star = \frac{d - c}{a - b - c + d}, \qquad q^\star = \frac{d - b}{a - b - c + d}, \qquad v = \frac{a\,d - b\,c}{a - b - c + d} $$

$p^\star$ is the probability the row player puts on the top row; $q^\star$ the probability the column player puts on the left column; $v$ the value of the game. The shared denominator $a - b - c + d$ is nonzero precisely when no saddle point exists. For Matching Pennies ($a = d = 1,\; b = c = -1$): denominator $= 1 - (-1) - (-1) + 1 = 4$, so $p^\star = (1-(-1))/4 = 0.5$, $q^\star = 0.5$, and $v = (1 - 1)/4 = 0$. Each side plays heads and tails with equal probability, and the game is fair.

The mixed equilibrium has a striking robustness: at $(p^\star, q^\star)$, neither player can be exploited, because each has made the other indifferent across all their options. There is nothing to grab. This is the precise sense in which a randomized strategy can be safer than any deterministic one — an idea that reappears, dressed differently, in adversarial robustness and in the stochastic policies of reinforcement learning.

INSTRUMENT G1.3 — MIXED-STRATEGY SIMPLEXMATCHING PENNIES · EXPECTED PAYOFF vs ROW MIX p · EQ G1.7

ROW MIX p = P(top row) 0.50

PAYOFF IF COL PLAYS LEFT

—

PAYOFF IF COL PLAYS RIGHT

—

ROW'S GUARANTEE (min)

—

The two lines are the row player's expected payoff against a pure-Left and a pure-Right column, as a function of their own mix $p$. The column player will always exploit you down to the lower of the two lines — your guarantee is the mint curve (the lower envelope). It peaks exactly where the lines cross, at $p^\star = 0.5$, giving value $v = 0$. Drag $p$ away from $0.5$ and watch your guarantee fall: any tilt hands the opponent something to exploit. The crossing point is the indifference principle made visible.

In Matching Pennies (row-payoff matrix $\left(\begin{smallmatrix} +1 & -1 \\ -1 & +1 \end{smallmatrix}\right)$), what probability does each player assign to each of their two actions at the unique mixed Nash equilibrium? (Give the probability of a single action, e.g. heads.)

By EQ G1.7 with $a=d=1,\ b=c=-1$: denominator $= 1-(-1)-(-1)+1 = 4$, so $p^\star = (d-c)/4 = (1-(-1))/4 = 2/4 = $ 0.5, and $q^\star = 0.5$ by the same arithmetic. Each action — heads and tails — is played with probability 0.5. Any deviation from 0.5 lets the opponent tilt their own mix to exploit the imbalance, so 0.5 is the only unexploitable choice.

A 2×2 zero-sum game has row-payoff matrix $\left(\begin{smallmatrix} a & b \\ c & d \end{smallmatrix}\right) = \left(\begin{smallmatrix} 4 & 0 \\ 1 & 3 \end{smallmatrix}\right)$ and no saddle point. Using EQ G1.7, what is the row player's equilibrium probability $p^\star$ of playing the top row? (Give a decimal.)

Denominator $= a - b - c + d = 4 - 0 - 1 + 3 = 6$. Then $p^\star = \dfrac{d - c}{6} = \dfrac{3 - 1}{6} = \dfrac{2}{6} = \dfrac{1}{3} \approx $ 0.333. (As a check, the value is $v = \dfrac{ad - bc}{6} = \dfrac{12 - 0}{6} = 2$, comfortably between the pure maximin of $1$ and minimax of $3$.)

One-shot games answer "what is stable?" — but life is repeated, and repetition changes everything. When the Prisoner's Dilemma is played again and again, cooperation can become rational through the shadow of the future, and entire strategies (tit-for-tat, grim trigger) emerge that have no meaning in a single round. Chapter 02 takes up repeated games, the Folk Theorem, and the cooperative side of game theory.

1.R

References

von Neumann, J. & Morgenstern, O. (1944). Theory of Games and Economic Behavior. Princeton University Press — the founding text; normal-form games (EQ G1.1), the minimax theorem (EQ G1.6), and expected-utility theory.
Nash, J. F. (1950). Equilibrium Points in n-Person Games. PNAS 36(1), 48–49 — the existence theorem for the equilibrium concept of EQ G1.4 in general finite games.
Nash, J. F. (1951). Non-Cooperative Games. Annals of Mathematics 54(2), 286–295 — the full development of non-cooperative equilibrium, dominance, and the proof via Kakutani's fixed-point theorem.
von Neumann, J. (1928). Zur Theorie der Gesellschaftsspiele. Mathematische Annalen 100 — the original minimax theorem for two-player zero-sum games (§1.4).
Osborne, M. J. & Rubinstein, A. (1994). A Course in Game Theory. MIT Press — standard graduate reference for dominance, IESDS, Nash equilibrium, and mixed strategies as presented in §§1.2–1.5.
Easley, D. & Kleinberg, J. (2010). Networks, Crowds, and Markets. Cambridge University Press (Ch. 6) — an accessible, freely available treatment of best response, dominant strategies, and equilibrium used to frame this chapter.