A PROOF OF THE MINIMAX THEOREM

ANTHONY MENDES

A probability vector is a vector $x = (x_1, \dots, x_m)^\top \in \mathbb{R}^m$ such that $x_i \geq 0$ for all $i$ and $x_1 + \cdots + x_m = 1$. Let $\mathbf{1}$ denote the vector of all 1's and $\mathbf{0}$ denote the zero vector (the dimensions of these vectors will be clear from context). Then $x$ is a probability vector if
$$x \geq \mathbf{0} \quad\text{and}\quad x^\top \mathbf{1} = \mathbf{1}^\top x = 1,$$
where the relation $x \geq y$ for vectors means that every entry in $x$ is greater than or equal to the corresponding entry in $y$.

Definition 1. Let $A$ be an $m \times n$ matrix. Let $v$ be the maximum number for which there is a probability vector $x$ such that $x^\top A \geq v \mathbf{1}^\top$. The number $v$ is the row value, and a vector $x$ which satisfies this inequality is an optimal row strategy. Similarly, let $w$ be the minimum number for which there is a probability vector $y$ such that $Ay \leq w \mathbf{1}$. The number $w$ is the column value, and a vector $y$ which satisfies this inequality is an optimal column strategy.

Theorem 1. There is a row value, column value, optimal row strategy, and optimal column strategy for every matrix $A$.

Proof. Let $C = \{ Ay : y \text{ is a probability vector} \}$. Since the matrix product $Ay$ is a convex combination of the columns of $A$, the set $C$ is the convex hull of the columns of $A$. Among other things, this implies that $C$ is a compact subset of $\mathbb{R}^m$. The function $f : C \to \mathbb{R}$ which selects the maximum component of a vector $c \in C$ is a continuous function on a compact set and therefore attains its minimum. This minimum is the column value $w$, and a probability vector $y$ for which $f(Ay) = w$ is an optimal column strategy. The existence proof for the row value and optimal row strategy follows similarly. $\Box$

Lemma 1. Let $v$ be the row value and $w$ be the column value of a matrix $A$. Then $v \leq w$.

Proof. If $x$ is an optimal row strategy and $y$ is an optimal column strategy, then
$$v = v\mathbf{1}^\top y \leq x^\top A y \leq x^\top w \mathbf{1} = w. \qquad\Box$$

Lemma 2. Let $v$ be the row value and $w$ be the column value of an $m \times n$ matrix $A$, and let $k$ be any number. Let $\mathbf{1}$ denote the $m \times n$ matrix with every entry equal to the number 1.
Then the row value of $A + k\mathbf{1}$ is $v + k$ and the column value of $A + k\mathbf{1}$ is $w + k$.

Proof. Since $x^\top(A + k\mathbf{1}) = x^\top A + k\mathbf{1}^\top$, the entries of $x^\top A$ and $x^\top(A + k\mathbf{1})$ differ only by the constant $k$. The statement about the row value follows because maximizing the minimum entry of one of these vectors is the same as maximizing the minimum entry of the other. The statement about the column value follows similarly. $\Box$

After proving the next theorem in 1928, John von Neumann was quoted as saying, "As far as I can see, there could be no theory of games without that theorem. I thought there was nothing worth publishing until the Minimax Theorem was proved."

Theorem 2 (The Minimax Theorem). The row value $v$ and the column value $w$ are equal for every $m \times n$ matrix $A$.

Proof. Without loss of generality, assume $v = 0$. Indeed, if $v \neq 0$, Lemma 2 allows us to consider the matrix $A - v\mathbf{1}$ instead of $A$. Our goal is to show that $w = 0$. Lemma 1 tells us $0 \leq w$, so we will assume $0 < w$ and look for a contradiction.

Let $C = \{ Ay : y \text{ is a probability vector} \}$. Just as in the proof of Theorem 1, $C$ is a convex, compact subset of $\mathbb{R}^m$.

For any $c \in C$, let $P_c$ be the $m \times m$ diagonal matrix whose $i$th diagonal entry is 1 if the $i$th coordinate of $c$ is positive and 0 if not. Then $P_c c$ is the vector $c$ with all negative entries replaced with 0's. Simple observations about this matrix include $P_c^\top = P_c$ and $\lVert P_c x \rVert \leq \lVert x \rVert$ for all $c \in C$ and $x \in \mathbb{R}^m$. Furthermore, each $c \in C$ must have at least one positive coordinate, because otherwise we would have $c \leq \mathbf{0}$, implying $w \leq 0$. This means that $P_c c \neq \mathbf{0}$ for all $c \in C$.

Let $f : C \to \mathbb{R}$ be the function defined by $f(c) = \lVert P_c c \rVert^2$. This is a continuous function on a compact set and therefore attains its minimum. Let $a = (a_1, \dots, a_m)^\top \in C$ be a minimizer of $f$. This means that for all $c \in C$,
$$0 < \lVert P_a a \rVert^2 \leq \lVert P_c c \rVert^2. \tag{1}$$

For any $c = (c_1, \dots, c_m)^\top \in C$, select $\lambda \in (0, 1)$ close enough to 1 such that
1. if $a_i > 0$, then $\lambda a_i + (1 - \lambda) c_i > 0$ for all such $i$,
2. if $a_i < 0$, then $\lambda a_i + (1 - \lambda) c_i < 0$ for all such $i$, and
3. $0 < (1 - \lambda^2)\lVert P_a a \rVert^2 - (1 - \lambda)^2 \lVert c \rVert^2$.

It is easier to see that such a choice for $\lambda$ is possible when this last condition is written as $(1 - \lambda)\lVert c \rVert^2 < (1 + \lambda)\lVert P_a a \rVert^2$; by taking $\lambda$ close to 1, we can make the left-hand side of this inequality a number close to 0.

The first two conditions on $\lambda$ ensure that the $i$th diagonal entries of $P_a$ and $P_{\lambda a + (1-\lambda)c}$ can differ only if $a_i = 0$. Therefore $P_{\lambda a + (1-\lambda)c}\, a = P_a a$ and $P_a P_{\lambda a + (1-\lambda)c} = P_a$. Using inequality (1),
$$\begin{aligned}
\lVert P_a a \rVert^2 &\leq \bigl\lVert P_{\lambda a + (1-\lambda)c} \bigl(\lambda a + (1 - \lambda) c\bigr) \bigr\rVert^2 \\
&= \bigl\lVert \lambda P_a a + (1 - \lambda) P_{\lambda a + (1-\lambda)c}\, c \bigr\rVert^2 \\
&= \lambda^2 \lVert P_a a \rVert^2 + 2\lambda(1 - \lambda)\, a^\top P_a^\top P_{\lambda a + (1-\lambda)c}\, c + (1 - \lambda)^2 \bigl\lVert P_{\lambda a + (1-\lambda)c}\, c \bigr\rVert^2 \\
&\leq \lambda^2 \lVert P_a a \rVert^2 + 2\lambda(1 - \lambda)\, a^\top P_a c + (1 - \lambda)^2 \lVert c \rVert^2.
\end{aligned}$$
Rewriting this, we find
$$(1 - \lambda^2)\lVert P_a a \rVert^2 - (1 - \lambda)^2 \lVert c \rVert^2 \leq 2\lambda(1 - \lambda)\, a^\top P_a c.$$
Comparing this with our last condition when choosing $\lambda$, we find $a^\top P_a c > 0$.

Define $x \in \mathbb{R}^m$ to be the vector
$$x = \frac{P_a a}{\mathbf{1}^\top P_a a}.$$
By construction, $x$ is a probability vector and $x^\top c > 0$ for any $c \in C$. This means that $x^\top A y > 0$ for all probability vectors $y$; taking $y$ to be each standard basis vector in turn, we have found a probability vector $x$ for which $x^\top A > \mathbf{0}^\top$. This, however, tells us that the row value $v$ is greater than 0. We have found our contradiction. $\Box$

Mathematics Department, California Polytechnic State University, San Luis Obispo, California 93407.
E-mail address: [email protected]
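The theorem is easy to check numerically for small games. The sketch below is an illustration, not part of the paper: the example matrix $A$ and the brute-force grid search are my own choices. For a $2 \times 2$ matrix every mixed strategy has the form $(t, 1-t)$, so the row value $\max_x \min_j (x^\top A)_j$ and the column value $\min_y \max_i (Ay)_i$ can be approximated by searching over a fine grid of $t$ values, confirming Lemma 1 ($v \leq w$), Lemma 2 (shifting $A$ by $k$ shifts both values by $k$), and Theorem 2 ($v = w$).

```python
# Grid-search approximation of the row and column values of a 2x2 matrix game.
# (Illustrative sketch; the matrix A below is an arbitrary example whose
# exact game value is 3/2, attained at x = (1/2, 1/2) and y = (1/4, 3/4).)

def row_value(A, steps=2000):
    """max over x = (t, 1-t) of the minimum entry of x^T A."""
    best = float("-inf")
    for s in range(steps + 1):
        t = s / steps
        worst_col = min(t * A[0][j] + (1 - t) * A[1][j] for j in range(2))
        best = max(best, worst_col)
    return best

def col_value(A, steps=2000):
    """min over y = (t, 1-t) of the maximum entry of A y."""
    best = float("inf")
    for s in range(steps + 1):
        t = s / steps
        worst_row = max(A[i][0] * t + A[i][1] * (1 - t) for i in range(2))
        best = min(best, worst_row)
    return best

A = [[3, 1], [0, 2]]                  # example game; exact value is 3/2
v, w = row_value(A), col_value(A)
print(v, w)                           # v <= w (Lemma 1) and v = w (Theorem 2)

k = 5                                 # Lemma 2: A + k*1 shifts both values by k
Ak = [[entry + k for entry in row] for row in A]
print(row_value(Ak), col_value(Ak))
```

Replacing the grid search with a linear program gives the same values exactly and extends the check to arbitrary $m \times n$ matrices.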