I couldn't avoid pasting the picture above remembering the "A Beautiful Mind" story. :)
But before I paste the code let's do some intro, as usual.
What is a Two-Person Zero Sum Game and what is Nash Equilibrium?
I think everything in life can be seen as a game: Interactions over the internet, the economy, marketing, warfare, even the search for love can be seen as a game where we count with competitors (players) who are held by a set of constraints (rules) and who try to optimize certain value of profit (players try to minimize loss and optimize gain) by taking a series of possible actions (by playing a strategy). As in life, many games don't necessarily have a winner or a loser, and not all games provide a player with a dominant strategy to guarantee maximum gain.
In a game, each player can play differently every time. Each possible way that a player can play a game is called a strategy. In a Two-Person game there are only two players, and each player can play only one of an assigned number of possible strategies. Depending on the pair of strategies selected by each player there is a gain or loss related to each one of the two players. In a Two-Person Zero-Sum Game, when players A and B play a given pair of strategies s and s', we'll have that A's profit = B's loss and vise-versa: B' profit = A's loss.
Let's also define for any player P that P's profit = -(P's loss)
As shown below, we can then associate each on of the possible pairs of strategies that the pair of players can select with a given value, and we can organize this information in a matrix as shown below:
There we have two players, P and Q.
P can select one of two strategies: p1 or p2
Q can select one of three strategies: q1, q2, and q3.
In the matrix A we have the profit that P would receive given a pair of selected strategies. Since this is a zero sum game, then as well as each entry in the matrix represents P's profit, it also represents Q's loss, and so the negative of the entry's value represents Q's gain.
Each time they play, P selects a strategy and Q another one, they don't know which strategy the opposite players has played until they see the outcome.
Suppose that P and Q play this game many many times, and they both want to amass as much profit as they can. How should each one of them play? Clearly if P plays strategy p1 all the time then player Q will catch up to that and play strategy q2, hence they'll want to play randomly to avoid allowing the opposite player to learn about one's tendencies and preferences. They'll also want to play some strategies with more probability than others since the expected profit associated with some strategies will be higher than others.
What should each player do? They don't know which strategy will the opposite player take, so how can they try to maximize the total amount of gain as much as possible? How should a player distribute the probabilities of playing each strategy to try to maximize payoff?
These games in which each player can set a probability distribution over an assigned set of possible strategies is called a mixed strategy.
A strategy vector for a given number of players is said to be a Nash equilibrium if no player can individually switch from the strategy (s)he has selected to another strategy and improve his or her payoff, assuming that the other players stick to their strategies.
Well, John Nash proved that the game we've described always has a Nash equilibrium.
For the above Two-Person Zero-Sum Game, the best each player can do is devise a probability distribution for his/her strategies to try to maximize his or her payoff regardless of the strategy played by the opposite player. Each player can find this optimal probability distribution by using linear programming. Once each player has found his/her optimal probability distribution, these probability distribution form a Nash equilibrium, and no player can devise a different probability distribution for the strategies and improve his or her payoff if the other player sticks to his/her optimal distribution. :)
So, given a matrix representing the profit associated with a player P's payoff for each pair of strategies selected by him and his opponent player Q, as shown below, devise for each player his/her optimal probability distribution.
Player P can run the following linear program:
maximize: v
subject to: (p*A)'s jth entry ≥ v for all j
and p1+p2+...+pm = 1 with p1,p2,...,pm ≥ 0
Player Q can either do the same program with the transpose of A, or run a dual version of this program where he/she tries to minimize v and all the inequalities are less-than inequalities rather than greater-than.
So... time for the code:
In my previous post I presented code that solves a linear program by using the simplex method. Today I present code that takes a matrix as A, and creates a standard form version of it which is then passed to the functions that I posted on my immediate previous post.
So here it goes:
------------------------------------------
vector<vector<double>> Transpose(vector<vector<double>> M)
{
vector<vector<double>> T;
int mNumRows = M.size();
int mNumCols = M[0].size();
for(int mCol = 0; mCol < mNumCols; mCol++)
{
vector<double> tRow;
for(int mRow = 0; mRow < mNumRows; mRow++)
{
tRow.push_back(M[mRow][mCol]);
}
T.push_back(tRow);
}
return T;
return M;
}
vector<vector<double>> SetLinearProgram(vector<vector<double>> A)
{
vector<double> maxFunc;
vector<double> b;
vector<vector<double>> T = Transpose(A);
for(int iRow = 0; iRow < T.size(); iRow++)
{
for(int iCol = 0; iCol < T[iRow].size(); iCol++)
{
T[iRow][iCol] *= -1.0;
}
T[iRow].push_back( 1.0 );
T[iRow].push_back( -1.0 );
}
for(int iRow = 0; iRow < T.size(); iRow++)
{
b.push_back( 0.0 );
}
int rowSize = T[0].size();
vector<double> rowEq1(rowSize, 1.0);
rowEq1[rowSize - 2] = rowEq1[rowSize - 1] = 0.0;
T.push_back( rowEq1 );
b.push_back( 1.0 );
vector<double> rowEq2(rowSize, -1.0);
rowEq2[rowSize - 2] = rowEq2[rowSize - 1] = 0.0;
T.push_back( rowEq2 );
b.push_back( -1.0 );
for(int i = 0; i < rowSize; i++)
{
if( i < rowSize - 2 )
maxFunc.push_back(1.0);
else if( i < rowSize - 1 )
maxFunc.push_back(1.0);
else
maxFunc.push_back(-1.0);
}
return SetSimplex(maxFunc, T, b);
}
void PrepareRun(vector<vector<double>> A)
{
vector<vector<double>> simplex = SetLinearProgram(A);
double max;
vector<double> result = DoSimplex(simplex, max);
int size = result.size();
if( !size )
{
printf("No optimal solution exists\n----------------------\n");
return;
}
printf("Result: Max = %f\n", result[size - 2] - result[size - 1]);
for(int i = 0; i < result.size() - 2; i++)
{
printf("x%d = %f ; ", i, result[i]);
}
printf("\n----------------------\n");
}
------------------------------------------
.
So as you can see, PrepareRun takes the matrix and takes care of calling the right order of of functions to run the linear program using the simplex method.
Code examples to run:
------------------------------------------
void Run1()
{
vector<vector<double>> A;
A.push_back(commaSeparatedStringToIntVector(" 1.0 , 2.0 , 3.0 , -1.0 , 1.5"));
A.push_back(commaSeparatedStringToIntVector("-2.0 , -1.5 , -1.0 , 1.0 , 3.0"));
A.push_back(commaSeparatedStringToIntVector(" 0.0 , 1.0 , 0.5 , -2.0 , 2.5"));
A.push_back(commaSeparatedStringToIntVector(" 2.0 , -1.0 , 0.5 , -3.0 , -2.5"));
A.push_back(commaSeparatedStringToIntVector(" 1.5 , 2.0 , -1.5 , 2.0 , 3.5"));
PrepareRun(A);
}
void Run2()
{
vector<vector<double>> A;
A.push_back(commaSeparatedStringToIntVector(" 1.0 , 2.0 "));
A.push_back(commaSeparatedStringToIntVector(" 2.0 , 1.0 "));
PrepareRun(A);
}
void Run3()
{
vector<vector<double>> A;
A.push_back(commaSeparatedStringToIntVector(" 30.0 , -10.0 , 20.0 "));
A.push_back(commaSeparatedStringToIntVector(" 10.0 , 20.0 , -20.0 "));
PrepareRun(A);
}
int main ()
{
Run1();
Run2();
Run3();
}
------------------------------------------
with output:
конец :)