Wednesday, December 29, 2010

Finding Nash Equilibrium in Two-Person Zero-Sum Games. C++ Implementation



Today we are going to do a small trip into Game Theory land. I am going to provide code that can be used to find the Nash Equilibrium of a Two-Person Zero Sum game.
I couldn't avoid  pasting the picture above remembering the "A Beautiful Mind" story.  :)

But before I paste the code let's do some intro, as usual.
What is a Two-Person Zero Sum Game and what is Nash Equilibrium?

I think everything in life can be seen as a game: Interactions over the internet, the economy, marketing, warfare, even the search for love can be seen as a game where we count with competitors (players) who are held by a  set of constraints (rules) and who try to optimize certain value of profit (players try to minimize loss and optimize gain) by taking a series of possible actions (by playing a strategy). As in life, many games don't necessarily have a winner or a loser, and not all games provide a player with a dominant strategy to guarantee maximum gain. 

In a game, each player can play differently every time. Each possible way that a player can play a game is called a strategy. In a Two-Person game there are only two players, and each player can play only one of an assigned number of possible strategies. Depending on the pair of strategies selected by each player there is a gain or loss related to each one of the two players. In a Two-Person Zero-Sum Game, when players A and B play a given pair of strategies s and s', we'll have that A's profit = B's loss and vise-versa: B' profit = A's loss.
Let's also define for any player P that P's profit = -(P's loss)

As shown below, we can then associate each on of the possible pairs of strategies that the pair of players can select with a given value, and we can organize this information in a matrix as shown below:

There we have two players, P and Q. 
P can select one of two strategies: p1 or p2
Q can select one of three strategies: q1, q2, and q3.
In the matrix A we have the profit that P would receive given a pair of selected strategies. Since this is a zero sum game, then as well as each entry in the matrix represents P's profit, it also represents Q's loss, and so the negative of the entry's value represents Q's gain.
Each time they play, P selects a strategy and Q another one, they don't know which strategy the opposite players has played until they see the outcome.

Suppose that P and Q play this game many many times, and they both want to amass as much profit as they can. How should each one of them play? Clearly if P plays strategy p1 all the time then player Q will catch up to that and play strategy q2, hence they'll want to play randomly to avoid allowing the opposite player to learn about one's tendencies and preferences. They'll also want to play some strategies with more probability than others since the expected profit associated with some strategies will be higher than others.

What should each player do? They don't know which strategy will the opposite player take, so how can they try to maximize the total amount of gain as much as possible? How should a player distribute the probabilities of playing each strategy to try to maximize payoff?

These games in which each player can set a probability distribution over an assigned set of possible strategies is called a mixed strategy.

A strategy vector for a given number of players is said to be a Nash equilibrium if no player can individually switch from the strategy (s)he has selected to another strategy and improve his or her payoff, assuming that the other players stick to their strategies.

Well, John Nash proved that the game we've described always has a Nash equilibrium.
For the above Two-Person Zero-Sum Game, the best each player can do is devise a probability distribution for his/her strategies to try to maximize his or her payoff regardless of the strategy played by the opposite player. Each player can find this optimal probability distribution by using linear programming. Once each player has found his/her optimal probability distribution, these probability distribution form a Nash equilibrium, and no player can devise a different probability distribution for the strategies and improve his or her payoff if the other player sticks to his/her optimal distribution.  :)

So, given a matrix representing the profit associated with a player P's payoff for each pair of strategies selected by him and his opponent player Q, as shown below, devise for each player his/her optimal probability distribution.
Player P can run the following linear program:
maximize:  v
subject to:  (p*A)'s jth entry  v   for all j
                  and p1+p2+...+pm = 1 with p1,p2,...,pm  0

Player Q can either do the same program with the transpose of A, or run a dual version of this program where he/she tries to minimize v and all the inequalities are less-than inequalities rather than greater-than.

 So... time for the code:

In my previous post I presented code that solves a linear program by using the simplex method. Today I present code that takes a matrix as A, and creates a standard form version of it which is then passed to the functions that I posted on my immediate previous post.

So here it goes:
------------------------------------------
vector<vector<double>> Transpose(vector<vector<double>> M)
{
vector<vector<double>> T;

int mNumRows = M.size();
int mNumCols = M[0].size();

for(int mCol = 0; mCol < mNumCols; mCol++)
{
vector<double> tRow;
for(int mRow = 0; mRow < mNumRows; mRow++)
{
tRow.push_back(M[mRow][mCol]);
}

T.push_back(tRow);
}

return T;
return M;
}

vector<vector<double>> SetLinearProgram(vector<vector<double>> A)
{
vector<double> maxFunc;
vector<double> b;

vector<vector<double>> T = Transpose(A);

for(int iRow = 0; iRow < T.size(); iRow++)
{
for(int iCol = 0; iCol < T[iRow].size(); iCol++)
{
T[iRow][iCol] *= -1.0;
}
T[iRow].push_back(  1.0 );
T[iRow].push_back( -1.0 );
}

for(int iRow = 0; iRow < T.size(); iRow++)
{
b.push_back( 0.0 );
}

int rowSize = T[0].size();

vector<double> rowEq1(rowSize, 1.0);
rowEq1[rowSize - 2] = rowEq1[rowSize - 1] = 0.0;
T.push_back( rowEq1 );
b.push_back( 1.0 );

vector<double> rowEq2(rowSize, -1.0);
rowEq2[rowSize - 2] = rowEq2[rowSize - 1] = 0.0;
T.push_back( rowEq2 );
b.push_back( -1.0 );

for(int i = 0; i < rowSize; i++)
{
if( i < rowSize - 2 )
maxFunc.push_back(1.0);
else if( i < rowSize - 1 )
maxFunc.push_back(1.0);
else
maxFunc.push_back(-1.0);
}

return SetSimplex(maxFunc, T, b);
}


void PrepareRun(vector<vector<double>> A)
{
vector<vector<double>> simplex = SetLinearProgram(A);
double max;
vector<double> result = DoSimplex(simplex, max);
int size = result.size();
if( !size )
{
printf("No optimal solution exists\n----------------------\n");
return;
}
printf("Result: Max = %f\n", result[size - 2] - result[size - 1]);
for(int i = 0; i < result.size() - 2; i++)
{
printf("x%d = %f ; ", i, result[i]);
}
printf("\n----------------------\n");
}
------------------------------------------
.
So as you can see, PrepareRun takes the matrix and takes care of calling the right order of of functions to run the linear program using the simplex method.

Code examples to run:

------------------------------------------
void Run1()
{
vector<vector<double>> A;
A.push_back(commaSeparatedStringToIntVector(" 1.0 ,  2.0 ,  3.0 , -1.0 ,  1.5"));
A.push_back(commaSeparatedStringToIntVector("-2.0 , -1.5 , -1.0 ,  1.0 ,  3.0"));
A.push_back(commaSeparatedStringToIntVector(" 0.0 ,  1.0 ,  0.5 , -2.0 ,  2.5"));
A.push_back(commaSeparatedStringToIntVector(" 2.0 , -1.0 ,  0.5 , -3.0 , -2.5"));
A.push_back(commaSeparatedStringToIntVector(" 1.5 ,  2.0 , -1.5 ,  2.0 ,  3.5"));

PrepareRun(A);
}

void Run2()
{
vector<vector<double>> A;
A.push_back(commaSeparatedStringToIntVector(" 1.0 ,  2.0 "));
A.push_back(commaSeparatedStringToIntVector(" 2.0 ,  1.0 "));

PrepareRun(A);
}

void Run3()
{
vector<vector<double>> A;
A.push_back(commaSeparatedStringToIntVector(" 30.0 , -10.0 ,  20.0 "));
A.push_back(commaSeparatedStringToIntVector(" 10.0 ,  20.0 , -20.0 "));

PrepareRun(A);
}

int main ()
{
Run1();
Run2();
Run3();
}
------------------------------------------

with output:

конец  :)

Monday, December 27, 2010

Linear Programming: Simplex Algorithm. Implementation in C++

Hello. I am back from some nice vacations in Rio de Janeiro. Besides the great beaches and attractions to visit in Rio, I also visited the IMPA - INSTITUTO NACIONAL DE MATEMÁTICA PURA E APLICADA of Brasil (National Insitute of Pure and Applied Math). A friend of mine studies there and took the time to take me to meet such institution where many talented minds work in the field of math.


                                           SIMPLEX


Anyways, today I am going to post code that runs the Simplex algorithm to solve linear programs. I once wrote the algorithm in F# for fun but I lost the files so I wrote it again, this time in C++.


But first I am going to talk a little about the Simplex algorithm and linear programming.


Many problems that arise in everyday life take the form minimizing or maximizing an objective given limited resources and a set of competing constraints. Sometimes we can specify the objectives and constraints of these problems as linear functions, equalities, and inequalities. When we can specify the problem in such conditions, then we can solve it using a linear program. The power of linear programming allows us to solve a large number of hard problems. Today I present the Simplex algorithm, which is just one of many methods for solving linear programs. The Simplex algorithm does not run in polynomial time in the worst case, however, it is remarkably fast in practice for what I've read.


The simplex algorithm is used in the case where we posses a system of linear inequalities with the constraint that each variable must be non-negative, and we have some linear function which's value we want to maximize.


Example:



Maximize     p = (1/2)x + 3y + z + 4w 
subject to
              x + y + z + w  40
        -2x - y + z + w  10
                 -w + y  10


and                                 x,y,z,w ≥ 0


Here the linear function's value that we want to maximize is p, where 
p = (1/2)x + 3y + z + 4w 


And the linear inequalities are: 
  x + y + z + w <= 40    ,
 -2x - y + z + w <= 10  , and
 -w + y <= 10


Once a problem is described as shown above, it is very easy to solve it by running the Simplex method.


The format shown above is called standard form. Observe there we have a series of less-than inequalities where in each inequality all variables are located to the left side of the inequality. Observe also the constraint that each variable cannot be less than zero.


Not all linear programs are in standard form, but once a linear program is converted to standard form, the Simplex algorithm can take care of it.


The basics behind the simplex algorithm are as follows:
In a linear program in standard form with n variables we have that each inequality forms a half-space in n-dimensional space. The half-spaces created by all the inequalities create a n-dimensional convex polygon that we call a simplex. All the space inside the simplex is also called the feasibility region. Every point in n-dimensional space inside the feasibility region complies with the boundaries set by all the inequalities. The linear function which we want to maximize also forms a half-space when equated to a value. We want to find a value for this function such that the boundary of the half-space of the function intersects with the simplex. The largest of these values would be the optimal maximum value of the function, and hence the solution to the problem. The the boundary of the half-space created by this maximum-optimal-value equated to the linear function will always intersect with at least one vertex of the simplex because the simplex is convex. So all we have to do is find a vertex of the simplex with maximum value for this function. That is precisely what the simplex method does.


                           Below the code:
--------------------------------------------------------------


vector<vector<double>> SetSimplex( vector<double> maxFunction, 
  vector<vector<double>> A, 
  vector<double> b)
{
vector<vector<double>> simplex;


int numVariables = maxFunction.size();
int numEquations = A.size();
int numCols = numVariables + numEquations + 1 + 1;


for(int iRow = 0; iRow < numEquations; iRow++)
{
vector<double> row(numCols, 0);
for(int iCol = 0; iCol < numVariables; iCol++)
{
row[iCol] = A[iRow][iCol];
}
row[numVariables + iRow] = 1;
row[numCols - 1] = b[iRow];


simplex.push_back( row );
}


vector<double> lastRow(numCols, 0);
for(int iVar = 0; iVar < numVariables; iVar++)
{
lastRow[iVar] = 0 - maxFunction[iVar];
}
lastRow[numVariables + numEquations] = 1;
simplex.push_back(lastRow);


return simplex;
}





bool GetPivots(vector<vector<double>> simplex, int & pivotCol, int & pivotRow, bool & noSolution)
{
int numRows = simplex.size();
int numCols = simplex[0].size();
int numVariables = numCols - numRows - 1;


noSolution = false;


double min = 0;
for(int iCol = 0; iCol < numCols - 2; iCol++)
{
double value = simplex[numRows - 1][iCol];
if(value < min)
{
pivotCol = iCol;
min = value;
}
}


if(min == 0)
return false;


double minRatio = 0.0;
bool first = true;
for(int iRow = 0; iRow < numRows - 1; iRow++)
{
double value = simplex[iRow][pivotCol];

if(value > 0.0)
{
double colValue = simplex[iRow][numCols - 1];
double ratio = colValue / value;


if((first || ratio < minRatio) && ratio >= 0.0)
{
minRatio = ratio;
pivotRow = iRow;
first = false;
}
}
}


noSolution = first;
return !noSolution;
}


vector<double> DoSimplex(vector<vector<double>> simplex, double & max)
{
int pivotCol, pivotRow;
int numRows = simplex.size();
int numCols = simplex[0].size();


bool noSolution = false;
while( GetPivots(simplex, pivotCol, pivotRow, noSolution) )
{
double pivot = simplex[pivotRow][pivotCol];

for(int iCol = 0; iCol < numCols; iCol++)
{
simplex[pivotRow][iCol] /= pivot;
}


for(int iRow = 0; iRow < numRows; iRow++)
{
if(iRow != pivotRow)
{
double ratio =  -1 * simplex[iRow][pivotCol];
for(int iCol = 0; iCol < numCols; iCol++)
{
simplex[iRow][iCol] = simplex[pivotRow][iCol] * ratio + simplex[iRow][iCol];
}
}
}
}


if(noSolution)
{
vector<double> vec;
return vec;
}

//optimo!!!
max = simplex[numRows-1][numCols-1];
int numVariables = numCols - numRows - 1;
vector<double> x(numVariables, 0);

for(int iCol = 0; iCol < numVariables; iCol++)
{
bool isUnit = true;
bool first = true;
double value;
for(int j = 0; j < numRows; j++)
{
if(simplex[j][iCol] == 1.0 && first)
{
first = false;
value = simplex[j][numCols - 1];
}
else if(simplex[j][iCol] != 0.0)
{
isUnit = false;
}
}


if(isUnit && !first)
x[iCol] = value;
else
x[iCol] = 0.0;
}


return x;
}

The operation starts with the function SetSimplex, which takes three parameters:
- maxFunction : a vector with the values of the coefficients for each variable
                        Example: [ 2 , 3/2 , 1] for 2X + (3/2)Y + Z

- A                  : a matrix with the coefficient of the left sides of all the inequalities.
                        Example: [1,0,3]            X + 3Z  ≤ 10
                                     [2,1,0]     for  2X + Y   ≤  3
b                      : a vector with the values of the right side of the inequalities
                        Example: (10, 3)    for   X + 3Z  ≤ 10
                                                        2X + Y   ≤  3


After we call SetSimplex, we use the structure returned (which is the linear program in canonical form) by the method to call DoSimplex, which runs the algorithm.
So just take your standard form linear program and enter the parameters as explained above (and shown next below) and run the simplex method.


Now I present some code that shows how to use these functions:


First let me post a couple of functions that I use to aid myself in setting up the equations.



string cleanSpaces(string str)
{
string space = " ";
int pos;
while((pos = str.find(space)) != string::npos)
{
str = str.erase(pos, 1);
}
return str;
}


vector<double> commaSeparatedStringToDoubleVector(string str)
{
vector<double> vec;
while(str.length() > 0)
{
int pos = str.find(",");
if (pos!=string::npos)
{
string strNum = str.substr(0, pos);
strNum = cleanSpaces(strNum);
vec.push_back(atof(strNum.c_str()));
str = str.substr(pos+1);
}
else
{
string strNum = cleanSpaces(str.c_str());
vec.push_back(atof(strNum.c_str()));
break;
}
}
return vec;
}


Then below I set up the parameters for the simplex function in a couple of runs:


void Run1()
{
//     x1    x2     x3
string maxFuncStr = "8.0 , 10.0 , 7.0";
vector<double> maxFunc = commaSeparatedStringToDoubleVector(maxFuncStr);

vector<vector<double>> A;
                                 //         x1     x2    x3
A.push_back(commaSeparatedStringToDoubleVector("1.0 , 3.0 , 2.0"));
A.push_back(commaSeparatedStringToDoubleVector("1.0 , 5.0 , 1.0"));

             // b1     b2
string bStr = "10.0 , 8.0";
vector<double> b = commaSeparatedStringToDoubleVector(bStr);

vector<vector<double>> simplex = SetSimplex(maxFunc, A, b);

double max;
vector<double> result = DoSimplex(simplex, max);

printf("Result: Max = %f\n", max);
for(int i = 0; i < result.size(); i++)
{
printf("x%d = %f ; ", i, result[i]);
}
printf("\n----------------------\n");
}



void Run2()
{
//           x1    x2    x3    x4
string maxFuncStr = "0.5 , 3.0 , 1.0 , 4.0";
vector<double> maxFunc = commaSeparatedStringToDoubleVector(maxFuncStr);

vector<vector<double>> A;
                              //       x1     x2    x3    x4
A.push_back(commaSeparatedStringToDoubleVector(" 1.0 ,  1.0 , 1.0 ,  1.0"));
A.push_back(commaSeparatedStringToDoubleVector("-2.0 , -1.0 , 1.0 ,  1.0"));
A.push_back(commaSeparatedStringToDoubleVector(" 0.0 ,  1.0 , 0.0 , -1.0"));

// b1     b2     b3
string bStr = "40.0 , 10.0 , 10.0";
vector<double> b = commaSeparatedStringToDoubleVector(bStr);

vector<vector<double>> simplex = SetSimplex(maxFunc, A, b);

double max;
vector<double> result = DoSimplex(simplex, max);

printf("Result: Max = %f\n", max);
for(int i = 0; i < result.size(); i++)
{
printf("x%d = %f ; ", i, result[i]);
}
printf("\n----------------------\n");
}


int main ()
{
Run1();
Run2();
}




And here the output:

The field of application of this algorithm is truly enormous.


Endooo