Two Plus Two Poker Forums > Programming (discussions about computer programming)
Old 05-15-2014, 11:48 PM   #1
ibavly
2010 PSOP Champion
 
 
Join Date: Feb 2010
Location: Check out my gang listing
Posts: 22,977
CFR RPS code

Hi,

I'm new to programming and CFR. Could you guys check this code? I'm getting a different result than I expected.

The game is RPS where the opponent has an imbalanced strategy: he is throwing Rock 40% (oppStrategy = {0.4, 0.3, 0.3}, with index 0 = Rock).

What should happen: the program learns to throw Paper 100%.
What I expect to happen: the regrets will build up much faster for Paper than for the other two, so over a large sample the other two options will tend towards zero but will still stay non-zero.
What is actually happening: the output is {0.33, 0.33, 0.33}, which is the NE of symmetric RPS but not the best response to this opponent.
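A quick sanity check on what the best response should be (sketched separately from the trainer; the class and method names here are just for illustration): the expected value of each action against {0.4, 0.3, 0.3} can be computed directly, and Paper comes out on top.

```java
// Expected utility of each action against an opponent playing
// {Rock 0.40, Paper 0.30, Scissors 0.30}. Win = +1, loss = -1, tie = 0.
public class BestResponseCheck {
    // payoff[hero][villain]; indices 0 = Rock, 1 = Paper, 2 = Scissors
    static final double[][] PAYOFF = {
        {  0, -1,  1 },  // Rock:     ties Rock, loses to Paper, beats Scissors
        {  1,  0, -1 },  // Paper:    beats Rock, ties Paper, loses to Scissors
        { -1,  1,  0 },  // Scissors: loses to Rock, beats Paper, ties Scissors
    };

    static double ev(int hero, double[] opp) {
        double sum = 0;
        for (int v = 0; v < PAYOFF.length; v++) {
            sum += opp[v] * PAYOFF[hero][v];
        }
        return sum;
    }

    public static void main(String[] args) {
        double[] opp = { 0.4, 0.3, 0.3 };
        // Mathematically: EV(Rock) = 0, EV(Paper) = +0.1, EV(Scissors) = -0.1,
        // so regret matching should pile regret onto Paper.
        for (int a = 0; a < 3; a++) {
            System.out.println(ev(a, opp));
        }
    }
}
```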

Code:
import java.util.Arrays;
import java.util.Random;

public class RPSTrainer
{
    // set values to each move and for total number of actions
    // the actual numbers assigned to these moves don't matter in this case since the game is symmetric
    public static final int ROCK = 0, PAPER = 1, SCISSORS = 2, NUM_ACTIONS = 3;
    // initialize the random number generator
    public static final Random random = new Random();
    // create array to store regrets
    double[] regretSum = new double[NUM_ACTIONS],
    // create array to store temporary strategies
    strategy = new double[NUM_ACTIONS],
    // create array to store total average strategy
    strategySum = new double[NUM_ACTIONS],
    // define opponent's strategy for the non gto game
    oppStrategy = { 0.4, 0.3, 0.3 };
    
    // Main method to run the whole training session
    public static void main(String args[]) {
        RPSTrainer trainer = new RPSTrainer();
        trainer.train(1000000);
        System.out.println(Arrays.toString(trainer.getAverageStrategy()));
    }
    
    
    // returns a strategy array computed from the regrets accumulated during training
    // for each action, divides that action's positive regret by the sum of all positive regrets to get its frequency
    // if there are no positive regrets, each action is given an equal likelihood
    private double[] getStrategy() {
        double normalizingSum = 0;
        for (int a = 0; a < NUM_ACTIONS; a++) {
            strategy[a] = regretSum[a] > 0 ? regretSum[a] : 0;
            normalizingSum += strategy[a];
        }
        for (int a = 0; a < NUM_ACTIONS; a++) {
            if (normalizingSum > 0) {
                strategy[a] /= normalizingSum;
            }
            else {
                strategy[a] = 1.0 / NUM_ACTIONS;
                strategySum[a] += strategy[a];
            }
        }
        return strategy;
    }
    
    // returns a number to represent the action chosen by Hero, as defined at the top of the code
    // draws a random number, then runs through each option, stopping when the cumulative prob
    // is greater than the random number
    public int getAction(double[] strategy) {
        double r = random.nextDouble();
        int a = 0;
        double cumulativeProbability = 0;
        while (a < NUM_ACTIONS - 1) {
            cumulativeProbability += strategy[a];
            if (r < cumulativeProbability) {
                break;
            }
            a++;
        }
        return a;
    }
    
    // runs a given number of training iterations to generate regrets
    // after each iteration adds every action's regret to the running regret sums
    public void train(int iterations) {
        double[] actionUtility = new double[NUM_ACTIONS];
        for (int i = 0; i < iterations; i++) {
            //Get regret-matched mixed-strategy actions
            double[] strategy = getStrategy();
            int myAction = getAction(strategy);
            int otherAction = getAction(oppStrategy);
            //Compute action utilities
            actionUtility[otherAction] = 0;
            actionUtility[otherAction == NUM_ACTIONS - 1 ? 0 : otherAction + 1] = 1;
            actionUtility[otherAction == 0 ? NUM_ACTIONS - 1 : otherAction - 1] = -1;
            //Accumulate action regrets
            for (int a = 0; a < NUM_ACTIONS; a++)
            {
                regretSum[a] += actionUtility[a] - actionUtility[myAction];
                
            }
        }
    }
    
    // returns the average strategy over all the iterations so far
    // normalizes the accumulated strategy sums across all actions
    public double[] getAverageStrategy() {
        double[] avgStrategy = new double[NUM_ACTIONS];
        double normalizingSum = 0;
        for (int a = 0; a < NUM_ACTIONS; a++)
            normalizingSum += strategySum[a];
        for (int a = 0; a < NUM_ACTIONS; a++)
        if (normalizingSum > 0)
            avgStrategy[a] = strategySum[a] / normalizingSum;
        else {
            avgStrategy[a] = 1.0 / NUM_ACTIONS;
        }
        return avgStrategy;
    }
    
}
Can you figure out the mistake?
Old 05-16-2014, 05:17 AM   #2
Craggoo
culled
 
 
Join Date: Sep 2006
Location: R.I.P. ItzPenzoo 12-09-11
Posts: 12,251
Re: CFR RPS code

Is this the year where everyone switches from "poker pro" to "computer programmer"?
Old 05-16-2014, 08:37 AM   #3
nickthegeek
centurion
 
Join Date: Sep 2011
Posts: 193
Re: CFR RPS code

Quote:
Originally Posted by ibavly
Hi,

I'm new to programming and CFR. Could you guys check this code? I'm getting a different result than I expected.

The game is RPS where the opponent has an imbalanced strategy: he is throwing Rock 40% (oppStrategy = {0.4, 0.3, 0.3}, with index 0 = Rock).

What should happen: the program learns to throw Paper 100%.
What I expect to happen: the regrets will build up much faster for Paper than for the other two, so over a large sample the other two options will tend towards zero but will still stay non-zero.
What is actually happening: the output is {0.33, 0.33, 0.33}, which is the NE of symmetric RPS but not the best response to this opponent.

Code: [snipped; identical to the code in the OP]
Can you figure out the mistake?

I think the error is in getStrategy:

Code:
private double[] getStrategy() {
        double normalizingSum = 0;
        for (int a = 0; a < NUM_ACTIONS; a++) {
            strategy[a] = regretSum[a] > 0 ? regretSum[a] : 0;
            normalizingSum += strategy[a];
        }
        for (int a = 0; a < NUM_ACTIONS; a++) {
            if (normalizingSum > 0) {
                strategy[a] /= normalizingSum;
            }
            else {
                strategy[a] = 1.0 / NUM_ACTIONS;
  //              strategySum[a] += strategy[a];
            }
          strategySum[a] += strategy[a];   
        }
        return strategy;
    }
Old 05-16-2014, 10:42 AM   #4
ibavly
2010 PSOP Champion
 
 
Join Date: Feb 2010
Location: Check out my gang listing
Posts: 22,977
Re: CFR RPS code

That worked, thanks!

As expected we didn't get a pure strategy, but it came really close, with 99.99% paper.
Old 05-17-2014, 01:27 PM   #5
ibavly
2010 PSOP Champion
 
 
Join Date: Feb 2010
Location: Check out my gang listing
Posts: 22,977
Re: CFR RPS code

Now I've tried to make RPS work where both players are using regret matching.

I got the right result, but since I'm such a noob I was guessing on a bunch of stuff, so it's possible I got the right result despite my mistakes. I'm not sure whether the way I changed getAverageStrategy() to take an input array is correct, and I also don't know exactly what I did to make the println work. How does trainer.getAverageStrategy(trainer.strategySum) reference the definition properly? I don't understand why it doesn't work like getAction(strategy), which is much cleaner.

I'm also not sure whether there are spots where I could simplify the code. Specifically, in the getStrategy function, I chose to make a separate getOppStrategy function because I needed to update each player's regrets and strategySums, and couldn't think of a good way of differentiating them within one function.

Code:
import java.util.Arrays;
import java.util.Random;

public class RPSTrainer
{
    // set values to each move and for total number of actions
    // the actual numbers assigned to these moves don't matter in this case since the game is symmetric
    public static final int ROCK = 0, PAPER = 1, SCISSORS = 2, NUM_ACTIONS = 3;
    // initialize the random number generator
    public static final Random random = new Random();
    // create array to store regrets
    double[] regretSum = new double[NUM_ACTIONS],
    // create array to store opponent's regrets
    oppRegretSum = new double[NUM_ACTIONS],
    // create array to store temporary strategies
    strategy = new double[NUM_ACTIONS],
    // create array to store total average strategy
    strategySum = new double[NUM_ACTIONS],
    // create array to store temporary opponent strategies
    oppStrategy = new double[NUM_ACTIONS],
    //create array to store total average opponent strategy
    oppStrategySum = new double[NUM_ACTIONS];
    
    // Main method to run the whole training session
    public static void main(String args[]) {
        RPSTrainer trainer = new RPSTrainer();
        trainer.train(1000000);
        System.out.println("Hero Strat:");
        System.out.println(Arrays.toString(trainer.getAverageStrategy(trainer.strategySum)));
        System.out.println("Villain Strat");
        System.out.println(Arrays.toString(trainer.getAverageStrategy(trainer.oppStrategySum)));
    }
    
    
    // returns a strategy array computed from the regrets accumulated during training
    // for each action, divides that action's positive regret by the sum of all positive regrets to get its frequency
    // if there are no positive regrets, each action is given an equal likelihood
    private double[] getStrategy() {
        double normalizingSum = 0;
        for (int a = 0; a < NUM_ACTIONS; a++) {
            strategy[a] = regretSum[a] > 0 ? regretSum[a] : 0;
            normalizingSum += strategy[a];
        }
        for (int a = 0; a < NUM_ACTIONS; a++) {
            if (normalizingSum > 0)
                strategy[a] /= normalizingSum;
            else
                strategy[a] = 1.0 / NUM_ACTIONS;
            strategySum[a] += strategy[a];
        }
        return strategy;
    }
    // same regret matching as getStrategy, but operating on the opponent's
    // regret and strategy-sum arrays
    private double[] getOppStrategy() {
        double normalizingSum = 0;
        for (int a = 0; a < NUM_ACTIONS; a++) {
            oppStrategy[a] = oppRegretSum[a] > 0 ? oppRegretSum[a] : 0;
            normalizingSum += oppStrategy[a];
        }
        for (int a = 0; a < NUM_ACTIONS; a++) {
            if (normalizingSum > 0)
                oppStrategy[a] /= normalizingSum;
            else
                oppStrategy[a] = 1.0 / NUM_ACTIONS;
            oppStrategySum[a] += oppStrategy[a];
        }
        return oppStrategy;
    }
    // returns a number to represent the action chosen by Hero, as defined at the top of the code
    // draws a random number, then runs through each option, stopping when the cumulative prob
    // is greater than the random number
    public int getAction(double[] strategy) {
        double r = random.nextDouble();
        int a = 0;
        double cumulativeProbability = 0;
        while (a < NUM_ACTIONS - 1) {
            cumulativeProbability += strategy[a];
            if (r < cumulativeProbability) {
                break;
            }
            a++;
        }
        return a;
    }
    
    // runs a given number of training iterations to generate regrets
    // after each iteration adds every action's regret to the running regret sums
    public void train(int iterations) {
        double[] actionUtility = new double[NUM_ACTIONS];
        for (int i = 0; i < iterations; i++) {
            //Get regret-matched mixed-strategy actions
            double[] strat = getStrategy();
            double[] oppStrat = getOppStrategy();
            int myAction = getAction(strat);
            int otherAction = getAction(oppStrat);
            //Compute action utilities for Hero
            actionUtility[otherAction] = 0;
            actionUtility[otherAction == NUM_ACTIONS - 1 ? 0 : otherAction + 1] = 1;
            actionUtility[otherAction == 0 ? NUM_ACTIONS - 1 : otherAction - 1] = -1;
            //Accumulate action regrets for Hero
            for (int a = 0; a < NUM_ACTIONS; a++)
            {
                regretSum[a] += actionUtility[a] - actionUtility[myAction];
                
            }
            //Compute action utilities for Villain
            actionUtility[myAction] = 0;
            actionUtility[myAction == NUM_ACTIONS - 1 ? 0 : myAction + 1] = 1;
            actionUtility[myAction == 0 ? NUM_ACTIONS - 1 : myAction - 1] = -1;
            //Accumulate action regrets for Villain
            for (int a = 0; a < NUM_ACTIONS; a++)
            {
                oppRegretSum[a] += actionUtility[a] - actionUtility[otherAction];
                
            }
        }
    }
    
    // returns the average strategy over all the iterations so far
    // normalizes the accumulated strategy sums across all actions
    public double[] getAverageStrategy(double[] strategySum) {
        double[] avgStrategy = new double[NUM_ACTIONS];
        double normalizingSum = 0;
        for (int a = 0; a < NUM_ACTIONS; a++)
            normalizingSum += strategySum[a];
        for (int a = 0; a < NUM_ACTIONS; a++)
        if (normalizingSum > 0)
            avgStrategy[a] = strategySum[a] / normalizingSum;
        else {
            avgStrategy[a] = 1.0 / NUM_ACTIONS;
        }
        return avgStrategy;
    }
    
}
Old 05-17-2014, 05:10 PM   #6
Allen C
journeyman
 
Join Date: Oct 2004
Posts: 379
Re: CFR RPS code

I don't know Java, but I'm pretty sure this stuff is universal, so bear with me.

Quote:
I'm not sure if the way I changed getAverageStrategy() to include an inputted array is correct
It works correctly, but here are a few thoughts. Since strategySum is also a class-level variable, you could just access it directly without passing it in as a parameter. And because you use the constant NUM_ACTIONS inside the method, it won't work anyway unless the argument is an array of length NUM_ACTIONS; if you wanted the method to work on any array, you'd need to get its length (in Java, the array's length field) or take it as an additional parameter.

getAverageStrategy also seems like a bad name: all the method does is normalize, so normalizeStrategy would fit better, and "the average strategy" means something specific in CFR, which makes the current name confusing. I'd also put curly braces around that for loop. Finally, splitting it into two for loops, one inside each branch of the if/else, would be free efficiency (more important in getStrategy, as that runs inside the big training loop).
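A sketch of that loop split, using just the normalization step (the class name is mine; behaviour is meant to match the original inner loop, testing normalizingSum once instead of once per action):

```java
// Hoisted-branch normalization: one test of normalizingSum, then a
// plain loop. Falls back to a uniform strategy when the sum is zero.
public class NormalizeSketch {
    static double[] normalize(double[] strategy, double normalizingSum) {
        if (normalizingSum > 0) {
            for (int a = 0; a < strategy.length; a++) {
                strategy[a] /= normalizingSum;
            }
        } else {
            for (int a = 0; a < strategy.length; a++) {
                strategy[a] = 1.0 / strategy.length;
            }
        }
        return strategy;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(
            normalize(new double[] { 2, 1, 1 }, 4)));  // [0.5, 0.25, 0.25]
    }
}
```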

Quote:
How does trainer.getAverageStrategy(trainer.strategySum) reference the definition properly? I don't understand why it doesn't work like getAction(strategy) which is much cleaner.
It does work like getAction, if you're talking about the parameters and class variables: in both cases you're passing an array as an argument, using its contents to compute something, and returning the result. If you're referring to the fact that you write trainer.getAverageStrategy but just getAction on its own, that's because in the first case you've made an instance of your class and are calling that instance's method (from the static main), while in the second case you're calling the method from inside the class's own code, and in Java you don't have to qualify the call there.

In my humble opinion, I would suggest doing away with the class-level variables completely; they just make it harder to understand what's going on and generally don't help matters much. In getAction(strategy), for instance, the parameter plainly shadows the class variable of the same name. Create variables in the method where they're used and pass them around as needed.

For instance, getStrategy and getOppStrategy are exactly the same. There should just be one function that takes a regret array as a parameter, so you can pass in the opponent's or the hero's as needed, just like you did in getAction.
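A sketch of that merge (names are illustrative; the strategySum argument is updated in place, as in the original methods):

```java
// One regret-matching function for both players: pass in whichever
// player's regret and strategy-sum arrays you want to use.
public class RegretMatchSketch {
    static double[] getStrategy(double[] regretSum, double[] strategySum) {
        int n = regretSum.length;
        double[] strategy = new double[n];
        double normalizingSum = 0;
        for (int a = 0; a < n; a++) {
            strategy[a] = regretSum[a] > 0 ? regretSum[a] : 0;
            normalizingSum += strategy[a];
        }
        for (int a = 0; a < n; a++) {
            if (normalizingSum > 0) {
                strategy[a] /= normalizingSum;
            } else {
                strategy[a] = 1.0 / n;
            }
            strategySum[a] += strategy[a];  // accumulate for the average strategy
        }
        return strategy;
    }

    public static void main(String[] args) {
        double[] sum = new double[3];
        System.out.println(java.util.Arrays.toString(
            getStrategy(new double[] { 1, 3, -2 }, sum)));  // [0.25, 0.75, 0.0]
    }
}
```

train() would then call getStrategy(regretSum, strategySum) for Hero and getStrategy(oppRegretSum, oppStrategySum) for Villain.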
Old 05-17-2014, 07:45 PM   #7
jh1711
No freakin title????????
 
 
Join Date: May 2012
Location: There is no box
Posts: 3,744
Re: CFR RPS code

Quote:
Originally Posted by ibavly
I'm not sure if the way I changed getAverageStrategy() to include an inputted array is correct, and also don't know exactly what I did to make the println work. How does trainer.getAverageStrategy(trainer.strategySum) reference the definition properly? I don't understand why it doesn't work like getAction(strategy) which is much cleaner.
main is a static function of the class RPSTrainer; strategySum is a variable of the object trainer. If you introduced a new method, you could reference the variables more intuitively. E.g. this should work when you call trainer.print() after trainer.train() in main (untested):

Code:
public void print() {
  System.out.println("Hero Strat:");
  System.out.println(Arrays.toString(getAverageStrategy(strategySum)));
  System.out.println("Villain Strat");
  System.out.println(Arrays.toString(getAverageStrategy(oppStrategySum)));
}
Quote:
Originally Posted by ibavly
I'm also not sure if there are spots I could simplify the code. Specifically in the getStrategy function, I chose to make another getoppStrategy function because I needed to update the regrets and strategySums, and couldn't think of any good way of differentiating them within the function.
Your functions modify variables outside their own scope; either name them accordingly, or stop doing that. You can choose between a more procedural or a more OO style.

I hope you don't mind an additional remark: your code would be more readable if you followed the "always type, always block" principle, i.e. always declare explicit types and always put braces around loop and conditional bodies.
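For example, the unbraced loop in getAverageStrategy could be written under that convention as (a sketch, same behaviour):

```java
// "Always block" version of getAverageStrategy: every loop and
// if/else body is braced, so a later added statement can't silently
// fall outside the block it looks like it belongs to.
public class AverageStrategySketch {
    static double[] getAverageStrategy(double[] strategySum) {
        int n = strategySum.length;
        double[] avgStrategy = new double[n];
        double normalizingSum = 0;
        for (int a = 0; a < n; a++) {
            normalizingSum += strategySum[a];
        }
        for (int a = 0; a < n; a++) {
            if (normalizingSum > 0) {
                avgStrategy[a] = strategySum[a] / normalizingSum;
            } else {
                avgStrategy[a] = 1.0 / n;
            }
        }
        return avgStrategy;
    }

    public static void main(String[] args) {
        System.out.println(java.util.Arrays.toString(
            getAverageStrategy(new double[] { 1, 1, 2 })));  // [0.25, 0.25, 0.5]
    }
}
```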
Old 06-23-2017, 10:57 AM   #8
CaptnCanary
newbie
 
Join Date: Jan 2007
Location: England
Posts: 17
Re: CFR RPS code

I know this is old, but I'm working on the same problem and came across this. I took a look at your code, and I think the only reason you get the right answer is a coincidence. The strategies initialise as {0, 0, 0}, and the getStrategy method changes them to {0.33, 0.33, 0.33} as a starting point. It only runs one iteration, so it just stops there. I think you need to initialise each competitor with a specific opponent strategy and run iterations from that point. I've got it running, but it doesn't converge for me yet, and I'm not sure how long it should take.
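One way to see how close a run has converged (a sketch, not from the PDF's code): compute the exploitability of the current average strategy, i.e. the best expected payoff an opponent can earn against it per game. It should shrink toward 0 as the average strategies approach the uniform NE.

```java
// Exploitability of a fixed RPS mixed strategy: the best expected
// payoff a best-responding opponent gets against it. 0 means
// unexploitable, which in RPS only the uniform strategy achieves.
public class Exploitability {
    // payoff[villain][hero]; +1 means the villain wins
    static final double[][] PAYOFF = {
        {  0, -1,  1 },
        {  1,  0, -1 },
        { -1,  1,  0 },
    };

    static double exploitability(double[] strategy) {
        double best = Double.NEGATIVE_INFINITY;
        for (int v = 0; v < 3; v++) {
            double ev = 0;
            for (int h = 0; h < 3; h++) {
                ev += strategy[h] * PAYOFF[v][h];
            }
            best = Math.max(best, ev);  // opponent picks the best pure response
        }
        return best;
    }

    public static void main(String[] args) {
        System.out.println(exploitability(new double[] { 1, 0, 0 }));  // 1.0
        System.out.println(exploitability(new double[] { 1.0 / 3, 1.0 / 3, 1.0 / 3 }));
    }
}
```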

Last edited by CaptnCanary; 06-23-2017 at 11:20 AM.
Old 06-30-2017, 05:39 PM   #9
mediacalc
centurion
 
Join Date: Jul 2016
Posts: 109
Re: CFR RPS code

Quote:
Originally Posted by CaptnCanary
I know this is old, but I'm working on the same problem and came across this. I took a look at your code and I think the only reason that you get the right answer is a coincidence. The strategies initialise as 0,0,0 and the getStrategy method changes them to 0.33, 0.33, 0.33 as a starting point. It only runs one iteration so it just stops there. I think you need to initialise each competitor with a specific opponent strategy and run iterations from that point. I've got it running but it doesn't converge for me yet. I'm not sure how long it should take.
It's safe to say that OP has either fixed the problem or moved on from it entirely in the 3 years since he made this thread
Old 06-30-2017, 09:55 PM   #10
ibavly
2010 PSOP Champion
 
 
Join Date: Feb 2010
Location: Check out my gang listing
Posts: 22,977
Re: CFR RPS code

Quote:
Originally Posted by CaptnCanary
I know this is old, but I'm working on the same problem and came across this. I took a look at your code and I think the only reason that you get the right answer is a coincidence. The strategies initialise as 0,0,0 and the getStrategy method changes them to 0.33, 0.33, 0.33 as a starting point. It only runs one iteration so it just stops there. I think you need to initialise each competitor with a specific opponent strategy and run iterations from that point. I've got it running but it doesn't converge for me yet. I'm not sure how long it should take.


That did the trick, thanks!
Old 10-31-2017, 07:18 PM   #11
leavesofliberty
self-banned
 
Join Date: Jul 2010
Location: probably busto
Posts: 6,146
Re: CFR RPS code

Quote:
Originally Posted by Craggoo
Is this the year where everyone switches from "poker pro" to "computer programmer"?
Yes, yes it is. And the code is all in a pdf.

http://modelai.gettysburg.edu/2013/cfr/cfr.pdf

in b4 lol JAVA. Because, you know, everyone in groundbreaking research is like, omg lets use JAVA.
