tl;dr:
I would like to start an open-source software project with the aim of writing the world's strongest AND most informative backgammon engine. Looking to see if anyone's interested in discussing algorithmic design decisions and/or collaborating.
Long/technical version:
Based on the current state-of-the-art in AI and Reinforcement Learning in 2021, I believe beating the existing bots is a very realistic and attainable goal for a small team of part-time software developers with up-to-date AI skills. No original research or high-level backgammon skills required.
Here's me irl: https://www.linkedin.com/in/matthew-nowak/
I am a financial and technology professional, and I use AI every day to solve real-world problems in the finance industry. Algorithmic trading is my day job, but backgammon (and poker!) are passions. (Even though I am only an intermediate BG player... and even worse at poker.)
My hope for this thread is to drum up interest, questions, and eventually collaborators. Then we can take this to GitHub and turn it into a real open-source project.
To kick that off, I want to outline the main algorithmic design decisions which I believe will surpass the current open- and closed-source BG engines. I would certainly love some constructive debate too, but please keep it focused on algorithmic design decisions. If you are still in denial that top bots are demonstrably better at backgammon than the best humans, this thread isn't for you, since we're only trying to make that gap wider.
Based on recent AI research, and especially the AlphaGo project by Google DeepMind (https://deepmind.com/research/case-s...e-story-so-far), I see the following as the biggest "game changers":
1) Replace deterministic heuristic search (n-fold ply) with stochastic heuristic search (Monte Carlo Tree Search, or MCTS). This worked unbelievably well with AlphaZero, and I think it would work just as well with backgammon for the same reasons. (The dice don't really change the problem that much: you just tweak the probability weights used to backpropagate valuations through the game tree.)
Note that this change eliminates the traditional dichotomy between n-ply and full rollout evaluations: the two turn into a single, highly flexible, and self-improving method. A "deeper analysis" for a tricky position simply involves letting the stochastic solver run longer, gradually expanding the game tree in a way which balances exploration (searching for potentially better lines) and exploitation (extending the "best lines" to get a more accurate valuation).
The difference between deterministic and stochastic search is subtle but unbelievably powerful.
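To make the chance-node idea concrete, here's a toy Python sketch of MCTS with dice. The "game" is just a race to land exactly on a target pip count within a turn limit; the game itself, the constants, and every name below are illustrative assumptions, not a proposed engine design. The key point is that sampling the die at each node means the backed-up averages are automatically weighted by the dice probabilities:

```python
import math
import random

random.seed(0)

# Toy stochastic game standing in for backgammon: start at pip 0 and try
# to land exactly on TARGET within MAX_TURNS moves. Each turn a die
# (1-3 here, to keep the tree small) is rolled, and the player chooses to
# advance by the full roll or by a single pip.

TARGET, MAX_TURNS = 5, 3
ROLLS = (1, 2, 3)      # uniform dice outcomes
C_UCB = 1.4            # UCB1 exploration constant

def legal_moves(pos, roll):
    return [m for m in {roll, 1} if pos + m <= TARGET]

class Node:
    def __init__(self):
        self.n = 0          # visit count
        self.w = 0.0        # accumulated value
        self.children = {}  # (roll, move) -> Node

def simulate(node, pos, turns):
    """One MCTS iteration: select down the tree, expand, back up.
    Chance is handled by sampling the die at each node, so the averaged
    values converge to dice-probability-weighted expectations."""
    if pos == TARGET:
        value = 1.0                       # landed exactly: a win
    elif turns == 0:
        value = 0.0                       # out of turns: a loss
    else:
        roll = random.choice(ROLLS)       # chance node: sample the die
        def ucb(m):
            child = node.children.get((roll, m))
            if child is None or child.n == 0:
                return float("inf")       # try unvisited moves first
            return child.w / child.n + C_UCB * math.sqrt(
                math.log(node.n + 1) / child.n)
        move = max(legal_moves(pos, roll), key=ucb)
        child = node.children.setdefault((roll, move), Node())
        value = simulate(child, pos + move, turns - 1)
    node.n += 1
    node.w += value
    return value

root = Node()
for _ in range(20000):
    simulate(root, 0, MAX_TURNS)
print(f"estimated win probability: {root.w / root.n:.2f}")
```

Note how the same node statistics serve both as move selector and as valuation: running more simulations simply sharpens the same numbers, which is exactly the ply/rollout unification described above.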
The general problem with traditional rollouts is that the evaluation function (usually some kind of Artificial Neural Network, or ANN) is still used to make individual moves-- meaning the quality of the moves the rollout algorithm makes does NOT improve with the number of rollout simulations. This makes it hard for a rollout to materially improve the analysis of a position which the evaluation is already weak at playing.
So, for example, if your evaluation function is already weak at playing back games, playing many back games badly (which is what rollouts do) improves the evaluation of the suboptimal agent, not the optimal one. This can be partially alleviated by making each rollout action itself a (truncated) rollout problem; however, the computational cost of doing this explodes exponentially.
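A back-of-envelope count shows why nesting blows up. Every number below (effective branching factor, rollout length, nesting depth) is a purely illustrative assumption:

```python
# Cost of nesting rollouts: if each of the D moves in a rollout game is
# itself decided by a full-width lookahead of depth K over an effective
# branching factor B, the per-move cost multiplies by roughly B**K.
# Numbers are illustrative guesses, not measurements.

B = 20        # effective candidate moves per (position, roll) -- assumed
D = 40        # plies in one rollout game -- assumed
K = 2         # depth of the nested analysis behind each move -- assumed

plain = D * B              # evaluations for one ordinary rollout game
nested = D * B ** (K + 1)  # each candidate now expands a depth-K subtree
print(plain, nested)       # the gap grows exponentially in K
```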
Other game engines have solved this "curse of dimensionality" problem using ad hoc methods, such as separate logic for certain board positions (like back games) and/or hand-rolled evaluation functions (which is what Deep Blue did in the 90s with chess to beat Kasparov).
Not only are ad hoc methods extremely time-consuming to write and heavily dependent on human intelligence (the Deep Blue team was full of chess domain experts), but thanks to AlphaGo we can now say that computers can find better solutions "tabula rasa" anyway!
2) Use a single deep neural network with "two heads" for both policy and evaluation.
This is another innovation from DeepMind which, working in conjunction with MCTS, allows the network to improve itself through self-play.
Using a modern deep learning framework like TensorFlow or PyTorch will make it much easier to design, train, and compare different ANNs to find the "best" one. No one needs to hand-roll ANN code in 2021.
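To show the shape of such a network, here's a framework-free sketch in plain numpy (in practice you'd build this in TensorFlow or PyTorch). The 198-unit input (a classic TD-Gammon-style board encoding), the layer sizes, and the move count are assumptions of mine, not a final design:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Two-headed" network sketch: a shared trunk feeds both a policy head
# (a probability distribution over candidate moves) and a value head
# (an equity estimate). Sizes below are illustrative assumptions.

N_IN, N_HID, N_MOVES = 198, 128, 64   # board encoding, hidden, move slots

W_trunk = rng.normal(0, 0.1, (N_IN, N_HID))
W_policy = rng.normal(0, 0.1, (N_HID, N_MOVES))
W_value = rng.normal(0, 0.1, (N_HID, 1))

def forward(board_vec):
    h = np.tanh(board_vec @ W_trunk)        # shared representation
    logits = h @ W_policy
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()                  # softmax over candidate moves
    value = np.tanh(h @ W_value)[0]         # equity estimate in (-1, 1)
    return policy, value

policy, value = forward(rng.normal(size=N_IN))
```

The point of sharing the trunk is that the features useful for "which move looks good" (policy) and "who is winning" (value) overlap heavily, so one set of weights serves both heads, and MCTS consumes both outputs: the policy to bias tree expansion, the value to replace full-length playouts.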
3) Use eligibility traces to provide more "intuitive" explanations of why correct play is correct.
Even the best bots are terrible at providing an explanation or rationale understandable by humans. While this is unlikely to be an area where machines beat humans for the foreseeable future, I think it's possible to provide a bit more colour commentary than the kind of equity analysis we're used to seeing. This is in some sense an optional feature, as it's not strictly required to make a strong bot. However, it is potentially one of the more valuable features to us humans, as it would be useful to amateurs, professionals, and educators/authors alike.
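To sketch the mechanism I have in mind: below is textbook TD(lambda) with a linear value function, using synthetic features and rewards (all sizes and constants are illustrative). The eligibility trace keeps a decaying per-feature "credit" for recent predictions, and that attribution vector is the kind of raw material one could surface as an explanation of *which* features drove a valuation:

```python
import numpy as np

rng = np.random.default_rng(1)

# TD(lambda) with eligibility traces over a linear value function.
# Features and rewards are synthetic stand-ins; the point is the trace
# update, which accumulates decaying per-feature credit.

n_features = 8
w = np.zeros(n_features)          # value-function weights
alpha, gamma, lam = 0.1, 0.99, 0.8

def value(x):
    return w @ x

z = np.zeros(n_features)          # eligibility trace
x = rng.normal(size=n_features)   # current (synthetic) position features
for step in range(50):
    terminal = step == 49
    x_next = rng.normal(size=n_features)
    reward = 1.0 if terminal else 0.0        # win signal at game end only
    target = reward if terminal else reward + gamma * value(x_next)
    td_error = target - value(x)
    z = gamma * lam * z + x                  # decay old credit, add new
    w += alpha * td_error * z                # update along the whole trace
    x = x_next

feature_credit = np.abs(z)  # ranks which features "earned" the update
```

The hope is that mapping `feature_credit` back onto human-meaningful board features (primes, anchors, blot exposure, etc.) yields commentary richer than a bare equity number; how well that works in practice is exactly the kind of thing I'd want to experiment with.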
Would definitely love questions or suggestions regarding the above design decisions!!