tl;dr:
I would like to start an open-source software project with the aim of writing the world's strongest AND most informative backgammon engine. Looking to see if anyone's interested in discussing algorithmic design decisions and/or collaborating.
Long/technical version:
Based on the current state-of-the-art in AI and Reinforcement Learning in 2021, I believe beating the existing bots is a very realistic and attainable goal for a small team of part-time software developers with up-to-date AI skills. No original research or high-level backgammon skills required.
Here's me irl: https://www.linkedin.com/in/matthew-nowak/
I am a financial and technology professional, and I use AI every day to solve real-world problems in the finance industry. Algorithmic trading is my day job, but backgammon (and poker!) are passions. (Even though I am only an intermediate BG player... and even worse at poker.)
My hope for this thread is to drum up interest, questions, and eventually collaborators. Then we can take this to GitHub and turn it into a real open-source project.
To kick that off, I want to outline the main algorithmic design decisions which I believe will surpass the current open- and closed-source BG engines. I would certainly love some constructive debate too, but please keep it focused on algorithmic design decisions. If you are still in denial that top bots are demonstrably better at backgammon than the best humans, this thread isn't for you, since we're only trying to make that gap wider.
Based on recent AI research, and especially the AlphaGo project by Google DeepMind (https://deepmind.com/research/case-s...e-story-so-far), I see the following as the biggest "game changers":
1) Replace deterministic heuristic search (n-fold ply) with stochastic heuristic search (Monte Carlo Tree Search, or MCTS). This worked unbelievably well with AlphaZero, and I think it would work just as well with backgammon for the same reasons. (The dice don't really change the problem that much: you just tweak the probability weights used to backpropagate valuations through the game tree.)
Note that this change eliminates the traditional dichotomy between n-ply and full rollout evaluations: the two turn into a single, highly flexible, and self-improving method. A "deeper analysis" for a tricky position simply involves letting the stochastic solver run longer, gradually expanding the game tree in a way which balances exploration (searching for potentially better lines) and exploitation (extending the "best lines" to get a more accurate valuation).
The difference between deterministic and stochastic search is subtle but unbelievably powerful.
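To make the chance-node idea concrete, here's a toy Python sketch of MCTS with dice. The "game" is just a race to land exactly on a target pip count within a turn limit; the game itself, the constants, and every name below are illustrative assumptions, not a proposed engine design. The key point is that sampling the die at each node means the backed-up averages are automatically weighted by the dice probabilities:

```python
import math
import random

random.seed(0)

# Toy stochastic game standing in for backgammon: start at pip 0 and try
# to land exactly on TARGET within MAX_TURNS moves. Each turn a die
# (1-3 here, to keep the tree small) is rolled, and the player chooses to
# advance by the full roll or by a single pip.

TARGET, MAX_TURNS = 5, 3
ROLLS = (1, 2, 3)      # uniform dice outcomes
C_UCB = 1.4            # UCB1 exploration constant

def legal_moves(pos, roll):
    return [m for m in {roll, 1} if pos + m <= TARGET]

class Node:
    def __init__(self):
        self.n = 0          # visit count
        self.w = 0.0        # accumulated value
        self.children = {}  # (roll, move) -> Node

def simulate(node, pos, turns):
    """One MCTS iteration: select down the tree, expand, back up.
    Chance is handled by sampling the die at each node, so the averaged
    values converge to dice-probability-weighted expectations."""
    if pos == TARGET:
        value = 1.0                       # landed exactly: a win
    elif turns == 0:
        value = 0.0                       # out of turns: a loss
    else:
        roll = random.choice(ROLLS)       # chance node: sample the die
        def ucb(m):
            child = node.children.get((roll, m))
            if child is None or child.n == 0:
                return float("inf")       # try unvisited moves first
            return child.w / child.n + C_UCB * math.sqrt(
                math.log(node.n + 1) / child.n)
        move = max(legal_moves(pos, roll), key=ucb)
        child = node.children.setdefault((roll, move), Node())
        value = simulate(child, pos + move, turns - 1)
    node.n += 1
    node.w += value
    return value

root = Node()
for _ in range(20000):
    simulate(root, 0, MAX_TURNS)
print(f"estimated win probability: {root.w / root.n:.2f}")
```

Note how the same node statistics serve both as move selector and as valuation: running more simulations simply sharpens the same numbers, which is exactly the ply/rollout unification described above.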
The general problem with traditional rollouts is that the evaluation function (usually some kind of Artificial Neural Network, or ANN) is still used to make individual moves-- meaning the quality of the moves the rollout algorithm makes does NOT improve with the number of rollout simulations. This makes it hard for a rollout to materially improve the analysis of a position which the evaluation is already weak at playing.
So, for example, if your evaluation function is already weak at playing back games, playing many back games badly (which is what rollouts do) improves the evaluation of the suboptimal agent, not the optimal one. This can be partially alleviated by making each rollout action itself a (truncated) rollout problem; however, the computational cost of doing this explodes exponentially.
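A back-of-envelope count shows why nesting blows up. Every number below (effective branching factor, rollout length, nesting depth) is a purely illustrative assumption:

```python
# Cost of nesting rollouts: if each of the D moves in a rollout game is
# itself decided by a full-width lookahead of depth K over an effective
# branching factor B, the per-move cost multiplies by roughly B**K.
# Numbers are illustrative guesses, not measurements.

B = 20        # effective candidate moves per (position, roll) -- assumed
D = 40        # plies in one rollout game -- assumed
K = 2         # depth of the nested analysis behind each move -- assumed

plain = D * B              # evaluations for one ordinary rollout game
nested = D * B ** (K + 1)  # each candidate now expands a depth-K subtree
print(plain, nested)       # the gap grows exponentially in K
```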
Other game engines have solved this "curse of dimensionality" problem using ad hoc methods, such as separate logic for certain board positions (like back games) and/or hand-rolled evaluation functions (which is what Deep Blue did in the 90s with chess to beat Kasparov).
Not only are ad hoc methods extremely time-consuming to write and heavily dependent on human intelligence (the Deep Blue team was full of chess domain experts), but thanks to AlphaGo we can now say that computers can find better solutions "tabula rasa" anyway!
2) Use a single deep neural network with "two heads" for both policy and evaluation.
This is another innovation from DeepMind which, working in conjunction with MCTS, allows the network to improve itself through self-play.
Using a modern deep learning framework like TensorFlow or PyTorch will make it much easier to design, train, and compare different ANNs to find the "best" one. No one needs to hand-roll ANN code in 2021.
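To show the shape of such a network, here's a framework-free sketch in plain numpy (in practice you'd build this in TensorFlow or PyTorch). The 198-unit input (a classic TD-Gammon-style board encoding), the layer sizes, and the move count are assumptions of mine, not a final design:

```python
import numpy as np

rng = np.random.default_rng(0)

# "Two-headed" network sketch: a shared trunk feeds both a policy head
# (a probability distribution over candidate moves) and a value head
# (an equity estimate). Sizes below are illustrative assumptions.

N_IN, N_HID, N_MOVES = 198, 128, 64   # board encoding, hidden, move slots

W_trunk = rng.normal(0, 0.1, (N_IN, N_HID))
W_policy = rng.normal(0, 0.1, (N_HID, N_MOVES))
W_value = rng.normal(0, 0.1, (N_HID, 1))

def forward(board_vec):
    h = np.tanh(board_vec @ W_trunk)        # shared representation
    logits = h @ W_policy
    policy = np.exp(logits - logits.max())
    policy /= policy.sum()                  # softmax over candidate moves
    value = np.tanh(h @ W_value)[0]         # equity estimate in (-1, 1)
    return policy, value

policy, value = forward(rng.normal(size=N_IN))
```

The point of sharing the trunk is that the features useful for "which move looks good" (policy) and "who is winning" (value) overlap heavily, so one set of weights serves both heads, and MCTS consumes both outputs: the policy to bias tree expansion, the value to replace full-length playouts.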
3) Use eligibility traces to provide more "intuitive" explanations of why correct play is correct.
Even the best bots are terrible at providing an explanation or rationale understandable by humans. While this is unlikely to be an area where machines beat humans for the foreseeable future, I think it's possible to provide a bit more colour commentary than the kind of equity analysis we're used to seeing. This is in some sense an optional feature, as it's not strictly required to make a strong bot. However, it is potentially one of the more valuable features to us humans, as it would be useful to amateurs, professionals, and educators/authors alike.
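To sketch the mechanism I have in mind: below is textbook TD(lambda) with a linear value function, using synthetic features and rewards (all sizes and constants are illustrative). The eligibility trace keeps a decaying per-feature "credit" for recent predictions, and that attribution vector is the kind of raw material one could surface as an explanation of *which* features drove a valuation:

```python
import numpy as np

rng = np.random.default_rng(1)

# TD(lambda) with eligibility traces over a linear value function.
# Features and rewards are synthetic stand-ins; the point is the trace
# update, which accumulates decaying per-feature credit.

n_features = 8
w = np.zeros(n_features)          # value-function weights
alpha, gamma, lam = 0.1, 0.99, 0.8

def value(x):
    return w @ x

z = np.zeros(n_features)          # eligibility trace
x = rng.normal(size=n_features)   # current (synthetic) position features
for step in range(50):
    terminal = step == 49
    x_next = rng.normal(size=n_features)
    reward = 1.0 if terminal else 0.0        # win signal at game end only
    target = reward if terminal else reward + gamma * value(x_next)
    td_error = target - value(x)
    z = gamma * lam * z + x                  # decay old credit, add new
    w += alpha * td_error * z                # update along the whole trace
    x = x_next

feature_credit = np.abs(z)  # ranks which features "earned" the update
```

The hope is that mapping `feature_credit` back onto human-meaningful board features (primes, anchors, blot exposure, etc.) yields commentary richer than a bare equity number; how well that works in practice is exactly the kind of thing I'd want to experiment with.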
Would definitely love questions or suggestions regarding the above design decisions!!