First AI to beat multiplayer Poker game vs top pros (like chris furgeson, darren elias...) - Page 4 - Poker News

Two Plus Two Forums Poker News & Discussion News, Views, and Gossip

First AI to beat multiplayer Poker game vs top pros (like chris furgeson, darren elias...)

Post Reply Subscribe

...

Page 4 of 12

1 2 3 4 5 6 7 8 9

Page 4 of 12

1 2 3 4 5 6 7 8 9

07-12-2019 , 06:51 PM

#76

goldFishshark

grinder

Join Date: May 2016 Posts: 402

So glad that these guys could potentially destroy thousands of online poker careers for the sake of curiosity and academia. Such heroes.

Quote

07-12-2019 , 07:06 PM

#77

TeelXp

See my Commercial Software thread

Join Date: Apr 2012 Posts: 625

This bot is nothing special, just combination of the solutions from "Simple Preflop Holdem" and "Simple Postflop" + good conditions for bot to play.

Quote

07-12-2019 , 07:13 PM

#78

LittleGoliath

Pooh-Bah

Join Date: Nov 2011 Posts: 3,921

Quote:

Originally Posted by TeelXp

This bot is nothing special, just combination of the solutions from "Simple Preflop Holdem" and "Simple Postflop" + good conditions for bot to play.

as far as i'm aware off, this is not true since the bot started from absolute scratch and just got better by playing against versions of itself. don't think there any preprogrammed ranges involved whatsoever tbh

Quote

07-12-2019 , 07:21 PM

#79

fishfood69er

old hand

Join Date: Feb 2014 Posts: 1,702

Quote:

Originally Posted by LittleGoliath

No, it has baseline strategy , it uses self play but its based off of strategy

Quote

07-12-2019 , 07:26 PM

#80

LittleGoliath

Pooh-Bah

Join Date: Nov 2011 Posts: 3,921

Quote:

Originally Posted by fishfood69er

No, it has baseline strategy , it uses self play but its based off of strategy

hmm, must have missed that part then. could you perhaps provide a source for that statement?

I thought the idea of the CFR algorithm was this (source: https://www.quora.com/What-is-an-int...minimization):

Quote:

CFR is a self-play algorithm: it learns to play a game by repeatedly playing against itself. The program starts off with a strategy that is uniformly random, where it will play every action at every decision point with an equal probability. It then simulates playing games against itself. After every game, it revisits its decisions, and finds ways to improve its strategy. It repeats this process for billions of games, improving its strategy each time. As it plays, it gets closer and closer towards an optimal strategy for the game: a strategy that can do no worse than tie against any opponent.

The way it improves over time is by summing the total amount of regret it has for each action at each decision point, where regret means: how much better would I have done over all the games so far if I had just always played this one action at this decision, instead of choosing whatever mixture over actions that my strategy said I should use? Positive regret means that we would have done better if we had taken that action more often. Negative regret means that we would have done better by not taking that action at all. After each game that the program plays against itself, it computes and adds in the new regret values for all of its decisions it just made. It then recomputes its strategy so that it takes actions with probabilities proportional to their positive regret. If an action would have been good in the past, then it will choose it more often in the future.

It repeats this process for billions of games. So you have this long sequence of strategies that it was using on each game. Counter-intuitively, that sequence of strategies does not necessarily converge to anything useful (although it sometimes does so in practice, now, with the new CFR+ algorithm we describe in the Science paper). However, in a two-player zero-sum game, if you compute the average strategy over those billions of strategies in the sequence, then that average strategy will converge towards a Nash equilibrium for the game. After it's finished learning how to play by playing against itself, it doesn't have to change any further: it just uses that average strategy against any human or computer opponent it faces.

Quote

07-12-2019 , 07:33 PM

#81

Badreg2017

old hand

Join Date: Aug 2017 Posts: 1,682

Quote:

Originally Posted by wheatrich

based on a seven hand sample size bot is a wild maniac and also never folds any piece to anything ever.

I remember that bot a lot from about 2005 and not so much after that.

the QT vs QJ shown hand is very cherry picking to me since there aren't many worse Q's that are gonna call, heck QT might have folded there.

It doesn’t matter if a human isn’t calling with many worse hands. A human likely should be calling with many worse hands. If the bot doesn’t get value with its QJ very often there it means the human is over-folding and is going to get owned by the bot’s bluffs (assuming the bot is in fact balancing correctly)

Quote

07-12-2019 , 07:37 PM

#82

Badreg2017

old hand

Join Date: Aug 2017 Posts: 1,682

Quote:

Originally Posted by tomsOn

I have an issue with this feat.

From the AMA with Noam Brown (one of the authors of the bot) when asked about the bots performance in multiway pots :

I believe performance in multiway pots should be the key thing to test in this challenge. Libratus already established that humans are no longer a match vs AI in heads up formats. While proving so again this time with varying pre-flop ranges is of course impressive the really interesting question is whether the AI can come up with a better post-flop strategy multiway than top humans. It is conceivable that what happened here is the bot murdered in the heads up pots, but did relatively poorly in multiway pots.

I also don't understand why the author says 'It's not really possible to measure the bot's performance just in specific situations'. Why is not possible to compare the results for multiway pots?

Let’s say the human players only entered multi-way pots with crazy strong ranges and were giving up EV by not defending their Bb enough multi-way. Then the bot wouldn’t do well in multi-way pots but would still be using a good multi-way strategy. I’m sure there are better and more complex examples, but that’s one example.

Quote

07-12-2019 , 07:46 PM

#83

TeelXp

See my Commercial Software thread

Join Date: Apr 2012 Posts: 625

Quote:

Originally Posted by LittleGoliath

In Simple Preflop Holdem preprogrammed ranges are also not involved. You just specify tree, stacks and start solving equilibrium from the scratch. The process of the solving could be called playing against itself - it is true. After each iteration the solution gets better, more accurate and less exploitable.

The name of the algorithm in the paper is the same as in the solver.

So, my statement is true.

Quote

07-12-2019 , 07:51 PM

#84

JackBurton

grinder

Join Date: Jul 2010 Posts: 588

Quote:

Originally Posted by goldFishshark

So glad that these guys could potentially destroy thousands of online poker careers for the sake of curiosity and academia. Such heroes.

Obviously the objective is to be applied to other fields, but one always should be cautious about AI applied to the real world. It's chaos out there, many times without clear predictable data. Remember LTCM and other guys, thinking that they can predict behaviour of market participants.

Curious also the first associations of this AI research is with the american military and with facebook, always good to see science put to good use.

Quote:

Game theory is marketed as a system you can apply to any sphere of life, but does it really have much to offer in terms of practical application? The distinguished game theorist, Ariel Rubinstein, suggests not.

Quote

07-12-2019 , 07:53 PM

#85

TeelXp

See my Commercial Software thread

Join Date: Apr 2012 Posts: 625

In the solver you could choose variations of Monte Carlo counterfactual regret minimization algorithm:

Quote

07-12-2019 , 08:16 PM

#86

LittleGoliath

Pooh-Bah

Join Date: Nov 2011 Posts: 3,921

Quote:

Originally Posted by TeelXp

cool, thanks for the information. however saying 'you are not impressed' because they use a well-known algorithm is quite denigratory. no need to reinvent the wheel either

Quote

07-12-2019 , 08:32 PM

#87

wheatrich

Carpal \'Tunnel

Join Date: Sep 2005 Posts: 21,239

Lots of players check better than QJ on the turn in those spots these days.

That AJ bluff probably worked some years ago but today wouldn't against most players.

Quote

07-12-2019 , 09:46 PM

#88

fishfood69er

old hand

Join Date: Feb 2014 Posts: 1,702

Quote:

Originally Posted by LittleGoliath

Quote:

Originally Posted by TeelXp

"Depth-limited search in imperfect-information games
The blueprint strategy for the entire game is necessarily coarse-grained owing to the size and complexity of no-limit Texas hold’em. Pluribus only plays according to this blueprint strategy in the first betting round (of four), where the number of decision points is small enough that the blueprint strategy can afford to not use information abstraction and have a lot of actions in the action abstraction. After the first round (and even in the first round if an opponent chooses a bet size that is sufficiently different from the sizes in the blueprint action abstraction) Pluribus instead conducts real-time search to determine a better, finer-grained strategy for the current situation it is in. For opponent bets on the first round that are slightly off the tree, Pluribus rounds the bet to a nearby on-tree size (using the pseudoharmonic mapping (39)) and proceeds to play according to the blueprint as if the opponent had used the latter bet size."

and it assigns playing styles.

"Pluribus instead uses a modified form of an approach that we recently designed—previously only for two-player zero-sum games (41)—in which the searcher explicitly considers that any or all players may shift to different strategies beyond the leaf nodes of a subgame. Specifically, rather than assuming all players play according to a single fixed strategy beyond the leaf nodes (which results in the leaf nodes having a single fixed value) we instead assume that each player may choose between k different strategies, specialized to each player, to play for the remainder of the game when a leaf node is reached. In the experiments in this paper, k=4. One of the four continuation strategies we use in the experiments is the precomputed blueprint strategy, another is a modified form of the blueprint strategy in which the strategy is biased toward folding, another is the blueprint strategy biased toward calling, and the final option is the blueprint strategy biased toward raising. This technique results in the searcher finding a strategy that is more balanced because choosing an unbalanced strategy (e.g., always playing Rock in Rock-Paper-Scissors) would be punished by an opponent shifting to one of the other continuation strategies (e.g., always playing Paper)."

so this isnt nash equilibrium ( this would change constantly ) . i think.

Last edited by fishfood69er; 07-12-2019 at 09:53 PM.

Quote

07-12-2019 , 10:12 PM

#89

indra135

newbie

Join Date: Dec 2018 Posts: 26

inb4 google makes some kind of alphazero bot and crushes everyone with it

Quote

07-12-2019 , 11:54 PM

#90

valuecutting

grinder

Join Date: May 2015 Posts: 419

It's 20 seconds per hand, not per action

It is trying to approximate nash, it doesn't assign playing styles

Quote

07-13-2019 , 01:55 AM

#91

Mtipster

newbie

Join Date: Sep 2016 Posts: 35

There will always be live poker.

Quote

07-13-2019 , 03:28 AM

#92

kaeru

adept

Join Date: Jan 2019 Posts: 719

Quote:

Originally Posted by wheatrich

Lots of players check better than QJ on the turn in those spots these days.

That AJ bluff probably worked some years ago but today wouldn't against most players.

I like you as a politics personality/poster but DONNY YOU ARE OUT OF YOUR ELEMENT in this particular spot

Quote

07-13-2019 , 05:37 AM

#93

mrno1324

veteran

Join Date: Feb 2016 Posts: 3,055

Quote:

Originally Posted by Mtipster

There will always be live poker.

deppressing as **** that you're inclined to point that out

Quote

07-13-2019 , 05:41 AM

#94

Stroggoz

journeyman

Join Date: Jun 2018 Posts: 227

It's fairly standard to overbet turn with QJ+ there for most regs in that spot, so i like the x/r. I don't think most regs are certain enough about other people's ranges to make the x/r this big though, but perhaps that kind of x/r size will become more and more common, i like it. Piosolver is going with mixed strategy there on that river, i think it mostly just leads as a huge overbet on river when it gets to this spot, but it does the big x/r as well.

So is there any access to more hands than just the 3? kinda unfair that these guys get to learn multiway strat from the bot while the rest of us don't.

Quote

07-13-2019 , 05:47 AM

#95

Big_Bad_Bill

journeyman

Join Date: Sep 2010 Posts: 361

Washington Post has an article about this:
https://www.washingtonpost.com/scien...=.a62550f00c44

An aspect of all this that I don't think has been mentioned yet ITT:

"One intriguing, or perhaps disturbing, element to the story is that the bot achieves these results without paying attention to the personalities, habits and strategies of its opponents. The bot does not concern itself with human psychology. It doesn’t know whom it is playing or try to calculate what the mental state of the opponent might be.

That’s in contrast to what’s happening this week in Las Vegas at the World Series of Poker. A television viewer will notice that the players spend a lot of time scrutinizing one another, trying to figure out who is bluffing and who is not — looking for the “tell.”

What Pluribus suggests is that humans may overrate the psychological part of the game. Getting the math and probabilities right seems to be all that’s necessary to be a champion.

It doesn’t matter who’s twitching and scratching and blinking at the table."

Quote

07-13-2019 , 05:56 AM

#96

mrno1324

veteran

Join Date: Feb 2016 Posts: 3,055

Quote:

Originally Posted by Big_Bad_Bill

bolded is surprising to no one other than maybe Double A and Charlie Carrel

Quote

07-13-2019 , 08:09 AM

#97

Ceres

veteran

Join Date: Mar 2019 Posts: 3,048

I would also like to know more about bot-detection, but surely poker sites are heavily incentivised to build AI detecting AI or they will inevitably lose their action too?

A large percentage of my strategy has been informed by machine-play (off-table). From my perspective 'they' have been and always will be better than me so the only question to worry about is whether online sites can maintain the integrity of the games, and I believe they can - or at least will likely do so - because they ultimately have more to lose than the players.

Quote

07-13-2019 , 08:11 AM

#98

mrno1324

veteran

Join Date: Feb 2016 Posts: 3,055

Quote:

Originally Posted by Ceres

Stars have made/will make an effort but I sincerely doubt anyone else will come up with something remotely effective to detect bots

Quote

07-13-2019 , 08:15 AM

#99

GoBackToGo

enthusiast

Join Date: Aug 2010 Posts: 82

Quote:

Originally Posted by TeelXp

In the solver you could choose variations of Monte Carlo counterfactual regret minimization algorithm:

thats like saying: both the pluribus and GTO solvers both run on PCs, therefor they are essentially the same and pluribus is nothing special, when in fact they use vastly different methods. while solvers try to find nash equlibria in 2 player situations, to my understanding it isn't even clear that there is such a thing as a nash equilibrium in multiway pots, unless obviously all players "agree" to find such a thing and cooperate in doing so. so no, pluribus is not just a "better" version of a solver.

Quote:

Originally Posted by fishfood69er

No, it has baseline strategy , it uses self play but its based off of strategy

"The core of Pluribus’s strategy was computed via self play, in which the AI plays against copies of itself, without any data of human or prior AI play used as input. The AI starts from scratch by playing randomly, and gradually improves as it determines which actions, and which probability distribution over those actions, lead to better outcomes against earlier versions of its strategy. (...)
Pluribus’s self play produces a strategy for the entire game offline, which we refer to as the blueprint strategy. Then during actual play against opponents, Pluribus improves upon the blueprint strategy by searching for a better strategy in real time for the situations it finds itself in during the game. In subsections below, we discuss both of those phases in detail, but first we discuss abstraction, forms of which are used in both phases to make them scalable."
https://science.sciencemag.org/conte...cience.aay2400

Quote

07-13-2019 , 08:28 AM

#100

Xenoblade

Pooh-Bah

Join Date: Sep 2017 Posts: 4,085

to me some of the plays it made indicate the accuracy of the solutions it came up with was prob fairly low so it was exploitable, just less than any human

Quote

Page 4 of 12

First

1 2 3 4 5 6 7 8 9

Last

Post Reply Subscribe

...

Page 4 of 12

First

1 2 3 4 5 6 7 8 9

Last