Cepheus the "Unbeatable" bot is beatable - Page 2 - Poker Theory

This example doesn't work, since your 4-bet range would always be AA. I imagine a range of pocket pairs and suited connectors could be found that calls for implied odds when starting with 7:1 pot odds while knowing the GTO player's hand.
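For reference on the pot-odds part: at 7:1, a call breaks even with 1/(7+1) = 12.5% equity before any implied odds are counted. A minimal sketch of that arithmetic (the function name is just illustrative):

```python
def required_equity(pot_odds_to_one: float) -> float:
    """Break-even equity for a call offered at `pot_odds_to_one`:1.

    Calling 1 unit to win `pot_odds_to_one` units breaks even when
    equity * (odds + 1) == 1, i.e. equity == 1 / (odds + 1).
    """
    return 1.0 / (pot_odds_to_one + 1.0)

print(f"{required_equity(7):.1%}")  # break-even equity at 7:1
```

Anything a hand wins on later streets (implied odds) only lowers the equity it needs right now.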

Even just folding every hand except AA to a 4-bet would make 4-betting AA worse than calling, and just changing the percentage of AA hands doesn't change that fact.

04-11-2015 , 08:22 AM

#27

bobf

grinder

Join Date: Feb 2008 Posts: 635

Quote:

Originally Posted by TakenItEasy

This isn't true. Against GTO, one obvious best response would be to adopt an identical strategy.

However, testing a strategy against itself doesn't prove it was GTO, because it would just have the same result as any other fixed strategy played against itself, which would be break-even.

It doesn't use itself as a best response. Itself is NOT a best response until the GTO is found. It finds an actual best response, and finding that best response is trivial compared to finding the GTO.

And where did you get the idea that any fixed strategy breaks even against a GTO? It doesn't.

Quote:

You may try to optimize or train a strategy by applying adaptive versions of itself against the previous iteration in order to keep making minor improvements until no further improvements could be found.

However, it doesn't prove GTO, since you may end up finding a local maximum for a particular strategy, or one that could be defeated by training a fundamentally different approach against it, or by designing an extreme exploitative strategy divergent enough from the norm to exploit the non-adaptive GTO style.

CFR doesn't get stuck in a local maximum. It finds the actual maximum and it knows when it found it by checking with the best-response.

You don't need to "train" anything to find the "extreme exploitative strategy". You simply calculate it.

04-11-2015 , 08:44 AM

#28

bobf

grinder

Join Date: Feb 2008 Posts: 635

Quote:

Originally Posted by TakenItEasy

I would have expected a 4-bet range of AA plus many other hands. A GTO solution requires you to call a 4-bet with many more hands than just AA; otherwise tons of hands would 4-bet bluff.

04-11-2015 , 12:49 PM

#29

samooth

veteran

Join Date: May 2009 Posts: 3,350

Quote:

Originally Posted by npiv

wtf is this thread?

I'm getting a bit tired of wasting mental energy on reading and interpreting these mental masturbations.

04-11-2015 , 02:39 PM

#30

TakenItEasy

old hand

Join Date: Sep 2004 Posts: 1,767

In theory, it seems to me that any non-adaptive strategy would eventually completely reveal itself after observing enough hands. Therefore it should be fair to optimize any exploitative strategy vs a known GTO strategy when proving GTO is our goal.

In effect, the GTO player will broadcast all ranges on all streets and all post-flop action for the entire post-flop decision tree, since all of this information would be available if the complete algorithm were known.

Given this information, it seems to me that the best approach would go something like this:

With perfect range information we could run equity calculations for the entire post-flop decision tree without any loss of certainty.

In other words, we could compute optimal exploitative post-flop play against the GTO strategy. We only need to run an equity calculation for 47 turns and 46 rivers; allowing for maybe 6 action combos per street and using 4 starting ranges, that comes to roughly 1-2 million calculations per hand, which should provide optimal exploitative post-flop results for the given pre-flop ranges. So it seems well worth the trouble.

47 x 46 x 6 x 6 x 6 x 4 = 1,867,968, i.e. roughly 1.9 million equations to solve per hand.
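The per-hand count is just the product of the post's assumed branching factors; this sketch spells it out. The 6 action sequences per street and the 4 pre-flop ranges are the post's rough guesses, not measured values, and the constant names are illustrative.

```python
# Rough size of the per-hand post-flop enumeration described above.
# Assumptions taken from the post: 47 turn cards, 46 river cards,
# ~6 action sequences on each of 3 post-flop streets, 4 pre-flop ranges.
TURN_CARDS = 47
RIVER_CARDS = 46
ACTIONS_PER_STREET = 6
STREETS = 3
PREFLOP_RANGES = 4

total = TURN_CARDS * RIVER_CARDS * ACTIONS_PER_STREET ** STREETS * PREFLOP_RANGES
print(f"{total:,}")
```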

Fold equity can be calculated precisely when folding ranges are known for all aggression on all streets.

Exact implied odds can be calculated.

The thinnest possible value bets can be calculated for known calling ranges.

We could even calculate possible floating situations given known ranges on all turn/river runout combos.

Include folding as an option, and then choose the action with the best EV.
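As a rough illustration of fold equity and "choose the best EV" once the opponent's folding frequency and calling range are known, here is a sketch for a single river bet. It assumes the opponent never raises and that checking back goes straight to showdown; the function names and inputs are hypothetical.

```python
from typing import Dict, Tuple

def bet_ev(pot: float, bet: float, fold_prob: float, equity_when_called: float) -> float:
    """EV of betting, measured from now (money already in the pot is not ours).

    If the opponent folds we take the pot as it stands; if they call, the pot
    grows by two bets and we realise our equity against their calling range.
    """
    return fold_prob * pot + (1 - fold_prob) * (equity_when_called * (pot + 2 * bet) - bet)

def check_ev(pot: float, equity_vs_whole_range: float) -> float:
    """EV of checking it down: we simply realise our equity in the current pot."""
    return equity_vs_whole_range * pot

def best_action(pot: float, bet: float, fold_prob: float,
                equity_when_called: float,
                equity_vs_whole_range: float) -> Tuple[str, Dict[str, float]]:
    """Return the higher-EV action plus the EV of each option."""
    evs = {
        "bet": bet_ev(pot, bet, fold_prob, equity_when_called),
        "check": check_ev(pot, equity_vs_whole_range),
    }
    return max(evs, key=evs.get), evs
```

For example, with a 6-bet pot, a 1-bet wager, a 30% fold rate, 40% equity when called and 50% equity when checking it down, `best_action(6, 1, 0.3, 0.4, 0.5)` prefers betting (about 3.34 bets vs 3.0); with no fold equity at all, checking wins.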

For pre-flop action:
I would start with 4 initial conditions defining 4 pre-flop strategies.

Range A would be to passively call ATC (any two cards) for the cheapest possible flop.

The idea would be to see just how far perfect post-flop play against the defined GTO strategy can take us.

Range B would be opening with a loose range while capping aggression at 2 bets and making wide calls of 2-3 bets with uncapped ranges.

Range C would cap aggression at 3 bets and make uncapped calls of 3-4 bets.

Range D would be a tight range capping at 4 bets and making uncapped calls of 4-5 bets.

Once we deal the flop for the predefined pre-flop action, we can run the million or so equations to get the optimal results for all post-flop action.

We could then repeat this for a decent sample of flops to get accurate win rates for the four ranges.

The total number of flop scenarios when including both hands would be:
C(52,2) * C(50,2) * C(48,3)
1326 * 1225 * 17296 ≈ 28.1 billion

Note that this is much smaller than all possible hands to the river, so I would assume fewer hands are required for the same level of confidence, but I never bothered to learn the precise way to calculate this, as I was never an online cash-game player. Maybe 30 minutes per iteration would give us results for a couple of thousand hands. We only need to be accurate enough to establish the direction in which tweaking the pre-flop conditions takes us.
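The flop-scenario count is easy to check with Python's `math.comb`; note that the villain's hand comes from the 50 cards left after the hero's two, which gives C(50,2) = 1,225 combinations.

```python
from math import comb

hero_hands = comb(52, 2)      # hero's two hole cards from a full deck
villain_hands = comb(50, 2)   # villain's hole cards from the 50 remaining
flops = comb(48, 3)           # three flop cards from the 48 still unseen

total = hero_hands * villain_hands * flops
print(f"{total:,}")
```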

Next we tweak the four pre-flop ranges to get the next set of win rate estimations and repeat as needed.

We will only be changing starting ranges so the impact and trends should be clear.

We can increase the sample size around the optimums to get better win-rate estimates.

To run all 28 billion flops, if we assume it takes a second to run the million or so equity calculations, it would take a PC roughly 900 years to finish, or take the NSA less than a second.

However, I'm sure around 10K hands should be a decent sample size, which others can hopefully verify. Remember, it only involves variance up to the flop, not the river.
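The timing figures follow from the post's own assumption of about one second of computation per dealt flop; this sketch just does that arithmetic (the constant names are illustrative, and the one-second figure is the post's guess, not a benchmark).

```python
SECONDS_PER_HAND = 1.0                 # assumption stated in the post
TOTAL_FLOP_SCENARIOS = 28_094_757_600  # C(52,2) * C(50,2) * C(48,3)
SECONDS_PER_YEAR = 365.25 * 24 * 3600

years_all = TOTAL_FLOP_SCENARIOS * SECONDS_PER_HAND / SECONDS_PER_YEAR
hands_per_30_min = 30 * 60 / SECONDS_PER_HAND
hours_for_10k = 10_000 * SECONDS_PER_HAND / 3600

print(f"all flops: ~{years_all:,.0f} years")
print(f"30 minutes: {hands_per_30_min:,.0f} hands")
print(f"10K hands: ~{hours_for_10k:.1f} hours")
```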

Once any strategy is found to be winning in the long term, we would know it's not GTO, and we could fairly accurately identify all strategies that cross this line, as well as by how much, while spending only a fraction of the CPU time used to train the GTO strategy in the first place.

It's the advantage that those huge equity calculations provide us.

Last edited by TakenItEasy; 04-11-2015 at 02:49 PM.

04-11-2015 , 04:36 PM

#31

bobf

grinder

Join Date: Feb 2008 Posts: 635

Quote:

Originally Posted by TakenItEasy

Yup. It's easy to compute a best response to any strategy. That's how the algorithm used to create Cepheus knows when it has found a close approximation to a Nash equilibrium (A, B): it has found one when

- a best response against B does almost no better than A does against B, and
- a best response against A does almost no better than B does against A.
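To make the two conditions concrete, here is a minimal sketch for a zero-sum matrix game, with rock-paper-scissors as a stand-in (poker's game tree is vastly larger, but the check is the same idea). `m[i][j]` is A's payoff when A plays row `i` and B plays column `j`; the function names and the `eps` tolerance are illustrative choices, not taken from any real solver.

```python
from typing import List

Matrix = List[List[float]]

def value(m: Matrix, x: List[float], y: List[float]) -> float:
    """Expected payoff to A when A mixes with x and B mixes with y."""
    return sum(x[i] * m[i][j] * y[j] for i in range(len(x)) for j in range(len(y)))

def best_response_value_a(m: Matrix, y: List[float]) -> float:
    """Best payoff any pure action of A can get against B's fixed mix y."""
    return max(sum(m[i][j] * y[j] for j in range(len(y))) for i in range(len(m)))

def best_response_value_b(m: Matrix, x: List[float]) -> float:
    """Lowest payoff to A (i.e. best for B) that B can force against A's fixed mix x."""
    return min(sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(m[0])))

def is_approx_nash(m: Matrix, x: List[float], y: List[float], eps: float = 1e-3) -> bool:
    v = value(m, x, y)
    gain_vs_b = best_response_value_a(m, y) - v  # best response to B vs what A gets
    gain_vs_a = v - best_response_value_b(m, x)  # best response to A vs what B gets
    return gain_vs_b <= eps and gain_vs_a <= eps

RPS = [[0, -1, 1], [1, 0, -1], [-1, 1, 0]]
uniform = [1 / 3] * 3
print(is_approx_nash(RPS, uniform, uniform))
print(is_approx_nash(RPS, [0.5, 0.25, 0.25], uniform))
```

The uniform mix passes both checks; the rock-heavy mix fails, because a best response (always paper) beats it by a quarter of a unit per game.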

Quote:

For pre-flop action:
I would start with 4 initial conditions defining 4 pre-flop strategies.

Range A would be to passively call ATC (any two cards) for the cheapest possible flop.

The idea would be to see just how far perfect post-flop play against the defined GTO strategy can take us.

Range B would be opening with a loose range while capping aggression at 2 bets and making wide calls of 2-3 bets with uncapped ranges.

Range C would cap aggression at 3 bets and make uncapped calls of 3-4 bets.

Range D would be a tight range capping at 4 bets and making uncapped calls of 4-5 bets.

Once we deal the flop for the predefined pre-flop action, we can run the million or so equations to get the optimal results for all post-flop action.

We could then repeat this for a decent sample of flops to get accurate win rates for the four ranges.

Calculating a best response is simpler than that. You don't need to deal with ranges. You can calculate the best action one hand at a time. You don't need to worry about being balanced, because you are dealing with a non-adaptive opponent. Your range doesn't matter. Only your hand, the board, the betting, and the stacks matter. And you can do better than perfect post-flop play: you can also play perfectly pre-flop against it.
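A minimal sketch of the one-hand-at-a-time idea for a single river decision with no raising allowed: because the opponent's strategy is fixed and known, we weight each of their hands by how often that strategy bets it, then compare calling with folding for the hand we actually hold. The river setting, the tuple layout, and `call_or_fold` are all hypothetical simplifications.

```python
from typing import List, Tuple

# (prior weight, probability the fixed strategy bets this hand, True if our hand beats it)
OpponentHand = Tuple[float, float, bool]

def call_or_fold(pot: float, bet: float, opp_range: List[OpponentHand]) -> Tuple[str, float]:
    """Best response (call or fold) to a bet from a known, fixed strategy."""
    # Bayes: weight each opponent hand by how often the strategy bets it.
    bet_weights = [(w * p_bet, we_win) for w, p_bet, we_win in opp_range]
    total = sum(w for w, _ in bet_weights)
    if total == 0:
        raise ValueError("the opponent's strategy never bets here")
    p_win = sum(w for w, we_win in bet_weights if we_win) / total
    # Calling costs `bet`; winning collects the pot plus the opponent's bet.
    ev_call = p_win * (pot + bet) - (1 - p_win) * bet
    return ("call", ev_call) if ev_call > 0 else ("fold", 0.0)

# Hypothetical spot: value hands always bet, bluffs bet half the time.
print(call_or_fold(4, 1, [(0.6, 1.0, False), (0.4, 0.5, True)]))
```

No balancing appears anywhere: the decision depends only on the hand we hold and the opponent's fixed frequencies.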

Quote:

Once any strategy is found to be long term winning, we would know it's not GTO...

Right, but regret-minimization algorithms find a strategy against which no opposing strategy wins more than some tiny amount, and they verify that by computing the best response.
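For readers curious what "regret minimization" means concretely: regret matching is the update rule that CFR (and the CFR+ variant used for Cepheus) applies at every decision point. The sketch below runs it in self-play on a tiny zero-sum matrix game (a lopsided rock-paper-scissors invented purely for illustration, not poker), then checks the averaged strategies with best responses. It is a toy of the idea under those assumptions, not Cepheus's solver.

```python
from typing import List, Tuple

Matrix = List[List[float]]

def regret_matching(regrets: List[float]) -> List[float]:
    """Play each action in proportion to its positive cumulative regret."""
    pos = [max(r, 0.0) for r in regrets]
    total = sum(pos)
    n = len(regrets)
    return [p / total for p in pos] if total > 0 else [1.0 / n] * n

def solve_zero_sum(m: Matrix, iterations: int) -> Tuple[List[float], List[float]]:
    rows, cols = len(m), len(m[0])
    reg_a, reg_b = [0.0] * rows, [0.0] * cols
    sum_a, sum_b = [0.0] * rows, [0.0] * cols
    for _ in range(iterations):
        x, y = regret_matching(reg_a), regret_matching(reg_b)
        # Value of each pure action against the opponent's current mix (B's payoff is -m).
        u_a = [sum(m[i][j] * y[j] for j in range(cols)) for i in range(rows)]
        u_b = [-sum(x[i] * m[i][j] for i in range(rows)) for j in range(cols)]
        v_a = sum(x[i] * u_a[i] for i in range(rows))
        v_b = sum(y[j] * u_b[j] for j in range(cols))
        for i in range(rows):
            reg_a[i] += u_a[i] - v_a
            sum_a[i] += x[i]
        for j in range(cols):
            reg_b[j] += u_b[j] - v_b
            sum_b[j] += y[j]
    # The *average* strategies are what converge to an equilibrium.
    return [s / iterations for s in sum_a], [s / iterations for s in sum_b]

def exploitability(m: Matrix, x: List[float], y: List[float]) -> float:
    """Sum of both players' best-response gains; exactly 0 at a Nash equilibrium."""
    br_a = max(sum(m[i][j] * y[j] for j in range(len(y))) for i in range(len(x)))
    br_b = min(sum(x[i] * m[i][j] for i in range(len(x))) for j in range(len(y)))
    return br_a - br_b

SKEWED_RPS = [[0, -1, 2], [1, 0, -1], [-2, 1, 0]]
x, y = solve_zero_sum(SKEWED_RPS, 20_000)
print(x, y, exploitability(SKEWED_RPS, x, y))
```

For this particular matrix the equilibrium is (0.25, 0.5, 0.25) for both players, so the averaged strategies should drift toward it, and the exploitability toward zero, as `iterations` grows.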

Last edited by bobf; 04-11-2015 at 04:45 PM.

04-13-2015 , 05:06 AM

#32

Eu.Era

adept

Join Date: Jun 2012 Posts: 722

I just beat it twice and I've never played limit before.

04-14-2015 , 02:00 AM

#33

ArtyMcFly

Carpal \'Tunnel

Join Date: Dec 2014 Posts: 13,256

Cliffs:
Cepheus played trillions of hands in order to support the thesis: "FLH is essentially weakly solved".
OP played 200 hands and thinks this is enough to disprove the thesis.
#lolsamplesize #variance
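How many hands would be enough? A rough normal-approximation sketch: a 95% confidence interval around a win rate w excludes zero after about (1.96 * sigma / w)^2 hands, where sigma is the per-hand standard deviation. The sigma of 5 big blinds per hand below is a hypothetical placeholder, not a measured figure for heads-up limit hold'em, and treating an observed win rate as the true edge is optimistic.

```python
import math

def hands_needed(win_rate_per_hand: float, sd_per_hand: float, z: float = 1.96) -> int:
    """Hands until a z-sigma confidence interval around the win rate excludes zero."""
    return math.ceil((z * sd_per_hand / win_rate_per_hand) ** 2)

SD_PER_HAND = 5.0  # HYPOTHETICAL standard deviation in big blinds per hand

observed = 92 / 200  # OP's reported result: +92 big blinds over 200 hands
print(hands_needed(observed, SD_PER_HAND))
print(hands_needed(0.01, SD_PER_HAND))  # a 1 bb/100 edge
```

Under that placeholder sigma, the OP's observed rate would need a few hundred hands to clear zero, while a true edge of 1 bb/100 would need close to a million.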

04-15-2015 , 07:58 AM

#34

+VLFBERH+T

grinder

Join Date: Mar 2013 Posts: 657

Quote:

Originally Posted by ArtyMcFly

But you forgot #dontarguewithagenius

Quote:

Originally Posted by Rich Checkmaker

FYI, I am quite literally a genius.

04-16-2015 , 01:21 PM

#35

mme

old hand

Join Date: May 2009 Posts: 1,668

But... what about an infinite number of OPs?

04-18-2015 , 09:27 AM

#36

punter11235

Carpal \'Tunnel

Join Date: Mar 2005 Posts: 8,210

Not only is it beatable, it's very easily beatable by anyone who is willing to put in the effort. Here is the recipe:

-harvest the strategy from Cepheus' site
-calculate max exploit vs it (this is fast and can be done on the fly for every flop)
-...
-profit 1mb/hand

The problem is that this exploiting strategy is probably itself exploitable for way more than 1mb/hand... maybe 100mb/hand. Then Cepheus only needs to play the max exploit against that max exploit, at random in 1% of hands. Then you can adjust... and it looks like we are back to playing poker here.
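Because expected value is linear, the effect of the mixing idea is just a weighted average. A minimal sketch using the post's hypothetical numbers (a 1 mb/hand exploit, a counter-exploit worth 100 mb/hand against it, mixed in 1% of the time); the function names are illustrative, and the sketch ignores whatever the counter-strategy costs the bot against other opponents.

```python
def exploiter_ev(exploit_gain: float, counter_gain: float, mix_prob: float) -> float:
    """Exploiter's EV per hand when the bot plays its counter-strategy with prob mix_prob.

    exploit_gain: what the exploit wins per hand vs the bot's normal strategy.
    counter_gain: what the bot's counter-strategy wins per hand vs the exploit.
    """
    return (1 - mix_prob) * exploit_gain - mix_prob * counter_gain

def break_even_mix(exploit_gain: float, counter_gain: float) -> float:
    """Mixing probability at which the exploit stops making money."""
    return exploit_gain / (exploit_gain + counter_gain)

print(exploiter_ev(1.0, 100.0, 0.01))  # mb/hand
print(break_even_mix(1.0, 100.0))
```

With those numbers the exploiter ends up at about -0.01 mb/hand, and the break-even mixing rate is 1/101, just under 1%.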

Last edited by punter11235; 04-18-2015 at 09:32 AM.

04-19-2015 , 07:20 AM

#37

lolgusaments

enthusiast

Join Date: Jan 2015 Posts: 52

Quote:

Originally Posted by Jamsym2

Yeah you got this buddy.

200 hands is a bit low to decide if you have an edge though.

Get a real sample like 500 hands and then we can tell for sure.

Gl

my favourite post of '15

05-04-2015 , 10:12 PM

#38

Rich Checkmaker

banned

Join Date: Nov 2014 Posts: 2,441

FYI, by genius I meant more like idiot savant. And by idiot savant I mean more like just plain idiot. I've decided that I have better things to do with my time, and I concede that Cepheus is probably not beatable for more than 1bb/100, though I'm not certain. The point is, I will be spending my time playing for real money instead.

05-11-2015 , 03:58 PM

#39

knircky

veteran

Join Date: Jan 2010 Posts: 2,097

I think 4-betting in limit makes no sense, because there can never be any fold equity. As such, we can only 4-bet with a value range, which then becomes exploitable, so it must be better to include those hands in the range that calls 3-bets.

This, however, is a very interesting and valid question.

But I think there are many things that we consider standard that are simply wrong.

05-12-2015 , 04:51 AM

#40

CanadaPete

grinder

Join Date: Jul 2014 Posts: 525

Quote:

Originally Posted by Rich Checkmaker

Yeah, I mean basically you are better off just accepting what they say as true and trying to beat real-money games. Limit Hold'em is not really the best place to try these days.

04-26-2024 , 02:06 AM

#41

CactusCub

enthusiast

Join Date: May 2020 Posts: 65

Quote:

Originally Posted by Rich Checkmaker

Ok, I first read about Cepheus, the so-called "unbeatable" poker bot, back in January when it first came out. I was interested and thought it could be possible to GTO-solve HU limit hold'em... maybe. But then I read about the way these non-poker-playing computer scientists supposedly solved the game. They had the computer making random decisions and comparing and evaluating the different values of folding, check/calling, and bet/raising against another computer making random decisions. The problem with this method, which became readily apparent to me, is that the computer is going to assign more value to betting, raising, and calling and less value to folding, because the random computer it's playing against is going to be folding the nuts sometimes, randomly, to bluffs.

Now I could see this was going to be flawed, so I decided to sit back and wait for someone else to debunk it. Three months later, the only articles I can find online just take the creators' claims as true. (If someone else has already disproved it, please let me know.) I even read David Sklansky saying that even though he couldn't beat it, it wouldn't extract as much money from a bad player as he would. Which would be true of a GTO solution, but why is everyone so accepting that Cepheus is unbeatable?? Deep Blue beat Garry Kasparov. Who has Cepheus beaten?

I played Cepheus last night for 200 hands to test my theory out. I won 64.5 big blinds in the first 100 hands and 27.5 big blinds in the second hundred. The program played very poorly in my estimation. The last hand I played basically sums up how badly Cepheus plays. I have A2o in the SB/button. I raise to 20 and Cepheus calls 10 from the BB. Flop A85: Cepheus checks, I bet 10, Cepheus calls. Turn 7: Cepheus checks, I bet 20, Cepheus calls. River 6: Cepheus checks, I bet 20, Cepheus calls with J5 and loses. I bet every street and Cepheus called down every street with bottom pair, jack kicker. Horrendous. This is GTO play? A four-card straight and an ace on board, and this program calls on the river with bottom pair??

I plan to play a session of at least 200 hands later, and record and upload the video to debunk this bot. I understand that winning 92 big blinds over 200 hands doesn't mean I've debunked it yet, but after how many hands would it be undeniable? I feel confident that I'm right, and I'd even be willing to bet 80% of my bankroll that I could beat Cepheus long term. I'd be interested to hear what other people think, and I'd really like to hear the results of your matches with the bot. Keep in mind I think it overvalues betting and calling and undervalues folding, so don't bluff, and bet middle pair and bottom pair for value on all streets.

Also, look at the bot's preflop play. How is it GTO preflop play to call 100% of the time after raising from the SB and being reraised by the big blind? With the top of your range, I think it's pretty obvious the GTO play is to raise back again, not just call 100%. At least mix some calls and raises; just calling in position with Aces when you could put another bet in preflop is just wrong.

KICK THIS BOT'S ASS!!!

This post did not age well