hand vs. cepheus. let's try to understand this one - Page 2 - Medium Stakes Poker Forum

Playing on my cell so no hh

I’m bu/sb w 22 no diamond

I raise, C 3b, I call (I’m capping about 1/4 the time w any pair <7s, 3/4 for 7s and up)

Flop was Qd9d4x. C bets I call.

Turn Ad. C checks. I’ve found this is polarizing. Either he has a killer draw, a super strong hand, or he’s in k/c mode (much of which I beat). Since I have no d and an easy fold to a raise, and some value to get, I bet. C calls

River was an off suit 5. C checks. I??

Link on where to play? Trust the website?

Quote

12-24-2017 , 06:17 PM

#28

UpHillBothWays

adept

Join Date: Nov 2013 Posts: 978

Quote:

Originally Posted by Clumsy Surgeon

Link on where to play? Trust the website?

http://poker-play.srv.ualberta.ca

Quote

12-24-2017 , 07:15 PM

#29

Clumsy Surgeon

enthusiast

Join Date: Jun 2017 Posts: 54

Thank you.

Quote

12-26-2017 , 01:10 PM

#30

DeathDonkey

Carpal 'Tunnel

Join Date: Feb 2004 Posts: 12,875

Up2ng: in the other thread I agreed with you but now I think you are mistaken. What you are describing is a Nash equilibrium given a range - this is what the expensive online solver tools use. Or what the MoP would call the “nemesis” strategy. This is not how Cepheus works. I mean it kinda is, but only in the sense that it is the nemesis for the entire game tree. It is opponent range independent.

To see why this has to be so imagine you are playing heads up vs one guy who has never played poker before. On the end he calls 100% of the time just to see what you have. Another guy walks up and he gets to the river, checks to see if he made a royal flush, and if not he folds. And now a third guy but he plays pretty well, some 2+2 schooled pro. And now you take turns playing all 3 of them but each hand you don’t know which guy you are playing. How are you going to analyze their river calling range?

Cepheus doesn’t care which of those guys he plays. He will beat them all, albeit he might have a lower win rate against the two extreme guys than you or I would because we would quickly adjust our ranges. If we had PioSolver with us and put in a 100% calling range for one guy on the river, it would also spit out a “Nash equilibrium” with zero bluffs and a value range that is a 50.001% favorite vs his entire range. This is just a bastardization of a Nash equilibrium because the guy’s strategy is so silly.
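A minimal sketch of the arithmetic behind that 50.001% figure, assuming a check goes straight to showdown, the opponent calls every bet, and ties are ignored. The function names and the pot size are made up for illustration; this is not solver output.

```python
def ev_check(equity, pot):
    """EV of checking back: we win the current pot `equity` of the time."""
    return equity * pot


def ev_bet_always_called(equity, pot, bet=1.0):
    """EV of betting into someone who calls 100% of the time."""
    # We win pot + his call when ahead and lose our own bet when behind.
    return equity * (pot + bet) - (1.0 - equity) * bet


def bet_gain(equity, pot, bet=1.0):
    """How much better (or worse) betting is than checking."""
    # Algebraically this is bet * (2 * equity - 1), so the pot size drops out.
    return ev_bet_always_called(equity, pot, bet) - ev_check(equity, pot)


if __name__ == "__main__":
    for equity in (0.30, 0.50, 0.50001, 0.75):
        print(equity, bet_gain(equity, pot=6.0))
```

The gain is positive only when equity against his whole range is above 50%, which is why the solver's answer has a thin value range and no bluffs: a bluff is just a bet with equity below 50% that never gets a fold.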

Uphill: counterfactual regret minimization is an algorithm to traverse the game tree quickly and efficiently. It is an approximation of gto when used on the entire game tree. And an approximation of a Nash equilibrium when used on a smaller tree. For that tree.

Quote

12-26-2017 , 03:38 PM

#31

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

I have this loose theory that there is a difference between gto and nash equilibrium:

nash equilibrium is only satisfied when two or more players are playing the nash equilibrium strategies, which goes all the way back to preflop.

gto strategies are only satisfied when two or more players are playing the comaximally exploitive strategies.

This may seem redundant, but here's the difference:

gto may be satisfied by a player individually by such a player that takes the comaximally exploitive line given any set of previous circumstances in a hand of poker; this includes preflop errors.

to show how this is true, think about what solvers do:

they figure out an approximation of a minimax strategy given information that may or may not satisfy the conditions of a nash equilibrium. This includes preflop errors.

For example:

If I knew the nash equilibrium preflop range, but included some hands that were not in that range in my preflop raising range and plugged them into a solver, the solver would produce a minimax strategy that is not strategically symmetrical with that of the nash equilibrium.

Thus while nash equilibrium strategy is the strategy that maximally exploits the opposing nash equilibrium strategy, it is not necessarily the gto strategy, which maximally exploits the maximally exploitive counter strategy, given any set of previous information, including preflop errors.

Quote

12-26-2017 , 03:43 PM

#32

SitandSpin

banned

Join Date: Sep 2017 Posts: 345

Optimal play is to stop feeding the bots and AI scientists information that will inevitably lead towards the destruction of our game.

People who play in the AI matches, like the four huNL players who got crushed by the AI, are poker Uncle Toms.

but nah, it's fun in the short term to talk about math so who cares if we're willingly walking the path towards the destruction of our livelihoods ...

Quote

12-26-2017 , 03:50 PM

#33

SitandSpin

banned

Join Date: Sep 2017 Posts: 345

Whoever shuts down the University of Alberta AI department deserves to be Time's Person of the Year.

Quote

12-26-2017 , 03:53 PM

#34

SitandSpin

banned

Join Date: Sep 2017 Posts: 345

Let's go talk to some checkers players about how their game is thriving after it got solved ...

****ing lemmings

Quote

12-26-2017 , 04:47 PM

#35

DeathDonkey

Carpal 'Tunnel

Join Date: Feb 2004 Posts: 12,875

Lol you can’t stop progress. You can only learn to use it to your advantage.

Quote

12-26-2017 , 04:57 PM

#36

SitandSpin

banned

Join Date: Sep 2017 Posts: 345

Have you ever considered why there's essentially zero online gambling on chess or backgammon?

Engines exist for these games that play better than any human.

What, pray tell, do you think is going to happen to online poker when all the solvers and engines are available to everybody and everybody can play GTOish or whatever?

But no, Uncle Tom, keep on keepin' on. In five years the highest poker games online will be microstakes.

Of course, these AI *******s COULD easily invent a new game with which to test their tech, with the two added bonuses of:

- you won't be aiding the destruction of 1000s of people's livelihoods
- you can make the game exactly as simple or complex as you need

but, you know, that would get less intere$t from random nerds, which indirectly would lead to a lo$$ of funding for the University of Alberta Terminators and other traitors of mankind, so that's obviously impossible.

Quote

12-26-2017 , 05:03 PM

#37

DeathDonkey

Carpal 'Tunnel

Join Date: Feb 2004 Posts: 12,875

I think you’re missing my point. I agree online poker is basically done for in a few years. But there isn’t anything you or I can do about that except try to crush until it happens. If those guys didn’t participate in the AI heads up match the algorithm still would have worked the same. Poker (particularly limit Holdem) is just a fairly simple game and it’s no surprise we are getting to the point where it can be solved by an iPhone.

Quote

12-26-2017 , 05:09 PM

#38

SitandSpin

banned

Join Date: Sep 2017 Posts: 345

Quote:

Originally Posted by DeathDonkey

I agree that my tilting in this forum isn't going to STOP these *******s from destroying something that I love. I get that. I'm just trying to stem the tide as much as one guy can. I know I'm tilting at windmills.

It's just, I've seen this story play out in other games with my own two goddamn eyes. These people, flat out, do not know or do not care that they're indirectly ripping countless hundreds of thousands of dollars from the hands of poker players and other game players. They are the enemy.

Quote

12-26-2017 , 05:16 PM

#39

SitandSpin

banned

Join Date: Sep 2017 Posts: 345

Quote:

Originally Posted by DeathDonkey

But there isn’t anything you or I can do about that except try to crush until it happens.

You know, though, I do think there is actually something we, as a poker-playing populace, COULD do about this if we were to team up. I do think that a mass-scale public outcry could at least make these guys HESITATE to continue their mission to destroy strategy games.

Sadly, most poker regs are QUANTS themselves and love this ****. The poker ecosystem badly needs more left-brain nerds such as yours truly. We're able to see the forest.

Quote

12-26-2017 , 05:27 PM

#40

DeathDonkey

Carpal 'Tunnel

Join Date: Feb 2004 Posts: 12,875

I hear you and I don’t think you are wrong. On some level everyone who posts on this site is ok with sharing info and making the games tougher. You are somewhat of an exception tbh.

I guess I look at it like new technology is sometimes invented and sometimes discovered. Figuring out Poker is more of a discovery of the right way to task a computer with the problem. And it was going to get solved regardless. At least academics publish their work so people like me can tinker with CFR algorithms on my laptop. Otherwise a few unscrupulous programmers would / will make millions running bots online and nobody would be the wiser.

Also it’s not all peaches and cream being an academic these days. Their world is crumbling too

Quote

12-26-2017 , 09:14 PM

#41

phunkphish

veteran

Join Date: Aug 2011 Posts: 2,902

Quote:

Originally Posted by Bob148

Quote

12-26-2017 , 11:19 PM

#42

Montrealcorp

Carpal 'Tunnel

Join Date: May 2007 Posts: 13,356

Quote:

Originally Posted by SitandSpin

Have you ever considered :

Of course, these AI *******s COULD easily invent a new game with which to test their tech, with the two added bonuses of:

- you won't be aiding the destruction of 1000s of people's livelihoods
- you can make the game exactly as simple or complex as you need.

You do realize that AI, in the next 15-20 years, will be responsible for the disappearance of like 20-30% of the jobs that exist today?

I think they really don’t care how it affects the online poker economy.
They've got bigger fish to fry!

Quote

12-26-2017 , 11:34 PM

#43

Bob148

Carpal 'Tunnel

Join Date: May 2012 Posts: 11,972

https://forumserver.twoplustwo.com/1...brium-1699669/

I turned my other post in this thread into a theory forum post for anyone interested.

Quote

12-27-2017 , 02:06 AM

#44

up2ng

enthusiast

Join Date: May 2014 Posts: 54

Hey DeathDonkey, great post above (#30).

I'm not sure what I've said in this thread that contradicts any of that, I think that we are on the same page. I never meant to imply that Cepheus behaves at all like the nemesis, or that it adjusts its play in any way based on opponent tendencies -- I don't think that it works that way either.

The point that I was trying to make is that, particularly when checked to on the river, Cepheus will calculate which hands it can bet "for value" and it will include those in its river betting range. THEN, it will balance this range by adding an appropriate number of bluffs. NOT the other way around. Its algorithm for determining "for value", by definition, will be based on what I've said earlier in this thread -- it will compare its current hand against its estimation of its opponent's range, and if it beats enough of that range it will bet for value. In this particular decision (checked to on the river) it will not factor in at all where it is within its own range, the board texture, how many bluffs it "wants" to have, etc.

Perhaps the confusion was that I then tried to give an example (in this thread or perhaps in the other thread) of what WE might want to do while playing exploitatively which is to actually try to estimate our opponent's range based on their tendencies -- like, if we know with certainty that our opponent only checks the river with 2 pair or better and with 6-hi or worse (and always folds to a bet while holding 6-hi or worse), then it's illogical for us to bet one pair on the river no matter what GTO says.

However, I agree that Cepheus does NOT play this way. It does not try to estimate our range based on OUR tendencies -- instead, (and this is now speculation as I'm not an expert on how this bot actually works) it will estimate our range based on how it thinks we should have played the hand up to this point (using its own GTO solutions) . . . in other words, if Cepheus was sitting in our seat (playing against itself) and using the same line as we've used, any hand that it would still have in its own range at that point becomes included in its estimation of our range for that decision point -- and then it bets for value or as a bluff or checks behind accordingly. But this is still a hand vs opponent's range algorithm. When you think about it, it has to be this way, otherwise it would miss value or it would spew, neither of which are GTO.
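The "range implied by its own strategy" idea above can be sketched with Bayes' rule. This only illustrates the speculation in this post, not how Cepheus is implemented; the hand buckets and every number below are made up.

```python
def implied_range(prior, line_probability):
    """P(hand | line) is proportional to P(hand) * P(line | hand)."""
    weights = {hand: prior[hand] * line_probability.get(hand, 0.0) for hand in prior}
    total = sum(weights.values())
    if total == 0:
        raise ValueError("the assumed strategy never takes this line")
    return {hand: w / total for hand, w in weights.items() if w > 0}


if __name__ == "__main__":
    # Hypothetical share of each hand bucket before the river action.
    prior = {"two pair+": 0.2, "one pair": 0.5, "air": 0.3}
    # Hypothetical chance the strategy checks the river with each bucket.
    checks_river = {"two pair+": 0.1, "one pair": 0.8, "air": 0.5}
    for hand, p in implied_range(prior, checks_river).items():
        print(hand, round(p, 3))
```

A range built this way depends only on the strategy assumed for the line, never on who is actually sitting in the seat.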

Quote

12-27-2017 , 03:23 AM

#45

justathought-

enthusiast

Join Date: Feb 2012 Posts: 70

Quote:

You do realize that AI, in the next 15-20 years, will be responsible for the disappearance of like 20-30% of the jobs that exist today?

gtfo solution?

Quote

12-27-2017 , 09:26 AM

#46

just_grindin

Pooh-Bah

Join Date: Dec 2007 Posts: 5,263

@up2ng

I am not intimately familiar with Cepheus' programming, but from my understanding of the poker AI that uses CFR algorithms there is no range estimation during play.

Cepheus just found the best action against itself traversing most of the nodes in the game tree and reducing the amount of regret it had along the way (i.e. it would change its action slightly if another action yielded better results). Based on this method it determines what hands to bet, check, call, raise, and fold and with what frequency to take that action with a particular hand.

I imagine while it's playing humans or in competition it just has a lookup table to determine which action to take. At that point my guess would be it's just taking as input the hand, board, and actions and spitting out bet, check, call, raise, fold etc.
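A rough sketch of what that lookup might look like, assuming the table is keyed by (hand, board, action history) and stores action frequencies. The key format, the table entries, and the frequencies are all hypothetical; the real bot's storage format is not described in this thread.

```python
import random

# Hypothetical precomputed strategy: (hole cards, board, betting line) -> frequencies.
# These numbers are invented for illustration and are not Cepheus output.
STRATEGY_TABLE = {
    ("2c2h", "Qd9d4sAd5c", "river:check"): {"check": 0.7, "bet": 0.3},
    ("7c2h", "Qd9d4sAd5c", "river:bet"): {"fold": 1.0},
}


def choose_action(table, hand, board, history, rng=random):
    """Look up the stored frequencies and sample one action from them."""
    probabilities = table[(hand, board, history)]
    actions = list(probabilities)
    weights = [probabilities[a] for a in actions]
    return rng.choices(actions, weights=weights, k=1)[0]


if __name__ == "__main__":
    print(choose_action(STRATEGY_TABLE, "2c2h", "Qd9d4sAd5c", "river:check"))
```

Nothing about the opponent enters the lookup: the same inputs give the same frequencies against any player.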

Quote

12-27-2017 , 12:22 PM

#47

UpHillBothWays

adept

Join Date: Nov 2013 Posts: 978

Quote:

Originally Posted by DeathDonkey

Uphill: counterfactual regret minimization is an algorithm to traverse the game tree quickly and efficiently. It is an approximation of gto when used on the entire game tree. And an approximation of a Nash equilibrium when used on a smaller tree. For that tree.

ty dd. i got the 20k foot view from the paper and your explanation aligns with what i took away and wrote above as well (and your explanation is more detailed).

sitandspin, the fact that you bring up "gambling on chess and backgammon" in a discussion of gambling on poker wrt computer advancements is a red flag regarding your understanding of the subject matter at hand.

Quote

12-27-2017 , 12:42 PM

#48

DeathDonkey

Carpal 'Tunnel

Join Date: Feb 2004 Posts: 12,875

Quote:

Originally Posted by just_grindin

Right, this is my understanding too: it isn’t doing anything we would call “calculating” during a hand. It’s actually doing something simpler; it had just never been done before because the problem was too large. It’s sorta just tossing a strategy out there (guess and check), then it calculates its regret at each point and adjusts, as you said. What it ends up with could look to the untrained eye like a sort of haphazard strategy that would suffer from all sorts of imbalances, but it simply isn’t. It’s so close to the gto strategy as to be indistinguishable, and for each board and action it happens to have a well balanced range.
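For anyone who wants to see the guess, compute regret, adjust loop in action, here is a minimal sketch of plain CFR on Kuhn poker (three cards, one bet), a standard toy game. It is not Cepheus's code: Cepheus used a variant called CFR+ on the full limit hold'em tree, and all names below are illustrative.

```python
from itertools import permutations

ACTIONS = "pb"  # p = pass (check or fold), b = bet (bet or call)


class Node:
    """Regret and strategy accumulators for one information set."""

    def __init__(self):
        self.regret_sum = [0.0, 0.0]
        self.pending = [0.0, 0.0]  # regret collected during the current pass
        self.strategy_sum = [0.0, 0.0]

    def strategy(self):
        # Regret matching: play each action in proportion to its positive regret.
        positive = [max(r, 0.0) for r in self.regret_sum]
        total = sum(positive)
        return [p / total for p in positive] if total > 0 else [0.5, 0.5]

    def average_strategy(self):
        total = sum(self.strategy_sum)
        return [s / total for s in self.strategy_sum] if total > 0 else [0.5, 0.5]


def payoff(cards, history):
    """Payoff to the player who would act next, or None if the hand continues."""
    if len(history) < 2:
        return None
    player = len(history) % 2
    wins = cards[player] > cards[1 - player]
    if history[-1] == "p":
        if history == "pp":  # check, check: showdown for the antes
            return 1 if wins else -1
        return 1  # the opponent just folded to a bet
    if history[-2:] == "bb":  # bet, call: showdown for ante plus bet
        return 2 if wins else -2
    return None


def cfr(nodes, cards, history, reach):
    """Return the expected payoff for the player to act at `history`."""
    result = payoff(cards, history)
    if result is not None:
        return result
    player = len(history) % 2
    node = nodes.setdefault(str(cards[player]) + history, Node())
    strategy = node.strategy()
    utils = [0.0, 0.0]
    for a, action in enumerate(ACTIONS):
        next_reach = list(reach)
        next_reach[player] *= strategy[a]
        # The child value is from the opponent's point of view, so negate it.
        utils[a] = -cfr(nodes, cards, history + action, next_reach)
    node_util = sum(s * u for s, u in zip(strategy, utils))
    for a in range(2):
        # Regret: how much better action a would have done than the current mix,
        # weighted by the chance the opponent plays to reach this spot.
        node.pending[a] += reach[1 - player] * (utils[a] - node_util)
        node.strategy_sum[a] += reach[player] * strategy[a]
    return node_util


def train(iterations):
    """Run CFR and return (nodes, average payoff to the first player)."""
    nodes = {}
    deals = list(permutations([1, 2, 3], 2))  # J, Q, K dealt to two players
    total = 0.0
    for _ in range(iterations):
        for cards in deals:
            total += cfr(nodes, cards, "", [1.0, 1.0]) / len(deals)
        for node in nodes.values():  # apply regrets after the full pass
            for a in range(2):
                node.regret_sum[a] += node.pending[a]
                node.pending[a] = 0.0
    return nodes, total / iterations


if __name__ == "__main__":
    nodes, value = train(10000)
    print("average payoff to first player:", round(value, 4))
    for key in sorted(nodes):
        print(key, [round(p, 2) for p in nodes[key].average_strategy()])
```

Each printed key is a card (1 = J, 2 = Q, 3 = K) plus the action so far, followed by the average pass/bet frequencies. The average payoff should drift toward the known game value of -1/18 for the first player, and no opponent model appears anywhere in the loop.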

Quote

12-27-2017 , 12:44 PM

#49

UpHillBothWays

adept

Join Date: Nov 2013 Posts: 978

up-> i think the issue is here:

Quote:

Originally Posted by up2ng

The point that I was trying to make is that, particularly when checked to on the river, Cepheus will calculate which hands it can bet "for value" and it will include those in its river betting range. THEN, it will balance this range by adding an appropriate number of bluffs. NOT the other way around. Its algorithm for determining "for value", by definition, will be based on what I've said earlier in this thread -- it will compare its current hand against its estimation of its opponent's range, and if it beats enough of that range it will bet for value. In this particular decision (checked to on the river) it will not factor in at all where it is within its own range, the board texture, how many bluffs it "wants" to have, etc.

you kinda touch on it later, but to be clear, cepheus determines these ranges EXACTLY as if it were playing itself. not "kinda" and it doesn't estimate its opponents' ranges.

Quote

12-27-2017 , 12:45 PM

#50

UpHillBothWays

adept

Join Date: Nov 2013 Posts: 978

Quote:

Originally Posted by DeathDonkey

yes, the ACTUAL calculation (training of the ML algorithm) of all the nodes of the tree took 68.5 days.

Quote:

It’s sorta just tossing a strategy out there (guess and check), then it calculates its regret at each point and adjusts, as you said. What it ends up with could look to the untrained eye like a sort of haphazard strategy that would suffer from all sorts of imbalances, but it simply isn’t. It’s so close to the gto strategy as to be indistinguishable, and for each board and action it happens to have a well balanced range.

it's statistically indistinguishable from a nash equilibrium over 10^14 games.

Quote
