Are solvers missing key aspects for optimal human strategies? A thought experiment *LONG* - Micro Stakes Pot Limit and No Limit

Two Plus Two Forums Poker Strategy Online No-Limit Hold’em Cash

12-04-2023 , 04:14 PM

_doc_

journeyman

Join Date: Oct 2022 Posts: 349

Would you play a coinflip game betting $10,000 if the opponent bet $10,001? Or would you rather bet $0.10 to your opponent’s $1? Our robot overlords would pick the bigger game for the highest EV. Should we?

Let’s consider a toy game. We are playing against a robot that plays perfectly against us. We hold either the nuts or air, and the robot always has a bluffcatcher that beats air and loses to the nuts. The pot is 1 and the effective stacks are infinite. We may check (give up) or bet any amount, and the robot can call or fold. Say we have the nuts 10% of the time and air 90%.

We can adopt a pot-sized betting strategy. We bet 1 with nuts. The bot will always fold, so 10% of the time we will profit 1, making our EV per hand +0.1.
We can now balance this out with air. The bot is getting pot odds of 33% (it calls 1 to win 2), so 33% of our betting range should be bluffs. That is half a combo of bluffs per combo of nuts, meaning that we bluff our air hands 0.5/9 = 5.6% of the time. The bot is now indifferent to calling or folding and does both randomly. We now bet 15% of the time and make 1 profit on average each time, so our EV is now +0.15 per hand. That’s better.

Let’s now go ahead and adopt our overbet.

We bet 10 with nuts and want to balance it out. The bot is getting pot odds of 10/21 = 47.6%, so bluffs can make up to that share of our betting range. This corresponds to about 0.91 combos of bluffs to make the bot indifferent, so let’s bluff our air 0.91/9 = 10% of the time. Our EV is now +0.19 per hand. Great.

Here is our EV for different bet sizes with balanced bluffing:
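The numbers for any bet size can be reproduced with a short script. This is only a sketch of the toy game as described above (pot of 1, 10% nuts, 90% air); the function names are made up for illustration.

```python
# Toy game from the post: pot = 1, we hold nuts 10% / air 90%,
# and the bot always holds a bluffcatcher.
NUTS_FREQ = 0.10
AIR_FREQ = 0.90
POT = 1.0


def pot_odds(bet):
    """Equity the bot needs to call: bet / (pot + 2 * bet)."""
    return bet / (POT + 2 * bet)


def bluff_combos(bet):
    """Bluff combos per combo of nuts that make the bot indifferent."""
    return bet / (POT + bet)


def balanced_ev(bet):
    """EV per hand when we bet nuts plus a balanced number of bluffs.

    The bot is indifferent, so our EV equals the pot times how often we bet.
    """
    bet_freq = NUTS_FREQ * (1 + bluff_combos(bet))
    return POT * bet_freq


for bet in (0.5, 1, 2, 3, 5, 10, 100):
    # Share of our air hands that we turn into bluffs.
    air_bluff_pct = 100 * NUTS_FREQ * bluff_combos(bet) / AIR_FREQ
    print(f"bet {bet:>5}: odds {pot_odds(bet):.1%}, "
          f"bluff combos {bluff_combos(bet):.2f}, "
          f"bluff air {air_bluff_pct:.1f}%, EV {balanced_ev(bet):+.3f}")
```

The EV works out to 0.1 * (1 + 2b) / (1 + b) for a bet of b pots, which approaches 0.2 as b grows.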

So we can get infinitely close to 0.2 EV by increasing our betsize indefinitely, but we get very diminishing returns on our increased betsize after like 2-3x pot.

Now, solverland is a magical place full of infinities. With infinite bankrolls, infinite stacks and an infinite amount of rounds to play, the boring consequences of our own finite reality like variance and risk of ruin, evaporate like a morning mist. Why wouldn’t a solver bet a near infinite amount of money and balance it perfectly to obtain the full 0.2 EV?

We are not in solverland. We are slaves to our finite bankrolls and to our limited time to play hands. In our world, standard deviations and variance are very real and ever present.

So if we look at a single instance of this toy game, let’s calculate the variance for various bet sizes:
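A rough way to compute the per-hand variance yourself, as a sketch only. It assumes we bluff the balanced amount and that the bot calls 1/(1+b) of the time against a bet of b pots, which is the frequency that makes our bluffs break even. The optional `call_freq` argument is an illustrative extra for trying other calling frequencies.

```python
NUTS_FREQ = 0.10  # 1 combo of nuts per 9 combos of air


def outcomes(bet, call_freq=None):
    """Return (probability, profit) pairs for one hand with a pot of 1.

    Assumptions: bluffs are balanced (bet / (1 + bet) combos per nut combo)
    and, unless call_freq is given, the bot calls 1 / (1 + bet) of the time.
    """
    if call_freq is None:
        call_freq = 1 / (1 + bet)
    bluff_freq = NUTS_FREQ * bet / (1 + bet)
    check_freq = 1 - NUTS_FREQ - bluff_freq
    return [
        (NUTS_FREQ * call_freq, 1 + bet),    # value bet gets called
        (NUTS_FREQ * (1 - call_freq), 1),    # value bet takes the pot
        (bluff_freq * call_freq, -bet),      # bluff gets called
        (bluff_freq * (1 - call_freq), 1),   # bluff takes the pot
        (check_freq, 0),                     # air gives up
    ]


def mean_and_variance(bet, call_freq=None):
    """EV and variance of our profit for a single hand."""
    dist = outcomes(bet, call_freq)
    mean = sum(p * x for p, x in dist)
    var = sum(p * x * x for p, x in dist) - mean ** 2
    return mean, var


for bet in (1, 2, 3, 5, 10):
    ev, var = mean_and_variance(bet)
    _, var_all_calls = mean_and_variance(bet, call_freq=1.0)
    print(f"bet {bet:>2}: EV {ev:.3f}, variance {var:.3f}, "
          f"variance if bot always calls {var_all_calls:.3f}")
```

Because the bluffs are balanced, the EV is the same whatever `call_freq` is; only the variance moves.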

Our GTO opponent calls just often enough to keep our bluffs break-even, which works out to 1/(1+b) of the time against a bet of b pots (50% against a pot-sized bet), and so remains unexploitable because they know our strategy. If our strategy were fixed in place and unchangeable, they could fold or call every bet and it would make no difference to them. If they decided to call every hand, however, we humans would pay dearly in massively increased variance.
In order to determine the optimal strategy for a human player with more finite constraints, we need to somehow assign a monetary equivalent to a given variance. This is obviously a very difficult task given there are a myriad of factors to consider. Risk of ruin is a very concrete way of potentially assigning an overall monetary value of a given strategy with less variance over another. If strategy A has a RoR of 2% and strategy B has a RoR of 1%, B has, all other things being equal, a monetary excess value of 1% of the bankroll, not counting loss of potential future earnings from going bankrupt, which could be significant. This indicates that there is mathematically founded room to sacrifice EV in order to reduce variance in the real world - as long as the EV loss is within certain bounds and the EV is still positive.

Let’s try something more concrete. Our player Doug is an all-go-no-quits-big-nuts GTO impersonator playing 6max cash with a 3.5bb/100 winrate. Unafraid of variance, he takes every spot and doesn’t stop at 3x pot; he happily bluffs and valuebets any amount and therefore has a standard deviation of 115 bb/100. According to primedope, his bankroll requirement for a 5% RoR looks like this:

But what if he were just slightly more risk averse? He reels it in slightly and his std.dev drops to 110:

That is almost a 10% decrease in bankroll requirement, potentially allowing for an earlier move up in stakes as well (assuming no significant EV loss, granted).
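For anyone who wants to play with the numbers, here is a sketch using the common risk-of-ruin approximation RoR = exp(-2 * winrate * bankroll / sd^2), solved for the bankroll. This is an assumption on my part; primedope's calculator may compute it differently, so its figures need not match exactly.

```python
import math


def bankroll_for_ror(winrate, std_dev, risk_of_ruin):
    """Bankroll (in bb) needed to keep risk of ruin at the given level.

    Assumes RoR = exp(-2 * winrate * bankroll / std_dev**2), with winrate
    and std_dev both measured in bb per 100 hands.
    """
    return -(std_dev ** 2) * math.log(risk_of_ruin) / (2 * winrate)


doug = bankroll_for_ror(winrate=3.5, std_dev=115, risk_of_ruin=0.05)
careful_doug = bankroll_for_ror(winrate=3.5, std_dev=110, risk_of_ruin=0.05)

print(f"std dev 115: {doug:.0f} bb")
print(f"std dev 110: {careful_doug:.0f} bb")
print(f"reduction:   {1 - careful_doug / doug:.1%}")
```

Under this approximation the requirement scales with the square of the standard deviation, so the reduction is 1 - (110/115)^2, a bit under 10%.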

The world of finance is used to assigning comparable values to investments with varying risk/reward ratios. What if we used one of their tools on our toy game?
The Sharpe ratio compares the difference between the expected return on an investment and the risk-free return (US bonds in finance, never bluffing in our toy game (0.1 EV)) with the standard deviation of the investment. Higher ratios are better.
Let’s look at a table of Sharpe ratios for the various bet sizes:

The Sharpe ratio indicates that a bet size of between 1 and 2 has the highest risk-adjusted return.
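A sketch of how such a table could be computed. It assumes balanced bluffs and a bot that calls 1/(1+b) of the time against a bet of b pots, uses the never-bluff EV of 0.1 as the risk-free return, and only checks a handful of illustrative bet sizes.

```python
import math

NUTS_FREQ = 0.10
BASELINE_EV = 0.10  # "risk-free" return: bet nuts only, bot always folds


def ev_and_sd(bet):
    """EV and standard deviation per hand, pot = 1, balanced bluffs.

    Assumes the bot calls 1 / (1 + bet) of the time.
    """
    call = 1 / (1 + bet)
    bluff = NUTS_FREQ * bet / (1 + bet)
    dist = [
        (NUTS_FREQ * call, 1 + bet),     # value bet gets called
        (NUTS_FREQ * (1 - call), 1),     # value bet takes the pot
        (bluff * call, -bet),            # bluff gets called
        (bluff * (1 - call), 1),         # bluff takes the pot
        (1 - NUTS_FREQ - bluff, 0),      # air gives up
    ]
    ev = sum(p * x for p, x in dist)
    var = sum(p * x * x for p, x in dist) - ev ** 2
    return ev, math.sqrt(var)


def sharpe(bet):
    """Excess EV over never bluffing, divided by the standard deviation."""
    ev, sd = ev_and_sd(bet)
    return (ev - BASELINE_EV) / sd


sizes = [0.25, 0.5, 1, 1.5, 2, 3, 5, 10]
for bet in sizes:
    print(f"bet {bet:>5}: Sharpe {sharpe(bet):.4f}")

best = max(sizes, key=sharpe)
print("best of these sizes:", best)
```

Among these sizes the ratio peaks at 1.5x pot and falls off on either side.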

Back to us trying to beat our robot. In the end, it turns out that there in fact IS a way in which we humans can achieve our much coveted low variance while still reaping near the maximum EV. No scared 1.5x pot Sharpe ratio nonsense for us. The answer is quite obvious if you think about it. The vast majority of our variance stems from the robot calling us, beating our bluffs and paying off our nuts. We will decide to use a huge bet sizing, say 100, so we are allowed near the maximum amount of bluffs, in this case 0.99 combos. However, instead of 0.99 combos of bluffs, we will only use 0.98. This will force the robot to fold every hand, decreasing our variance astronomically, but it will only reduce our EV from 0.199 to 0.198. Now that’s what I call value. Solverland is now a place on Earth.
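To put rough numbers on that trade, here is a sketch comparing the balanced bet-100 line with the slightly underbluffed one. It assumes the bot calls 1/(1+b) of the time against the balanced range and folds every hand once we underbluff; the function names are illustrative.

```python
NUTS_FREQ = 0.10


def balanced_line(bet):
    """EV and variance per hand with balanced bluffs, pot = 1.

    Assumes the bot calls 1 / (1 + bet) of the time; the second moment
    of our profit then works out to NUTS_FREQ * (1 + 2 * bet).
    """
    ev = NUTS_FREQ * (1 + 2 * bet) / (1 + bet)
    var = NUTS_FREQ * (1 + 2 * bet) - ev ** 2
    return ev, var


def underbluffed_line(bet, bluff_combos):
    """EV and variance when we bluff too little and the bot always folds.

    Every bet wins the pot of 1 and every check wins 0.
    """
    max_bluffs = bet / (1 + bet)
    if bluff_combos >= max_bluffs:
        raise ValueError("not underbluffing: the bot would no longer fold")
    bet_freq = NUTS_FREQ * (1 + bluff_combos)
    return bet_freq, bet_freq * (1 - bet_freq)


ev_bal, var_bal = balanced_line(100)
ev_under, var_under = underbluffed_line(100, bluff_combos=0.98)

print(f"balanced:     EV {ev_bal:.3f}, variance {var_bal:.2f}")
print(f"underbluffed: EV {ev_under:.3f}, variance {var_under:.2f}")
```

The EV barely moves while the variance drops by a factor of more than a hundred.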

This concept of variance-adjusted GTO overbets might be directly implementable in actual in-game situations in reg battles. It creates a sort of prisoner’s dilemma, where both players are actually interested in each other slightly underbluffing, leading to 100% folds to overbet river shoves and thereby decreasing both players’ variance. Of course, the temptation of increasing the bluff ratio to reap that sweet 100% fold equity probably keeps this beautiful symbiotic relationship a pure fantasy.

I do believe, though, that there is room for some sort of variance-adjusted GTO strategies. It would be very valuable to see not only the EV of each action in a solver, but also the corresponding variance. A slight increase in EV may not be worth the play’s increase in variance, and it does seem like massive overbets have significantly diminishing returns in EV.

Discussion is welcome.

12-04-2023 , 05:30 PM

Kendoo

banned

Join Date: Jul 2021 Posts: 784

The first assumption is totally wrong. It’s not about the absolute numbers of 1 dollar vs 0.9 dollar; it’s 10001:10000 vs 10:1.
Everybody and every robot would choose the 10:1 spot.

12-08-2023 , 09:44 AM

bla

banned

Join Date: Sep 2006 Posts: 761

I was lost when we bet nuts the 2nd time.
Sorry.

Just compress your story, please!
