CoTW: Why all-in-EV is a horrible measure of overall luck - Micro Stakes Pot Limit and No Limit

man: “we’ve been over this area several times and your keys are not here. Where did you lose them?”
drunk: “I dropped them over there (pointing to his car in the distance).”
man: “why are you looking for your keys over here if you know you lost them over by your car?”
drunk: “because the light is good here and it is dark over there by the car.”

tl;dr

The problem with all-in-EV is not that it is a bad statistic but that folks treat it as if it were a measure of their overall poker luck. Like the blind man who feels the elephant’s tail and declares that the elephant is like a rope, you look at your all-in-EV graph and declare that your luck has been good / bad / neutral, when so many luck-driven aspects of poker are not measured by all-in-EV.

* All-in-EV defined *

The Holdem Manager FAQ describes all-in-EV thus:

Quote:

Originally Posted by HEM FAQ

Overview:
What is EV and how does it work?

Answer:
EV stands for Expected Value and is a mathematical formula based on when a hand goes all-in before the river.

An EV line in a graph is the same stat that you see in the reports section called $ (EV adjusted).

So how do we interpret the line to see if we're running hot or cold? We'll define a few stats first.

EV $ Diff (found in the Hands Tab)

This is calculated by taking your equity% of the total pot when you go all in and comparing that to what you actually won. So let's say I go all in with AA on an Axx flush draw flop against a flush draw. I'm 80.5% to win with my set in a $400 pot.

80.5% x $400 = $322 so what it's saying is on average that I should win approximately $322 on average when I go all in with a set of Aces here vs. his flush draw.

If I win, I would win $400, so I've won $78 more than I would on average i.e running good/okay.

If I lose, I win $0, so I'm running $322 worse off than I would on average so im running bad.
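The arithmetic in the FAQ example can be checked with a few lines of Python. This is only a sketch of the calculation as the FAQ describes it; the function name `ev_diff` is made up for illustration and is not anything HEM exposes.

```python
def ev_diff(equity, pot, amount_won):
    """Actual winnings minus the equity-weighted expectation, in dollars."""
    expected = equity * pot
    return amount_won - expected

equity, pot = 0.805, 400.0   # numbers from the FAQ example above

print(round(equity * pot, 2))                  # expected share of the pot
print(round(ev_diff(equity, pot, 400.0), 2))   # hero wins the pot
print(round(ev_diff(equity, pot, 0.0), 2))     # hero loses the pot

# Weighted by how often each outcome happens, the two differences cancel out.
average = equity * ev_diff(equity, pot, 400.0) + (1 - equity) * ev_diff(equity, pot, 0.0)
```

The last line is the point of the statistic: over many identical all-ins the EV difference should average to roughly zero, so a large cumulative difference reflects luck in these hands only.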

And PokerTracker3 support saying the same thing:

Quote:

Originally Posted by PT3 support

For all hands that were all-in preflop, on the flop, or the turn, instead of seeing your actual results graphed you see your expected results at the time of the all-in.

* All-in-EV in the context of your whole game *

Here are some stats from my PT3 DB for this year:

107,716 hands played (all 50 NL Rush)
hands that qualify for all-in-EV formula: 389: result: +1,416 bb above EV (expected -316 bb) (yeah, running good in All-in-EV)
non all-in-EV hands where hero VPIP: 15,244: result +13,733 bb

By all-in-EV standards I am running hot but when you look at the bigger picture the all-in-EV hands are a small part of the final results. I could be running super hot or super cold in the non-all-in-EV hands. One thing is certain: the all-in-EV stats tell us nothing about what is happening in non-all-in-EV hands.
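To put a number on "small part", here is a quick sketch using the hand counts above. It assumes every all-in-EV hand is also a hand where hero VPIP'd, which is not stated above but seems close enough for a rough share.

```python
total_hands = 107_716
all_in_ev_hands = 389
other_vpip_hands = 15_244   # non-all-in-EV hands where hero VPIP'd

# Share of all dealt hands that the all-in-EV formula looks at.
share_of_all = all_in_ev_hands / total_hands

# Share of hands where hero put money in (assumes all-in-EV hands are VPIP hands).
share_of_vpip = all_in_ev_hands / (all_in_ev_hands + other_vpip_hands)

print(f"{share_of_all:.2%} of all hands")
print(f"{share_of_vpip:.2%} of hands where hero put money in")
```

On these counts the all-in-EV formula looks at well under 1% of hands dealt and only a few percent of hands played.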

* All-in-EV misses many other luck factors *

In poker, the normal distribution turns up with a fractal-like regularity. For every situation that occurs according to probability there is a corresponding normal distribution. All-in-EV measures how far your results are from the mean for exactly one of these situations.

Here is a list of aspects (situations) of no limit hold’em where there is an expected value and normal distribution that is not measured by all-in-EV. The point of this exercise is to see how much bigger (qualitatively, if not quantitatively) the elephant of poker luck is than the “elephant tail” of all-in-EV that folks like to use as their main barometer of luck.

How often are you dealt AA, KK? There is a normal distribution for this.
How often do your pocket pair flop sets? Normal distribution.
How often do you get AA vs KK and vice versa?
When you do get AA and villain has KK, how often is villain an aggressive fish? How often is he a nit?
Ditto for KK vs AA: what is your distribution of villains here?
You bet/bet/shove when you flop a set against a drooler and he calls (the result doesn’t matter: all-in-EV ignores this hand because the money went in on the river).
oh so many more ...

Please add interesting examples of non-all-in-EV poker luck to this list.
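As a rough illustration of the first two items in the list, here is a sketch of how wide the swings are. Strictly speaking these counts are binomial, which is close to normal over a big sample. The 100,000-hand sample size is just an example figure.

```python
from math import comb, sqrt

def binomial_mean_sd(n, p):
    """Mean and standard deviation of the number of successes in n trials."""
    return n * p, sqrt(n * p * (1 - p))

# Chance of being dealt AA or KK: 6 combos each out of C(52, 2) starting hands.
p_aa_or_kk = 2 * comb(4, 2) / comb(52, 2)

# Chance a pocket pair flops a set or better: 1 minus the chance that none of
# the three flop cards (from the 50 unseen) is one of the two remaining cards.
p_flop_set_plus = 1 - comb(48, 3) / comb(50, 3)

mean, sd = binomial_mean_sd(100_000, p_aa_or_kk)
print(f"AA/KK in 100,000 hands: {mean:.0f} expected, sd {sd:.1f}")
print(f"P(flop a set or better with a pocket pair): {p_flop_set_plus:.4f}")
```

Two players with identical skill can therefore be dealt noticeably different numbers of premium pairs over the same number of hands, and none of that shows up in all-in-EV.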

* Distribution of nits and maniacs *

Consider the following situation:

100bb effective stacks

Scenario 1
UTG, who is a confirmed full ring nit (e.g. xxvinniepoohxx) raises to 4x.
It is folded around to you on the BTN, you have KK (or QQ) and raise it to 12x, nit villain pops it up to 36x, you shove, villain calls. Villain has AA (are we surprised?) and it holds up.

I imagine many of you reading the hand history above are thinking “there are several nits that I could get away from KK,QQ when they show strength” or "I would never 3bet the nit's UTG open raise". In any case, a competent player with KK-QQ HU against an UTG nit has a good chance of figuring out where he is at.

Scenario 2
Now consider the same hand but UTG is a maniac (60/50/5) with a PF shoving range of TT+, AQ+. You stack off PF (as you should) and villain shows AA (damn, the top of his range)
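For a feel of how often the maniac actually shows up with the top of that range, here is a combo-counting sketch. It assumes "TT+, AQ+" means pairs TT through AA plus AK and AQ (suited and offsuit), that he shoves every combo equally often, and that hero's two kings are removed from the deck. The suits given to hero are arbitrary.

```python
from itertools import combinations

RANKS = "23456789TJQKA"
SUITS = "cdhs"
hero = {"Kc", "Kd"}   # hero's KK; which two suits does not change the counts
deck = [r + s for r in RANKS for s in SUITS if r + s not in hero]

def in_range(c1, c2):
    """True if the two cards make a hand in the assumed range TT+, AQ+."""
    r1, r2 = c1[0], c2[0]
    if r1 == r2:
        return RANKS.index(r1) >= RANKS.index("T")
    return {r1, r2} in ({"A", "K"}, {"A", "Q"})

combos = [c for c in combinations(deck, 2) if in_range(*c)]
aces = [c for c in combos if c[0][0] == c[1][0] == "A"]

print(len(aces), len(combos), round(len(aces) / len(combos), 3))
```

Under these assumptions AA is a small slice of the range, which is why stacking off is correct and why running into it anyway is a form of luck.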

When you have KK and villain has AA, how often is it scenario 1 (better for you since you can get away) vs scenario 2 (worse because you have to stack off behind)? This is another normal distribution.
You could be running really bad in that:

most of the times you have AA and villain has KK, villain is a nit who never gets stacks in with less than a set
most of the times you get KK vs AA, villain is a loose drooler you feel compelled to stack off against

but all-in-EV does not measure this kind of luck.

Sure, you flopped sets slightly more often than expected over the last 200k hands, but it happened way more often when villain was a set-mining nit against whom you never get any money in after the flop unless he has a set or better, and you rarely made a set against a maniac...

The distribution not just of hands (how often did I get a premium pocket pair, how often did I hit my set, how often did I hit my draw) but of villains and coolers has a huge impact on results and all-in-EV measures none of this.

You could run 1 standard deviation above all-in-EV for your entire life and 2 standard deviations below expectation in all the forms of luck that EV does not measure and, if you thought all-in-EV was the beginning and end of luck you would think you were running hot but suck at poker.

----

* Exercises for the student *

Problem 1

Let us call:

luck-a: all-in-EV
luck-b: all the other luck in poker (flopping sets, getting good hands against droolers, having droolers catch decent hands against your monsters, ...)

given 1000 players how many players will be both:

> 1 standard deviation in luck-a
< -1 standard deviation in luck-b

over a reasonably large sample of hands (e.g. 1 million)?

Problem 2:

Given luck-a and luck-b definitions above ...

If you could run 1/2 standard deviation below EV for one of the types of luck and 1 standard deviation above EV for the other type of luck for the rest of your life, which would you choose to run hot in, luck-a or luck-b?

------------------------

Meh, I'm tired of editing this and am afraid I might click the wrong button and lose everything so here we go ...

Last edited by Max; 08-15-2021 at 12:07 AM.

07-06-2010 , 09:38 PM

Y2Dennis

old hand

Join Date: Mar 2007 Posts: 1,268

First! Not a math guy but this looks awesome. Reading now.

07-06-2010 , 09:43 PM

spadebidder

Actually Shows Proof

Join Date: Aug 2008 Posts: 7,905

It's even worse than you describe. Both programs allow the inclusion of post-flop all-ins, for which the accuracy of the equity calculation is a lot less than preflop with a single caller (making no further decisions possible). Post-flop the equity calculation is corrupted by player decisions.

That said, in NL games it can be useful to know how much equity you gained/lost in all-in hands, as long as you look at it in the proper context. Unfortunately most players misunderstand it, as you have pointed out.

07-06-2010 , 09:47 PM

funkyj

Carpal \'Tunnel

Join Date: Jun 2008 Posts: 6,416

Quote:

Originally Posted by spadebidder

Please post a sample hand history (real or made up) to illustrate your point. Are you talking about the street-by-street EV vs all-in-EV debate or something else?

07-06-2010 , 10:13 PM

venice10

Referee

Join Date: Nov 2007 Posts: 25,852

Quote:

Originally Posted by funkyj

Scenario 1
UTG, who is a confirmed full ring nit (e.g. xxvinniepoohxx) raises to 4x.
It is folded around to you on the BTN, you have KK (or QQ) and raise it to 12x, nit villain pops it up to 36x, you shove, villain calls. Villain has AA (are we surprised?) and it holds up.

If they have history with xxvinniexx, not at all if you're holding KK. Frankly with QQ, I'd fold to his raise.

Very good post overall. You can't do a damn thing about running bad. However in the micros, you've got so many leaks that if you fixed them, you can't lose over the medium term.

07-06-2010 , 10:25 PM

markdirt

veteran

Join Date: Aug 2007 Posts: 2,526

great post, i think most people put way too much stock in AIEV. a lot of players like to use it as an excuse to lose, but when they're running hot they shrug cause "it's finally my time to win."

it is very difficult to evaluate your luck in poker because the game is just too complex.

i'm horrible at math but i'd like to see someone walk through the problems OP posted. i don't even know what he's asking lol but it seems interesting.

07-06-2010 , 10:26 PM

spadebidder

Actually Shows Proof

Join Date: Aug 2008 Posts: 7,905

Quote:

Originally Posted by funkyj

Please post a sample hand history (real or made up) to illustrate your point. Are you talking about the street-by-street EV vs all-in-EV debate or something else?

It's the same idea.

The card removal effects preflop are small and equity calculations are not off enough to worry about. Postflop they are magnified, as players have seen three more cards to base decisions on, and folded hands are not random. The deck stub becomes biased much more than it does from preflop card removal effects, which tend to mostly even out over time.

Preflop here's an analysis of the card removal effect on flop ranks, for about 400 million hands:
http://www.spadebidder.com/flop-analysis/part2/

and here's an extreme example:
http://www.spadebidder.com/statistic...-claim-part-2/
(see part 1 too)

Post-flop the biases are much greater when multiple players see the flop and then some of them fold before showdown. The only truly accurate way to measure all-in EV is to filter for when the all-in is preflop and there is only one caller (sawflop = 2). And that isn't even 100% accurate, but it's close enough.

While I'm on the subject, the inaccuracies of the commonly used software are made even worse when they use BBs or chip EV instead of pure average equity vs wins+1/2ties. Bet sizes are not random. Some players might tend to bet more when ahead, or when behind, or anything you can think of. But it isn't random. It's fine to measure that, but know that it isn't a luck measurement. It's part luck and part betting style. Leave the chips out of it and just compare average all-in equity to hands won. Then you have a luck measurement, albeit for one limited part of the game, as you have pointed out.
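A minimal sketch of the chip-free comparison described above: add up the equities, add up wins plus half of ties, and compare. The sample data is invented for illustration, and the z-score treats each all-in as a simple win/lose trial at its equity, which is an approximation.

```python
from math import sqrt

def all_in_luck(results):
    """results: list of (equity, outcome); outcome is 1 = win, 0.5 = tie, 0 = loss."""
    expected = sum(eq for eq, _ in results)    # average equity times number of hands
    actual = sum(out for _, out in results)    # wins + 1/2 ties
    # Approximate spread: each all-in treated as a coin that wins with p = equity.
    sd = sqrt(sum(eq * (1 - eq) for eq, _ in results))
    z = (actual - expected) / sd if sd else 0.0
    return expected, actual, z

# Hypothetical all-ins, not real hand data.
sample = [(0.80, 1), (0.80, 0), (0.55, 1), (0.30, 0), (0.50, 0.5)]
expected, actual, z = all_in_luck(sample)
print(f"expected {expected:.2f} wins, actual {actual:.2f}, z = {z:.2f}")
```

Because no pot sizes enter the calculation, betting style cannot leak into the number. It still only covers the all-in part of the game.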

Last edited by spadebidder; 07-06-2010 at 10:39 PM.

07-06-2010 , 11:03 PM

eilno2

grinder

Join Date: Jan 2010 Posts: 420

+1.

now i can say i run bad when i run above all-in ev.

07-07-2010 , 12:21 AM

SammyG-SD

Carpal \'Tunnel

Join Date: Aug 2007 Posts: 12,600

may be my favorite COTW eva! (based on the quality of the post, not the topic per se).

07-07-2010 , 12:53 AM

#10

pokerarb

Carpal \'Tunnel

Join Date: Jun 2009 Posts: 6,185

goot post, tyvm

07-07-2010 , 01:01 AM

#11

BSNxBSN

centurion

Join Date: Jun 2010 Posts: 145

Stated in a great way. Well put man. You turned a "blehh" subject into a great read.

07-07-2010 , 01:44 AM

#12

zachvac

Carpal \'Tunnel

Join Date: Apr 2008 Posts: 13,775

haven't read all the way through yet so apologize if I'm saying something you already addressed but didn't see it skimming but would you disagree with my statement that all-in ev line is a better indication of your expected money won than your money won line? If you are judging 2 people to see which one is better you should use the ev not the money if you have both.

07-07-2010 , 01:50 AM

#13

mborg23

adept

Join Date: Sep 2009 Posts: 1,050

I was actually just talking about this concept with a good friend of mine. I say AIEV is trash for the very reasons you listed.

Very good CotW

Problem 1: I am not sure how to work that out exactly. I think the question is a bit vague.

Problem 2: Obviously I would take luck-b run good any day.

07-07-2010 , 01:51 AM

#14

funkyj

Carpal \'Tunnel

Join Date: Jun 2008 Posts: 6,416

Quote:

Originally Posted by spadebidder

Post-flop the biases are much greater when multiple players see the flop and then some of them fold before showdown. The only truly accurate way to measure all-in EV is to filter for when the all-in is preflop and there is only one caller (sawflop = 2). And that isn't even 100% accurate, but it's close enough.

SBSEV (street-by-street EV) would be far superior to AIEV if not for the pesky fact that you need complete hole card information for everyone who sees the flop, if not for everyone who VPIPs. (Presumably SBSEV == Sklansky Bucks).

As you (spadebidder) mention, if more than 2 people see the flop and some but not all go to showdown then important EV factors are ignored.

AIEV vs SBSEV is a case of looking for our keys where the light is good.

G-bucks are a great concept but they require us to guess villain's strategy and this can not be coded up in a simple formula like AIEV can (or like SBSEV could be if we had complete hole card info).

Here is a query to mpethy: What proportion of the average leak finder client's list of 3 biggest leaks identified in a leak finder session are leaks involving all-in-EV type hands?

07-07-2010 , 01:51 AM

#15

mborg23

adept

Join Date: Sep 2009 Posts: 1,050

Quote:

Originally Posted by zachvac

This could not be any more inaccurate. And the reasons why are explained in the OP.

07-07-2010 , 02:05 AM

#16

funkyj

Carpal \'Tunnel

Join Date: Jun 2008 Posts: 6,416

Quote:

Originally Posted by mborg23

I was actually just talking about this concept with a good friend of mine. I say AIEV is trash for the very reasons you listed.

Or, to use the CoTW analogy, we get a very good picture of the elephant's tail (or perhaps something as large as a leg or a head).

Quote:

Problem 1: I am not sure how to work that out exactly. I think the question is a bit vague.

I'm not sure I can make it more clear. What sort of information do you think a person needs to have the question well defined?

If we consider only luck-a (or only luck-b) then we expect a normal distribution with ~158.7 folks (out of 1000) having a result that is 1 standard deviation or more above EV and the same number having a result 1 standard deviation or more below EV. About 682.7 people are expected to be within +-1 stddev.

As defined, luck-a and luck-b are mutually exclusive i.e. independent statistical events.
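Here is a sketch of the numbers in this hint, using Python's standard library. The last line only holds if luck-a and luck-b really are independent and normally distributed, which is the assumption made in this post. If the two are correlated the count changes.

```python
from statistics import NormalDist

players = 1000
p_tail = 1 - NormalDist().cdf(1.0)       # P(result is more than 1 sd above the mean)

above = players * p_tail                 # expected players > +1 sd in one kind of luck
within = players * (1 - 2 * p_tail)      # expected players within +-1 sd
both = players * p_tail * p_tail         # > +1 sd in luck-a AND < -1 sd in luck-b,
                                         # assuming the two are independent

print(f"{above:.1f} {within:.1f} {both:.1f}")
```

The second tail has the same probability as the first by symmetry, which is why `p_tail` appears twice in the product.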

Last edited by funkyj; 07-07-2010 at 02:10 AM.

07-07-2010 , 02:11 AM

#17

mborg23

adept

Join Date: Sep 2009 Posts: 1,050

Quote:

Originally Posted by funkyj

I'm not sure I can make it more clear. What sort of information do you think a person needs to have the question well defined?

Problem resides with me, not you. I apologize. My math is slightly more than rusty in some spots and I have never tried to learn standard deviation. My apologies.

07-07-2010 , 02:42 AM

#18

funkyj

Carpal \'Tunnel

Join Date: Jun 2008 Posts: 6,416

Quote:

Originally Posted by mborg23

Problem resides with me, not you. I apologize. My math is slightly more than rusty in some spots and I have never tried to learn standard deviation. My apologies.

No problem. I hope the last round of hints helps put it within reach. The normal distribution picture in the original post should help remind folks of the percentages involved.

07-07-2010 , 02:43 AM

#19

zachvac

Carpal \'Tunnel

Join Date: Apr 2008 Posts: 13,775

Quote:

Originally Posted by mborg23

This could not be any more inaccurate. And the reasons why are explained in the OP.

No it's not and it's a super-common misconception among a lot of people. If you think that money won is a better indication than the all-in ev line is of expected winnings you have a super flawed understanding of what luck means and what all-in ev is. I have since read the entire post and it doesn't refute my point at all. Money won uses 0% of luck. Even if all-in ev incorporates 1% of luck incorporating that 1% will still be more accurate than only using the 0% of money won.

This of course all is ignoring card removal which may be a flaw in all-in ev.

edit: btw not criticizing the OP in any way and I would assume that OP agrees with me? I agree all-in ev is a small amount of luck but it's the only fully quantifiable luck and quantifying 1% > 0%

07-07-2010 , 03:10 AM

#20

funkyj

Carpal \'Tunnel

Join Date: Jun 2008 Posts: 6,416

Quote:

Originally Posted by zachvac

While it is better to compare an EV adjusted graph than a pure results graph, comparing the two player graphs is a pretty awful way to judge the skill level of the two players unless the sample size is really really big.

I'm told that PT3's (and HEM's) "standard deviation/100" statistic converges much faster than WR and can be used with some confidence.

My stddev/100 over my 100k sample is 33.61 ptbb/100 hands. This translates to an actual WR that is within +-2 ptbb/100 of my true WR (95% confidence interval).

According to PTR, fareed (who crushed 50NL IMO) had a WR of 4.4bb (2.2ptbb) over 181k hands at 50NL. Take two players with my stddev of 67.22bb (33.61ptbb), one with a true WR of 4.5bb/100 and the other with a true WR of 3.5bb/100: most of the time the 4.5bb player will have a better graph over 100k hands, but a good portion of the time the 3.5bb player will have a better graph.
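A rough sketch of where those numbers come from, assuming results per 100 hands are independent and roughly normal, and that both players share the same stddev and play independently of each other:

```python
from math import sqrt
from statistics import NormalDist

sd_per_100 = 67.22        # bb per 100 hands (33.61 ptbb)
hands = 100_000
blocks = hands / 100      # number of 100-hand blocks in the sample

# Standard error of the observed win rate, and the 95% half-width around it.
se = sd_per_100 / sqrt(blocks)
ci95 = NormalDist().inv_cdf(0.975) * se   # in bb/100; halve it for ptbb/100

# Chance the 3.5bb/100 player posts the better graph than the 4.5bb/100 player:
# the difference of two independent observed win rates has sd = se * sqrt(2).
se_diff = se * sqrt(2)
p_worse_wins = NormalDist().cdf(-(4.5 - 3.5) / se_diff)

print(f"95% CI: +-{ci95:.2f} bb/100 (+-{ci95 / 2:.2f} ptbb/100)")
print(f"P(3.5bb player has the better graph): {p_worse_wins:.2f}")
```

So "a good portion of the time" works out to more than a third of the time under these assumptions.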

One thing that might be interesting to see is an AIEV adjusted standard deviation. Comparing the regular stddev with an AIEV adjusted stddev (if such a thing is possible) might give an idea of the relative sizes of AIEV luck vs all the other luck.

07-07-2010 , 03:14 AM

#21

zachvac

Carpal \'Tunnel

Join Date: Apr 2008 Posts: 13,775

Quote:

Originally Posted by funkyj

So it follows that it is pretty awful to judge the skill level of two players by money won unless the sample size is really really big right?

edit: and I agree with you allin-ev adjusted stdev would be interesting to look at (at least on a mathematical level) and obviously we can use stdev to come up with a range for any confidence interval.

07-07-2010 , 03:37 AM

#22

phebous

adept

Join Date: Apr 2009 Posts: 794

Great Post OP!

From my perspective, I look in the mirror every morning and say "God, I am lucky to be alive! I am lucky to have a wonderful family! ", therefore, I feel I am lucky! I guess this dribbles onto the poker table because 2 times out of 100, I suck out the one outer to take down a large pot! And I feel lucky!

The bottom line is this is a game of partial information, since not all hands are revealed. I do not care how you want to evaluate your luck, it will always be lacking because of partial information. AIEV, Sklansky Bucks, G-Bucks: none of them gives an accurate impression of luck. You never know exactly what your opponent has until it is shown at the river.

If you are looking for luck, just look in the mirror, life or whatever and find your luck there. You are never going to find it at the poker table. This game is purely based on math and the odds and probability of having the best hand.

07-07-2010 , 03:44 AM

#23

funkyj

Carpal \'Tunnel

Join Date: Jun 2008 Posts: 6,416

Quote:

Originally Posted by zachvac

So it follows that it is pretty awful to judge the skill level of two players by money won unless the sample size is really really big right?

Yes. uDevil's Poker Results Calculator is handy for toying with these numbers.

CAVEAT: if the skill levels are hugely different, e.g. one player's true WR (mythical beast, I know) is 0.5bb/100 and the other's is 7bb/100, then the better player will nearly always post a better graph.

07-07-2010 , 06:00 AM

#24

DDAWD

Carpal \'Tunnel

Join Date: Aug 2009 Posts: 6,879

Well, the reason people like AIEV is that while it is not the only measure of luck, there is absolutely no skill component to it. It's all in the RNG. With the other things, it can be hard to tease out the skill element.

One example is a player who complains about constantly running into tops of ranges for villains. He might be running into tops of ranges, or he might just suck at ranging and assigns some ridiculously wide ranges when they aren't warranted. Or the people who can't fold KK preflop or are way too aggressive with TPTK. That stuff all has elements of skill in it. Not so with AIEV. Yeah, there's skill in all the decisions that led up to the all-in, but after that, it's all the RNG. That's why I like to use it as a measure of luck. Yeah, it's not the only measure of luck, but it is the only unadulterated measure of luck.

07-07-2010 , 08:26 AM

#25

Cangurino

Carpal \'Tunnel

Join Date: Apr 2008 Posts: 13,476

Quote:

Originally Posted by funkyj

As defined, luck-a and luck-b are mutually exclusive i.e. independent statistical events.

First of all, luck-a and luck-b are not events, but statistics. Furthermore, events that are mutually exclusive are never independent (except for trivial anomalies like events with probability 0). Most importantly, however, I seriously doubt that luck-a and luck-b as random variables are independent. If you catch better hands as per luck-b you're bound to get it in more often, which influences luck-a.

All in all it looks interesting, but a lot of the things you stated are rather murky.

Last edited by Cangurino; 07-07-2010 at 08:44 AM.
