CoTW: Why all-in-EV is a horrible measure of overall luck
Yes. uDevil's Poker Results Calculator is handy for toying with these numbers.
http://www.evplusplus.com/poker_tool...nce_simulator/
http://www.pokervariancesimulator.fr/
First of all, luck-a and luck-b are not events but statistics. Furthermore, events that are mutually exclusive are never independent (except for trivial anomalies like events with probability 0). Most importantly, however, I seriously doubt that luck-a and luck-b, as random variables, are independent. If you catch better hands as per luck-b, you're bound to get it in more often, which influences luck-a.
Let's take my 107k hands mentioned in the OP as an example. In that sample I had 389 AIEV situations and 13733 VPIPs for non-AIEV. Let us assume that, with the same luck-a and luck-b, my next 107k hands would break down the same way (389 AIEV, 13733 non-AIEV VPIPs):
- changing luck-b clearly can change how many AIEVs I get in the sample (e.g. perhaps I get 420 AIEVs next 107k hands instead of 389)
- changing luck-b does not affect whether I run above or below EV for each all-in. I.e. the fact that when I have AA vs KK, the KK villain is a drooler more often (improved luck-b) does not change my equity when we get it AI preflop nor does it affect whether I run hot or cold in this situation.
- running hot or cold for AIEV does not affect whether droolers or nits wake up with medium, 2nd or 3rd best hands when I have a strong hand.
Luck-a and luck-b are mutually exclusive in the sense that luck-b is defined as "all the luck other than luck-a". I stick by my (implicit?) assertion that no part of luck-b changes the probability that my AA vs your KK preflop will win 81% of the time. I.e. that luck-a and luck-b are independent.
All in all it looks interesting, but a lot of the things you stated are rather murky.
EDIT: I see that my definition of luck-a is too tersely worded. luck-a is defined as how far above or below EV you run in AIEV situations. I.e. luck-a is what you see when you compare an AIEV graph with your actual results.
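To make the luck-a definition concrete, here is a minimal Python sketch of "actual results minus all-in EV" over a handful of all-in hands. The `AllIn` record and the three sample hands are made up for illustration, and the sketch assumes heads-up all-ins with no split pots and no rake; it is not how any particular tracker computes its EV line.

```python
from dataclasses import dataclass


@dataclass
class AllIn:
    equity: float    # hero's share of the pot when the money went in (0..1)
    pot: float       # total pot in bb, including hero's own stake
    invested: float  # hero's stake in bb
    won: bool        # whether hero actually won the pot


def luck_a(all_ins):
    """Return (actual, expected, actual - expected) in bb over the all-in hands."""
    actual = sum((a.pot if a.won else 0.0) - a.invested for a in all_ins)
    expected = sum(a.equity * a.pot - a.invested for a in all_ins)
    return actual, expected, actual - expected


# Hypothetical sample: AA vs KK lost, a coin flip won, KK vs AA won.
sample = [
    AllIn(equity=0.81, pot=200.0, invested=100.0, won=False),
    AllIn(equity=0.50, pot=200.0, invested=100.0, won=True),
    AllIn(equity=0.19, pot=200.0, invested=100.0, won=True),
]
actual, expected, diff = luck_a(sample)
print(f"actual {actual:+.1f}bb, all-in EV {expected:+.1f}bb, luck-a {diff:+.1f}bb")
```

In this toy sample the all-in EV is break-even while the actual result is up a buy-in, so the whole +100bb would show up as luck-a. Nothing in the calculation says anything about how those spots arose, which is the luck-b part.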
No, it's not, and it's a super-common misconception among a lot of people. If you think that money won is a better indication of expected winnings than the all-in ev line is, you have a super flawed understanding of what luck means and what all-in ev is. I have since read the entire post and it doesn't refute my point at all. Money won accounts for 0% of luck. Even if all-in ev captures only 1% of luck, incorporating that 1% will still be more accurate than the 0% that money won captures.
This of course all is ignoring card removal which may be a flaw in all-in ev.
edit: btw not criticizing the OP in any way and I would assume that OP agrees with me? I agree all-in ev is a small amount of luck but it's the only fully quantifiable luck and quantifying 1% > 0%
The second thing I would say is that the OP is correct, but is stating a very limited thesis. All in EV is a horrible measure of all luck in poker. But it doesn't purport to be anything like that. It only purports to be a measure of a specific type of luck, and its somewhat flawed methodology for measuring that specific type of luck makes it less reliable than perfect knowledge would be, but does not mean that it is useless or counterproductive to use.
The third thing I would say is that the OP is correct that there are a ton of ways to run bad. I just finished a database analysis for a guy whose win rate with AA was 1/3 below average, and, after detailed analysis, it turned out that the reason for this was that he was running bad in the frequency his opponents had a hand they could call him with.
Problem 2:
Given luck-a and luck-b definitions above ...
If you could run 1/2 standard deviation below EV for one of the types of luck and 1 standard deviation above EV for the other type of luck for the rest of your life, which would you choose to run hot in, luck-a or luck-b?
Originally Posted by Funkyj's original post
The problem with all-in-EV is not that it is a bad statistic but that many folks treat all-in-EV as if it is a measure of their overall poker luck.
Originally Posted by mpethy
... , but does not mean that it is useless or counterproductive to use.
I think that comparing EV adjusted stddev/100 and regular stddev/100 might provide some insight into how big a factor AIEV variance is in a player's overall results. I need to learn how to calculate this new (?) stat.
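One way this comparison could be sketched in Python is below. It assumes you can export two per-hand result series in bb (actual and EV-adjusted) from your tracker, and it assumes hands are independent so that the per-100 figure is the per-hand standard deviation times sqrt(100). The function names and the six toy hands are made up for illustration; trackers may compute their own stddev/100 differently.

```python
import math
import statistics


def stddev_per_100(per_hand_results):
    """Per-hand standard deviation scaled to 100 hands (independence assumed)."""
    return statistics.pstdev(per_hand_results) * math.sqrt(100)


def aiev_variance_share(actual, ev_adjusted):
    """Fraction of per-hand variance that disappears after the EV adjustment."""
    return 1.0 - statistics.pvariance(ev_adjusted) / statistics.pvariance(actual)


# Hypothetical per-hand results in bb. Hands 3 and 4 are 81% all-ins
# (one won, one lost); their EV-adjusted value is +62bb each.
actual = [1.5, -1.0, 100.0, -100.0, 0.0, -0.5]
ev_adjusted = [1.5, -1.0, 62.0, 62.0, 0.0, -0.5]

print(f"stddev/100 actual:      {stddev_per_100(actual):.1f} bb")
print(f"stddev/100 EV-adjusted: {stddev_per_100(ev_adjusted):.1f} bb")
print(f"variance removed:       {aiev_variance_share(actual, ev_adjusted):.0%}")
```

The toy sample is far too small and too all-in heavy to mean anything. On a real database, the gap between the two stddev/100 numbers would be a rough indication of how much of the overall variance the AIEV adjustment is removing.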
Until we have a better idea what fraction of overall luck AIEV is factoring out we don't know how much clearer controlling for AIEV luck is making the overall picture.
If mere results were the best measure of skill then we would have to agree that Hellmuth is the world's best NLHE tournament player ever by a huge margin...
By all means, prefer your AIEV line over your actual winnings line. I stick by my claim that comparing results graphs (actual or AIEV adjusted) to determine who is the better player, for all but very large samples, is a bad idea.
Nowadays I use my EV graph as a psychological tool to:
- avoid overconfidence (sorry buddy, you are just running hot)
- patience (yes, you have been running as bad in AIEV as you think you have)
- motivate study (ugh, the graph of your AIEV only hands is horrible -- you need to stop stacking off bad)
Using AIEV for anything more than a tool to direct further investigation is flawed.
A while back, in an epeen coaching thread where some high stakes (?) player was saying mpethy was not qualified to coach 200NL players for leak finder sessions, he (mpethy) explained that seeing certain combinations of stats was not a guarantee that a leak was present. Instead, it is a signpost that he should review hand histories from a particular type of situation to see if there was truly a leak or if the unusual stats were the result of this other luck (all the area outside of the well lit AIEV area). This is what I'm saying about results graphs (actual and AIEV) -- they don't tell us much by themselves. They are merely a signpost that we may want to look into some things.
This is interesting because the right answer depends a lot on playing style. If you are 20bb shortstacking, then you'll be in a lot more AI situations with a few cards to come, so being 1 std dev up on your chances there would be fantastic. It would certainly outweigh the benefit of opponents having strong hands when you have very strong hands, like KK v AA, because you can only win 20bb when those hands do show up.
If you are a shortstacker and you shove and get called 10 times do you want those 10 calls to be from droolers and otherwise competent full stack players who don't know how to play effectively against short stacks or do you want those 10 calls to be from folks who play perfectly in the BTN, SB, BB against your HJ shove?
luck-b greatly impacts what winrate the short stacker's EV adjusted line shows.
The one thing you do know about all in ev is that it tracks your luck in a spot that is, on average, a way bigger pot than most situations. So, while we may not be able to quantify what percentage of luck all in ev represents, we can describe it qualitatively as a pretty damn big deal.
You calling this a bad idea leads me back to the point I seem to make a lot on these forums, which is that I think people have unreasonably high standards of proof with respect to poker. It is inherently a game of incomplete information, yet when people talk about win rates and how quickly HUD stats converge and whatnot, now, all of a sudden, anything less than a 95% confidence interval is dismissed as unreliable. It makes me smile, is all.
If you are going through life expecting to make your decisions at the 95% confidence level, you are going to have a hard life (and lose at poker along the way, too).
Basically, the ev adjusted win rate filters out a specific type of noise. It is always useful to filter out that type of noise, even if it leaves other types of noise behind.
The only bad use of ev adjusted win rates that people engage in is using them as an excuse to not work on their game: "My EV adjusted win rate is 2ptbb/100, therefore I am beating the game even though I am b/e, therefore I don't need to work on my game." That thinking is flawed because their win rate is close enough to b/e that maybe they are running hot enough in other situations to show a 2ptbb win rate when, in fact, they are a loser in the game. And it is flawed simply because it is lazy.
Plenty of people use all in ev in just this way. It's a leak that needs plugging in their game.
But, yeah, what I said in that thread is true. When I use all in ev, I use it like this. OK, your win rate with AA is x; that is 1/3 below normal for a solid winning reg. OK, let's check your ev adjusted winrate. OK, it is higher than your raw win rate, so we know you are running bad, at least to an extent. Now let's check to see if you got it in good in those spots. Then check other spots, etc. etc.
So checking all in ev is an important step in almost every analysis that I do. But that is all it is--a step. It is very rarely the be all and end all of an analysis.
But, by the same token, sometimes it is. If I see a guy whose win rate is 1/3 below average for winning regs, but his ev adjusted win rate is right at average, that's it. I am saying game over, you are just running bad with AA, and I doubt you have a significant leak there. Usually, though, the answer is not that clear cut.
For players playing, say, NL $100 and below, I would say that for 80-90% of them, at least one of their three biggest leaks is a situation that is significantly influenced by all in luck. Usually, I see precisely that their second and fourth biggest leaks are both situations that are heavily influenced by all in ev.
If you are wondering how I can be this precise, it is because almost everybody up to NL $100 has all 3 of the three biggest leaks, and almost everybody playing $100 has the 4th biggest leak.
With players playing above $100, spots that can be influenced by all in ev become less of a leak, and I would say that the average NL $200 player may only have one small leak that can be substantially affected by all in ev. Sometimes I see 2, but in these cases they are usually the third and fourth biggest leaks, not the second and fourth biggest leaks as they are for NL $100 and below.
Spill the beans/ leaks
good stuff, keeeeep it real
I think that's enough to want to include it in any analysis, but not enough to complain that you can't win because of it.
Much, much more important than sparing more than 5 seconds of thought for your all in EV numbers.
Well, the second one. You either flop a set or you don't; it's called a Bernoulli distribution, or binomial, whichever you prefer.
And are you sure poker winnings (pot sizes) are normally distributed? Have you run any statistical tests? I have, and the tests failed. Even Mathematics of Poker assumes that winnings are normally distributed, but I have never seen any proof of this.
Don't just assume that there's only one distribution in the world that describes our life... And if we don't know the distribution, then we can't say much about confidence intervals etc.
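For anyone who wants to try such a test on their own database, here is a minimal sketch of one common normality check, the Jarque-Bera statistic, using only the Python standard library. It is not necessarily the test the poster ran. The `toy_hands` series is simulated under a made-up assumption (mostly small pots, about 1% of hands winning or losing a 100bb stack), so it only shows the mechanics, not real poker data.

```python
import math
import random


def jarque_bera(xs):
    """Return (skewness, excess kurtosis, JB statistic, asymptotic p-value)."""
    n = len(xs)
    mean = sum(xs) / n
    m2 = sum((x - mean) ** 2 for x in xs) / n
    m3 = sum((x - mean) ** 3 for x in xs) / n
    m4 = sum((x - mean) ** 4 for x in xs) / n
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    jb = n / 6.0 * (skew ** 2 + excess_kurt ** 2 / 4.0)
    # Under normality JB is asymptotically chi-square with 2 degrees of
    # freedom, whose survival function is exp(-x / 2).
    p_value = math.exp(-jb / 2.0)
    return skew, excess_kurt, jb, p_value


rng = random.Random(0)
# Hypothetical per-hand results in bb: small pots plus rare stack-sized swings.
toy_hands = [
    rng.choice([-100.0, 100.0]) if rng.random() < 0.01 else rng.gauss(0.0, 2.0)
    for _ in range(20000)
]
gaussian = [rng.gauss(0.0, 10.0) for _ in range(20000)]

for name, data in [("toy hands", toy_hands), ("gaussian", gaussian)]:
    skew, kurt, jb, p = jarque_bera(data)
    print(f"{name}: skew={skew:.2f} excess kurtosis={kurt:.2f} JB={jb:.1f} p={p:.3g}")
```

The simulated per-hand series fails the check because of its fat tails (large excess kurtosis), while the Gaussian control shows skewness and excess kurtosis near zero. Note that this looks at per-hand results; sums over many hands can look much closer to normal than the individual hands do.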
I agree with your other point, and I don't think it is likely that poker winnings are normally distributed. There's no reason they should be.
Ty. And this is far more important than the AA/KK point.
WTF??? The first one is a discrete distribution, and we all know that the normal distribution is not discrete.
E.g.:
You could run 1 standard deviation above all-in-EV for your entire life and 2 standard deviations below expectation in all the forms of luck that EV does not measure and, if you thought all-in-EV was the beginning and end of luck you would think you were running hot but suck at poker.
The main reason why all-in EV is useful is that in theory you can compute it accurately. Some of the other factors are hard to quantify. However, things like getting dealt aces or hitting sets happen much more often than all-in pots, so we would expect them to converge a lot faster. Moreover, getting dealt aces is worth only something like 5ptbb. Winning an all-in pot is worth 100ptbb.
Just as an example, I played 185222 hands this year. I was supposed to see 10895 pocket pairs, or 838 of each rank. In fact I saw 10996 pairs, 888 times 66, and 778 times 55. Statisticians can probably deduce from that if I ran incredibly hot in that aspect, or if it is just a slight deviation from the expected value.
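The numbers in that example can be turned into standard deviations with a quick binomial calculation. This is only a sketch: it treats each hand as an independent deal, uses the 6-in-1326 combo count for a specific pocket pair, and the helper name `binomial_z` is made up for illustration.

```python
import math


def binomial_z(observed, n, p):
    """Return (expected count, standard deviation, z-score) for a binomial count."""
    expected = n * p
    sd = math.sqrt(n * p * (1.0 - p))
    return expected, sd, (observed - expected) / sd


hands = 185222
p_any_pair = 13 * 6 / 1326  # 78 of the 1326 starting hands are pairs (1 in 17)
p_one_rank = 6 / 1326       # 6 combos of each specific pair (1 in 221)

results = {}
for label, observed, p in [
    ("any pair", 10996, p_any_pair),
    ("66", 888, p_one_rank),
    ("55", 778, p_one_rank),
]:
    expected, sd, z = binomial_z(observed, hands, p)
    results[label] = z
    print(f"{label}: expected {expected:.0f}, sd {sd:.1f}, z = {z:+.2f}")
```

This reproduces the expected 10895 pairs and 838 per rank, and puts the observed counts at roughly +1.0 standard deviations for pairs overall, +1.7 for 66 and -2.1 for 55. With 13 ranks to look at, seeing one of them about 2 standard deviations off is not surprising, so this reads as a slight deviation rather than an incredibly hot run.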
I kind of forgot what my point was, but I'm posting this anyways.
Here is a list of aspects (situations) of no limit hold’em where there is an expected value and normal distribution that is not measured by all-in-EV. The point of this exercise is to see how much bigger (qualitatively, if not quantitatively) the elephant of poker luck is than the “elephant tail” of all-in-EV that folks like to use as their main barometer of luck.
- How often are you dealt AA, KK? There is a normal distribution for this.
- How often do your pocket pair flop sets? Normal distribution.
...
----
i am pretty sure your database will be the best example; analyse your database: how often you got dealt aa/kk vs how often you should have, how often you flopped a set vs how often you should have. you should be quite near the theoretical expectation.
the things you mentioned are influenced by poker luck, but the sample size is WAY WAY bigger than for all-ins, so the likelihood of a large relative deviation is smaller than with a smaller sample size.
furthermore, there is a huge difference in the absolute amount of bb influenced by this deviation. getting dealt fewer aces influences the winnings by ~0.6bb/ace; winning one all-in fewer influences the winnings by ~50bb (average pot AI?).
so yes, while there are other factors influencing poker luck, ai-ev does have one of the biggest impacts on the overall winnings.
---
that is also the problem with your luck-a/luck-b exercise; of course you would choose the non-aiev luck in your favour. but it's not about choosing, it's about the likelihood of a deviation therein and its influence on the winnings. this is directly influenced by (1) the sample size and (2) the impact on the winnings.
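The sample size vs impact argument can be put into rough numbers. The sketch below estimates what a one standard deviation swing in the count of each event is worth in bb. It borrows the 107k hands and 389 all-ins from earlier in the thread and the ~0.6bb/ace and ~50bb/all-in figures from this post; the 50% win probability per all-in is purely an assumption for illustration, and the helper name is made up.

```python
import math


def one_sd_swing_bb(n_trials, p, bb_per_event):
    """bb value of a one-standard-deviation swing in a binomial event count."""
    return math.sqrt(n_trials * p * (1.0 - p)) * bb_per_event


# Being dealt AA: 107k hands, 6 of 1326 combos, ~0.6bb per ace (poster's figure).
aces_swing = one_sd_swing_bb(107_000, 6 / 1326, 0.6)

# Winning all-ins: 389 all-ins, ASSUMED 50% to win each, ~50bb per all-in.
all_in_swing = one_sd_swing_bb(389, 0.5, 50.0)

print(f"1 sd of aces dealt:  about {aces_swing:.0f} bb")
print(f"1 sd of all-ins won: about {all_in_swing:.0f} bb")
```

Under these assumptions a one standard deviation run in all-ins is worth several hundred bb, against low double digits for a one standard deviation run in aces dealt. The conclusion depends heavily on the per-event values plugged in, and it covers only one of the many non-AIEV sources of luck.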
That's a bit nitty. For reasonable sample sizes both events will closely approach a Gaussian distribution ("normal") to the point where the difference can be ignored unless you need very high confidence levels.
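How close the Gaussian approximation gets can be checked directly. The sketch below compares the exact binomial probability of flopping a set (or better) at most `k` times in `n` pocket pairs with the normal approximation, using a continuity correction. The choice of n = 1000 and k = 100 is arbitrary, and the function names are made up for illustration.

```python
import math
from statistics import NormalDist


def binom_cdf(k, n, p):
    """Exact P(X <= k) for X ~ Binomial(n, p), summed in log space."""
    total = 0.0
    for i in range(k + 1):
        log_pmf = (
            math.lgamma(n + 1) - math.lgamma(i + 1) - math.lgamma(n - i + 1)
            + i * math.log(p) + (n - i) * math.log1p(-p)
        )
        total += math.exp(log_pmf)
    return total


def normal_cdf_approx(k, n, p):
    """Normal approximation to P(X <= k) with a continuity correction."""
    mu = n * p
    sigma = math.sqrt(n * p * (1.0 - p))
    return NormalDist(mu, sigma).cdf(k + 0.5)


# Chance that at least one of the three flop cards matches your pair's rank.
p_set = 1 - math.comb(48, 3) / math.comb(50, 3)

n, k = 1000, 100
exact = binom_cdf(k, n, p_set)
approx = normal_cdf_approx(k, n, p_set)
print(f"p(set or better on flop) = {p_set:.4f}")
print(f"P(at most {k} sets in {n} pairs): exact {exact:.4f}, normal approx {approx:.4f}")
```

For counts like these the two numbers agree to within a fraction of a percentage point, which supports the point that the discreteness hardly matters at reasonable sample sizes. That says nothing about the separate question of whether winnings themselves are normally distributed.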
I'm pretty fuzzy on the details, so if someone could link to this if they know how to find it, that would be good.
http://www.pokergurublog.com/content...-6-max-softest
I couldn't find the original article.
I'm not posting this to hijack to a discussion on 6max vs FR. Just to show the graphs which are at the bottom of the link. Not quite a normal distribution, but close.
Gets into my theory that anything studied and analyzed will be periodic even if it's not in nature.
I wouldn't call it close at all. All three have the expected negative skew to about 65/35 (losers/winners) and a fat tail on the left. And the full ring graph has a lot of extra kurtosis (sharp peak). Heads-up is the closest to normal if we center it on -4bb/100 but it still has a bit of a fat tail on the left. None of these could be called normal distributions.
They look a lot like Gumbel distributions to me.
It's just a stat, it is what it is. Sure I look to see if I'm above or below my expected value but I don't consider this one stat by any stretch of the imagination to be an indicator of a person's poker success or abilities. They have nothing to do with one another. I think the EV graph was added so math nerds could have something else to contemplate and analyze. No offense to math nerds.
Hell it's a great topic for discussion but I don't see why this was a topic for a CoTW? I guess it's still a "concept".
Either way, props to the OP for spending the time to formulate his thoughts on the topic with such detail.