Correlation

09-22-2009 , 06:15 AM
Hello,

I was observing some stats and ended up with a question:

Is there a correlation between win rate &
(a) Looseness
(b) Aggression
(c) Showdown rate

Of course this seems to depend on the opponents, but overall there could be a correlation.

pokertableratings.com provides some charts on this topic.

(a) winrate & looseness


(b) winrate & aggression


(c) winrate & showdown rate
09-22-2009 , 03:17 PM
I have also observed some stats.

http://blogs.twoplustwo.com/leader/2...s-on-win-rate/
09-22-2009 , 06:58 PM
Quote:
Originally Posted by Leader
haven't read it yet but can't wait to criticize it, just out of spite
09-22-2009 , 08:58 PM
Quote:
Originally Posted by GrizzlyMare
haven't read it yet but can't wait to criticize it, just out of spite
hahha some of you guys are ruthless
09-22-2009 , 09:15 PM
Quote:
Originally Posted by GrizzlyMare
haven't read it yet but can't wait to criticize it, just out of spite
ok, originally i was semi-joking and ready to admit that you did good work, but after reading through your analysis i was relieved to learn that i wouldn't have to do that.

yes, it all sounds very smart, with lots of impressive sounding terminology.

however, there are two huge problems with all that cool analysis:

1. for all your stats, you find that the looser/more aggressive they are, the higher the winrate (on average). however, this could simply be the result of running well, as you yourself admit in relation to AF's. if a person is running well he will on average have higher VPIP, PFR, AF's, steal %, and lower FBBts.

you have shown no work to rule out that _any_ of the positive linear terms in your model are simply the result of this effect, or a large portion of each of them is. that essentially makes your linear terms (and hence your conclusions that the laggier the better) useless, regardless of their statistical significance.

2. your cool models have coefficients computed to 3 significant digits, yet as you obviously realize the errors in estimating them are huge, basically the same order of magnitude as the coefficients themselves. yet you haven't provided any indication of these errors, you haven't even consistently provided the stat. significance for those terms (except for the quadratic term in your best quadratic model and maybe 1 or 2 more). without knowing the errors all these cool numbers are useless, regardless of how many "significant" digits you present.

basically, with all the terminology, plots, and tables, there's really not much that can be extracted from all this. in fact, it would be more useful if you simply did this instead:

- take any of your stats, say VPIP, then divide all your 147 players into 3 to 5 buckets: those with VPIP between 37 and 34, 34 and 31 etc.

- then report the winrate data (along with estimate errors!), something like this:

VPIP interval   total N of hands   winrate     2-sigma interval
34 to 37        421907             0.1BB/100   -0.9 to 1.1BB/100
...

(obviously all the numbers are made up)

put this on a bar plot with 3 to 5 bars showing winrates in each bucket and 1 and 2 sigma intervals and people can compare the sizes of the variations between buckets with the errors in estimating winrates due to sample size.
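The bucket report suggested above can be sketched in a few lines. This is a minimal illustration with made-up data, as in the post; `bucket_winrates` is a hypothetical helper, and the per-100-hands std dev of 80 BB/100 used for the 2-sigma interval is an assumed figure (the real value depends on the game):

```python
import numpy as np

def bucket_winrates(players, edges):
    """Group players into VPIP buckets and report the hand-weighted
    winrate plus a rough 2-sigma sampling interval per bucket.

    players: list of (vpip_pct, hands, winrate_bb100) tuples.
    The per-100-hands std dev of 80 BB/100 is an assumed figure.
    """
    rows = []
    for lo, hi in zip(edges[:-1], edges[1:]):
        grp = [(h, wr) for v, h, wr in players if lo <= v < hi]
        if not grp:
            continue
        hands = sum(h for h, _ in grp)
        mean_wr = sum(h * wr for h, wr in grp) / hands
        sigma = 80 / np.sqrt(hands / 100)  # std error of the bucket winrate
        rows.append((lo, hi, hands, mean_wr, 2 * sigma))
    return rows

# made-up data, as in the post
rng = np.random.default_rng(0)
players = [(rng.uniform(21, 37), int(rng.integers(5_000, 100_000)),
            rng.normal(1.0, 3.0)) for _ in range(147)]
for lo, hi, hands, wr, ci in bucket_winrates(players, [21, 25, 29, 33, 37]):
    print(f"VPIP {lo} to {hi}: {hands} hands, {wr:+.2f} BB/100, 2sigma +/-{ci:.2f}")
```

The point of the exercise is exactly the comparison the post describes: if the between-bucket winrate differences are smaller than the 2-sigma intervals, the data doesn't distinguish the buckets.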

- without the analysis of how running well affects VPIP and other stats, not much can be gleaned from either your complicated analysis or my simple plot. the way to take this effect into account is to break up each player's total hands into chunks of around 100 to 500 hands. we can assume that his theoretical VPIP, winrate, and other stats in these chunks are the same, so the actual variations are due to the good or bad cards that he's getting. you can then estimate a correlation between, say, VPIP and winrate. it's reasonable to assume that it's roughly the same for all players: something of the form (adding 1% of VPIP due to getting good cards adds between x1 and x2 amount of winrate). only after you estimate this correlation can you attempt to compensate for the horrible "backward causation" problem.
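The chunk idea above amounts to a within-player regression: since one player's true style is assumed constant across chunks, deviations from his own means are attributed to the cards. A minimal sketch under that assumption (`running_well_slope` is a hypothetical helper, not from the thread):

```python
import numpy as np

def running_well_slope(chunks):
    """Least-squares slope of winrate on VPIP across one player's
    100-500 hand chunks. Because the player's true style is assumed
    constant, deviations from his own means are attributed to card
    distribution; the slope is then BB/100 gained per extra 1% of
    VPIP due to running well.

    chunks: list of (vpip_pct, winrate_bb100) tuples, one per chunk.
    """
    v = np.array([c[0] for c in chunks], dtype=float)
    w = np.array([c[1] for c in chunks], dtype=float)
    dv, dw = v - v.mean(), w - w.mean()
    return float(dv @ dw / (dv @ dv))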

cliff notes: sounds cool and smart, but actually plagued with systematic errors and less useful than a simple plot.

P.S. you should stop saying "effects" when you really mean "affects".
09-23-2009 , 12:10 AM
Quote:
Originally Posted by GrizzlyMare
ok, originally i was semi-joking and ready to admit that you did good work, but after reading through your analysis i was relieved to learn that i wouldn't have to do that.
Good to see you have yet another opportunity to prove your outstanding character and knowledge. Shall we begin?

Quote:
yes, it all sounds very smart, with lots of impressive sounding terminology.
This is a typical shot from non-intellectual types. "You use big words to sound impressive." No I don't. I use the correct words because I handed it in for a grade and would rather not fail.

Quote:
however, there are two huge problems with all that cool analysis:

1. for all your stats, you find that the looser/ more aggressive they are the higher the winrate (on average). however, this could simply be the result of running well, as you yourself admit in relation to AF's. if a person is running well he will on average have higher VPIP, PFR, AF's, steal %, and lower FBBts.
When you admit something is a problem, that's not a flaw in your analysis. It's just a fact of reality.

Quote:
you have shown no work to rule out that _any_ of the positive linear terms in your model are simply the result of this effect,
If the analysis doesn't include something, you make it sound like it's an intentional effort to deceive you. Only the most incomplete, half-assed reading of the paper would cause one to conclude that I'm making groundbreaking, outlandish claims in it. I'm sorry, but it's not my fault you have over-read the paper to mean things it never says.

Quote:
or a large portion of each of them is. that essentially makes your linear terms (and hence your conclusions that the laggier the better) useless, regardless of their statistical significance.
I can't prove you wrong, but I disagree nonetheless.

Quote:
2. your cool models have coefficients computed to 3 significant digits,
lol wtf? Is this like 8th grade science lab? I pulled the number from the table. I chose to pull 3 digits. I could have pulled 100. Significant digits are irrelevant. It's about the p-value/std error. See, this is why I chose to use the big words and not shoot random language from the hip. I'd be especially careful when expounding on a subject in which I have little to no actual understanding.

Quote:
yet as you obviously realize the errors in estimating them are huge,
No, some are significant some are not. The ones that are not are noted.

Quote:
basically the same order of magnitude that the coefficients themselves. yet you haven't provided any indication of these errors,
bull****

"It is important to note that in all cases the b1 coefficients imply that a looser more aggressive style is more profitable. This inevitably leads to the conclusion that players which are looser or more aggressive than those examined in this analysis (esp. those with PFR greater than 26%) should be studied to determine if such styles are optimal. We should be cautious drawing conclusions based too heavily on the coefficients of TAq, VPiP, WtSD, and FBBtS, however, as all of these have non-significant p-values at the .05 significance level. (Note that p-values result from a test of the hypothesis that the coefficient in question equals zero. If the coefficient equals zero, this implies that there is not a linear relationship between the predictor and the response)"

Quote:
you haven't even consistently provided the stat. significance for those terms (except for the quadratic term in your best quadratic model and maybe a 1 or 2 more). without knowing the errors all these cool numbers are useless, regardless of how many "significant" digits you present.
lol you don't even know how to calculate a p-value. The p-value is directly determined by the size of the b-coefficient and the std error.
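For reference, the relation being pointed at here: the test statistic is the coefficient divided by its std error, and the two-sided p-value is the tail probability of that statistic. A minimal sketch using the normal approximation (strictly it is a t distribution whose degrees of freedom depend on the model):

```python
from math import erf, sqrt

def p_value(b, se):
    """Two-sided p-value for H0: coefficient = 0, via the normal
    approximation to the t distribution (fine for large samples)."""
    z = abs(b / se)                                  # the test statistic
    return 2 * (1 - 0.5 * (1 + erf(z / sqrt(2))))   # 2 * P(Z > |z|)

# a coefficient only as large as its own std error is nowhere near significant
print(round(p_value(0.05, 0.05), 3))   # 0.317
```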

Quote:
basically, with all the terminology, plots, tables there's really not much that can be extracted for all this.
That's the unfortunate part about this data. The response has high variance and so it's hard to make inference about the mean response. This is also not a flaw in the analysis btw.

Quote:
in fact, it would be more useful if you simply did this instead:

- take any of your stats, say VPIP, then divide all your 147 players into 3 to 5 buckets: those with VPIP between 37 and 34, 34 and 31 etc.
Did that like 4 years ago. Was informative. This analysis seeks to increase my understanding of those relationships.

Quote:
- then report the winrate data (along with estimate errors!), something like this:

VPIP interval total N of hands winrate 2sigma interval
34 to 37 421907 0.1BB/100 -0.9 to 1.1BB/100
...

(obviously all the numbers are made up)
You forgot the estimate of the std error... Yeah, I know you can reverse-engineer it from the CI, but you can do the same from a p-value.
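Both directions of that reverse-engineering are one-liners. A sketch under the normal approximation, using the made-up CI from the post above (`statistics.NormalDist` is Python 3.8+ stdlib):

```python
from statistics import NormalDist

def se_from_ci(lo, hi, z=1.96):
    """Recover a coefficient's std error from a symmetric 95% CI."""
    return (hi - lo) / (2 * z)

def se_from_p(b, p):
    """Recover the std error from a coefficient and its two-sided
    p-value (normal approximation)."""
    z = NormalDist().inv_cdf(1 - p / 2)
    return abs(b) / z

# the made-up interval from the post: -0.9 to 1.1 BB/100
print(round(se_from_ci(-0.9, 1.1), 2))   # 0.51
```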

Quote:
put this on a bar plot with 3 to 5 bars showing winrates in each bucket and 1 and 2 sigma intervals and people can compare the sizes of the variations between buckets with the errors in estimating winrates due to sample size.
Looks like you have a plan. Look forward to seeing it. Unfortunately, I was restricted from doing "make a plot" as a project for my grad class.

Quote:
- without the analysis of how running well affects VPIP and other stats, not much can be gleaned from either your complicated analysis or my simple plot. the way to take into account this effect is to break up each player's total hands into chunks of hands around 100 to 500 in size. we can assume that his theoretical VPIP, winrate, and other stats in these chunks is the same,
no

Quote:
so the actual variations are due to good or bad cards that he's getting. you can then estimate a correlation between, say, VPIP, and winrate. it's reasonable to assume that it's roughly the same for all players: something of the form (adding 1% of VPIP due to getting good cards adds between x1 and x2 amount of winrate). only after you estimate this correlation you can then attempt to compensate for the horrible "backward causation" problem.
You sound like you might do some work with math. Like many people in the work-with-math world, you seem to think that because you work with numbers this makes you an expert in applied math or stat when in fact you haven't even mastered the most basic, fundamental concepts like significance, inference, or p-values, which are taught in undergrad stat. I encourage you to take a grad stat course like many of your peers have done over the years. You will quickly see, like they did, that your skills are sadly deficient and that your previous math "training" was a joke.

Quote:
cliff notes sounds cool and smart, but actually plagued with systematic errors and less useful than a simple plot.
Learning isn't just about the material and the teacher. It's also about the person trying to learn.

Quote:
P.S. you should stop saying "effects" when you really mean "affects".
What would one of your posts be without a cheap shot? Want to bet I never find out?
09-23-2009 , 12:31 AM
egos
09-23-2009 , 02:10 AM
ok, i have something to say about each of your responses, but i'm sick of typing so i'll try to be brief.

as a paper for your stat class your work is probably fine (depending on your class, obviously). i criticized your work from the perspective of whether your approach is a good one to answer the question that you are essentially posing: "which stats are best?"

now, you admit that point 1 (about the effect of running well) is indeed a problem of your approach (though for some reason you say it's not a "flaw in your analysis"), so let's go directly to point 2.

you accuse me of being a dummy when it comes to statistics, e.g.

"lol you don't even know how to calculate a p-value. The p-value is directly related to the size of the b-coefficient and the std error. "

i may indeed be a stat dummy (i never claimed to be an expert on statistics, but i do claim that i can point out the problems in your analysis), but if i'm not mistaken, in our case the p-value is the probability that, if the b-coefficient in question were 0, the data would lead us to a value equal to or greater than what we obtained from our actual data. knowing the p-value we can estimate some kind of confidence interval (i'm guessing; maybe there's a problem with the fact that we can't assume a normal distribution - correct me if i'm saying something wrong). in any case, i do understand that there's a relation between p-values and what i claimed was lacking in your paper, i.e. estimates of the uncertainties for your coefficients.

but unless i missed something completely, you haven't provided any p-values. all you provide are some (not all) of the b-coefficients and the R-squared (or adjusted R-squared). i'm willing to accept the possibility that i'm missing something, but please explain how estimates of uncertainties can be obtained from your paper.

finally, as for my suggestion on how to start taking into account the "running well effect" (the two quotes right before and right after you say "no"), i do understand that there are some additional factors that are hard to take into account. for example, in one session a player may be playing with really foldy opponents, so he may increase his theoretical VPIP for that session. i'm assuming it's because of effects like this that you say "no". however, if the chunks are large enough that they span multiple sessions and are comparable to the total number of hands for that player, then this effect will be minimized (and if it were not minimized even for chunks comparable to the total # of hands, then the same "no" would apply to most of what you've done). perhaps if we only take a few players in the database with a really large number of hands, yet we know that they haven't changed their style during that period (for instance, you can use your own hands), then we can get a better estimate of the correlation of stats with winrate due to the "running well" effect.

anyway, what i've suggested is a reasonable way to approach the problem, and this problem clearly needs approaching; otherwise you cannot say anything about the linear terms or (more importantly) where the optimal values for any of the stats lie.

finally, as for
"What would one of your posts be without a cheap shot? Want to bet I never find out?"

correcting your annoying word misuse is no more a cheap shot than any other criticism of your paper. and your second sentence i just don't get at all.

ok, i'm tired, so i tried to ignore all the cheap shots in your responses and tried to mostly stick to the substance. to recap, my points 1 and 2 still stand (unless you elaborate on point 2).
09-23-2009 , 04:11 AM
just both ship your sn's and we will compare on TR. if you are scared to, then the other guy is clearly better and you should no longer talk
09-23-2009 , 10:04 AM
Quote:
Originally Posted by johnnyrocket
just both ship your sn's and we will compare on TR, if you are scared to then the other guy is clearly better and you should no longer talk
what would that prove?
09-23-2009 , 11:49 AM
please move this thread to the intellectual mumbo-jumbo forum
09-23-2009 , 01:52 PM
Quote:
Originally Posted by copoka
what would that prove?
whose ego is deservedly bigger and which one has the right to deface the other
09-23-2009 , 02:06 PM
That's exactly what I thought.
Just could not find any ego-measuring stats in TR to establish defacing rights.
09-23-2009 , 04:59 PM
These guys are definitely better at the maths than me
09-23-2009 , 05:34 PM
This thread is so full of


Spoiler:
Virgins
09-23-2009 , 06:08 PM
Quote:
Originally Posted by korrupt106
This thread is so full of


Spoiler:
Virgins
09-24-2009 , 01:37 AM
Quote:
Originally Posted by korrupt106
This thread is so full of


Spoiler:
Virgins
haha, you bring your funny self out to thailand for a few weeks
09-24-2009 , 02:16 AM
wtf you doing in thailand
09-24-2009 , 02:19 AM
Not going to Thailand.....Thailand is full of

Spoiler:
you
09-24-2009 , 02:20 AM
zing

<3
09-24-2009 , 04:04 AM
Although I do not agree with everything Grizzly said, I think his criticism is mostly spot on. First off, the basic conclusion that can be drawn from the presented analysis is that there is a linear/second-order correlation between winrate and the observed variables, no more, no less. By no means can anything be said about the causal influence of the considered variables from the data alone.

Now the AF might arguably be a reasonable explanation, but as Grizz pointed out, the analysis is probably flawed because of the great uncertainty in the response variable, the winrate. As a side note: the method suggested by Grizz
Quote:
Originally Posted by GrizzlyMare
[...] the way to take into account this effect is to break up each player's total hands into chunks of hands around 100 to 500 in size. we can assume that his theoretical VPIP, winrate, and other stats in these chunks is the same, so the actual variations are due to good or bad cards that he's getting. you can then estimate a correlation between, say, VPIP, and winrate. [...]
will not solve this problem but rather increase the variance of our estimator.

The simplest solution to decrease variance would be to just increase the sample size. A second approach would be to derive an alternative estimator for the win rate with less variance by using additional information from the game. This has actually already been done by the UoA Computer Poker Research Group and is implemented in a tool called DIVAT (see http://poker.cs.ualberta.ca/papers/kan.msc.html for the paper). I therefore suggest using this estimator for your analysis.

Good luck!
09-24-2009 , 05:15 AM
Quote:
Originally Posted by korrupt106
wtf you doing in thailand
playing poker with yourface and wolfram and goin out on the town which is like vegas on steroids, pretty sick imo
09-24-2009 , 01:22 PM
Quote:
Originally Posted by soLidas
Although I do not agree with everything Grizzly said, I think his criticism is mostly spot on. First off, the basic conclusion that can be drawn from the presented analysis is that there is a linear/second-order correlation between winrate and the observed variables, no more, no less. By no means can anything be said about the causal influence of the considered variables from the data alone.

Now the AF might arguably be a reasonable explanation, but as Grizz pointed out, the analysis is probably flawed because of the great uncertainty in the response variable, the winrate. As a side note: the method suggested by Grizz
will not solve this problem but rather increase the variance of our estimator.

The simplest solution to decrease variance would be to just increase the sample size. A second approach would be to derive an alternative estimator for the win rate with less variance by using additional information from the game. This has actually already been done by the UoA Computer Poker Research Group and is implemented in a tool called DIVAT (see http://poker.cs.ualberta.ca/papers/kan.msc.html for the paper). I therefore suggest using this estimator for your analysis.

Good luck!
to be honest, i don't think you fully understood my criticism. i'm not complaining about the sample size that he used (you work with what you've got) or about the variables he selected (it's the principle that matters, not the specific details). the question is also not how to minimize the uncertainty in winrate estimates (so DIVAT has nothing to do with this).

the flaws that i complained about were: (a) he made a somewhat involved and detailed analysis of the data, but because he made no attempt to account for the obvious systematic error of the "running well effect", all the coefficients that he computed are pretty much useless, so the same amount of time and effort would be much better spent taking the systematic errors into account; (b) he didn't provide uncertainties for the numerous coefficients that he computed, even though they are typically on the order of the coefficients themselves.

also, i don't see how the method i suggested is wrong for taking into account the "running well effect".
09-24-2009 , 02:57 PM
His criterion for the selection of his population was
21% ≤ VPiP < 37%
16% ≤ PFR < 26%
so I think there is at best a tiny, negligible (systematic) selection bias with respect to average winrate. That is, the expected average winrate under this sampling scheme should be reasonably close to the true one.
On the other hand, his filter resulted in only 147 players, with numbers of hands as low as 5k, which, as I said before, means a lot of variance, probably leading to a (random) sampling error much greater than the systematic error.

There is no point in your "chunk method" with respect to reducing the variance of the estimate of the predictor VPIP (or of the model), because the variance just depends on our estimates of a player's true winrate and his true VPIP. Your method just implies a different model, i.e. you are trying to model an additional time effect for VPIP, which, as I said, will ultimately increase the variance of the model.
09-24-2009 , 03:06 PM
Quote:
Originally Posted by soLidas
His criterion for the selection of his population was
21% ≤ VPiP < 37%
16% ≤ PFR < 26%
so I think there is at best a tiny, negligible (systematic) selection bias with respect to average winrate. That is, the expected average winrate under this sampling scheme should be reasonably close to the true one.
that tells me you didn't understand the "running well effect".

      