Quote:
Originally Posted by statmanhal
Here is one way. Get the standard deviation per hour for each data point. Then use a weighted formula where the weights are the number of hours played.
For example
Win amount | Number of hours | Win Rate/Hr
120 | 6 | 20
50 | 2 | 25
You have here the equivalent of 8 observations, 6 with a win rate of 20 and 2 with a win rate of 25. Suppose the data were in Columns A, B and C of Excel. The formula for the weighted standard deviation is
=SQRT(SUMPRODUCT((C1:C2-C3)^2,B1:B2)/(SUM(B1:B2)-1))
where C3 = SUMPRODUCT(C1:C2,B1:B2)/SUM(B1:B2), the weighted average.
For this example, the answer is 2.31455.
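For readers without Excel, the quoted formula can be reproduced in a few lines of Python. This is only a sketch of the quoted hour-weighted method (the function name is an illustrative choice); it is the approach the reply below argues against.

```python
import math

def hour_weighted_sd(rates, hours):
    """Hour-weighted SD of per-session win rates, as in the quoted Excel formula."""
    total_hours = sum(hours)
    # Weighted average win rate (cell C3 in the quoted spreadsheet layout).
    mean_rate = sum(r * h for r, h in zip(rates, hours)) / total_hours
    # Each session's squared deviation is weighted by its hours.
    weighted_ss = sum(h * (r - mean_rate) ** 2 for r, h in zip(rates, hours))
    # The quoted formula divides by (total hours - 1).
    return math.sqrt(weighted_ss / (total_hours - 1))

quoted_sd = hour_weighted_sd([20, 25], [6, 2])
print(round(quoted_sd, 5))  # 2.31455
```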
This is not the proper way to compute standard deviation for sessions of varying duration, and this cannot be done with the standard deviation formula from Excel. The correct formula, which gives the maximum-likelihood estimate of the variance \(\sigma^2\), is:

\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \frac{(X_i - \mu T_i)^2}{T_i}, \qquad SD = \sqrt{\sigma^2} \]

where
\(X_i\) is the amount won in the ith session (dollars or bb)
\(T_i\) is the duration of the ith session (hours or hands)
µ is the win rate per unit time ($/hr, bb/hand, etc.)
N is the number of sessions
SD is standard deviation
I've included a derivation of this formula below, which is essentially the same derivation that appears in the appendix of Mason's Gambling Theory and Other Topics.
Note that the number of terms is equal to the number of sessions played which is the N that we divide by out front, and each term is divided by the duration of each session. The expected result of each session, which is subtracted in each term, is the hourly rate times the duration of the session. Note that the win rate µ is computed over all sessions, not for each session.
Note that by your method, if a player has a session 5 hours long in which he wins $500, you would enter this as 5 hours in which he won $100 each hour. This will give a different result as it implies a greater consistency than we can actually assume.
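A minimal Python sketch of the per-session formula described above, applied to the two sessions from the quoted example (120 won in 6 hours, 50 won in 2 hours). The function name `session_sd` is an illustrative choice, and the code assumes every duration is positive.

```python
import math

def session_sd(results, durations):
    """Maximum-likelihood SD per unit time from sessions of varying length."""
    n = len(results)
    # The win rate is computed over all sessions, not per session.
    mu = sum(results) / sum(durations)
    # One term per session: squared deviation from the expected result
    # mu * t, divided by that session's duration.
    variance = sum((x - mu * t) ** 2 / t for x, t in zip(results, durations)) / n
    return math.sqrt(variance)

ml_sd = session_sd([120.0, 50.0], [6.0, 2.0])
print(round(ml_sd, 4))  # 4.3301
```

For this data the result is about 4.33 per hour, against the 2.31455 quoted above. Since (X_i - µT_i)^2 / T_i equals T_i (X_i/T_i - µ)^2, the sum is 37.5 under both methods; the quoted method divides it by total hours minus 1 (7), while this formula divides it by the number of sessions (2).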
I can provide an Excel spreadsheet which performs this calculation correctly to anyone interested.
---------
This is the derivation of the maximum likelihood estimator for the variance for sessions of variable length. The derivation is exactly the same as the textbook derivation for sessions of equal length, except that the variance is multiplied by the session length \(T_i\), and the standard deviation is multiplied by \(\sqrt{T_i}\). Here is the derivation:
Let X be a vector of session results, and \(T_i\) be the duration of the ith session. Each session result \(X_i\) is a random variable distributed as a normal distribution of mean \(U_i = \mu T_i\), and unknown variance \(T_i \sigma^2\), where µ and \(\sigma^2\) are the mean and variance for 1 unit of time or number of hands (e.g. 100 hands). The probability distribution of a given observation \(X_i\) given σ is:

\[ P(X_i \mid \sigma) = \frac{1}{\sigma \sqrt{2 \pi T_i}} \exp\left( -\frac{(X_i - U_i)^2}{2 T_i \sigma^2} \right) \]
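This is simply the definition of the normal distribution where the standard deviation has been replaced by \(\sigma \sqrt{T_i}\), and the variance has been replaced by \(T_i \sigma^2\). The conditional probability of a vector of N observations X given σ, called the likelihood function, is obtained by multiplying N of these together, which causes a sum to appear in the exponential, and a product of \(\frac{1}{\sigma \sqrt{2 \pi T_i}}\) out front.

\[ L(\sigma) = P(X \mid \sigma) = \left[ \prod_{i=1}^{N} \frac{1}{\sigma \sqrt{2 \pi T_i}} \right] \exp\left( -\sum_{i=1}^{N} \frac{(X_i - U_i)^2}{2 T_i \sigma^2} \right) \]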
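To find the value of \(\sigma^2\) which maximizes the likelihood function, it is convenient to take the log of the likelihood function and maximize that. The logs of products become sums.

\[ \ln L = -\frac{N}{2} \ln \sigma^2 - \frac{1}{2} \sum_{i=1}^{N} \ln(2 \pi T_i) - \frac{1}{2 \sigma^2} \sum_{i=1}^{N} \frac{(X_i - U_i)^2}{T_i} \]

Taking the derivative of this with respect to \(\sigma^2\) and setting it equal to 0:

\[ \frac{\partial \ln L}{\partial \sigma^2} = -\frac{N}{2 \sigma^2} + \frac{1}{2 \sigma^4} \sum_{i=1}^{N} \frac{(X_i - U_i)^2}{T_i} = 0 \]

\[ \sigma^2 = \frac{1}{N} \sum_{i=1}^{N} \frac{(X_i - U_i)^2}{T_i} \]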
Note the similarity of this result to the standard definition of variance for sessions of equal duration. The only differences are that each term inside the sum is divided by the session duration \(T_i\), and the constant mean µ has been replaced with \(U_i\), which depends on the duration of each session. If the sessions are of equal length, \(T_i\) becomes a constant T which can be removed from the sum, and the sum would be divided by NT, which is the total number of hours in N sessions.
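Two quick numerical checks of the result described above, as a sketch only. The session figures are made up for illustration, the function names are hypothetical, and µ is plugged in as the overall win rate. The first check confirms that the closed-form variance gives a higher log-likelihood than nearby values; the second confirms that with equal session lengths T the estimate matches the sum of squared deviations divided by NT.

```python
import math

def ml_variance(results, durations):
    """Closed-form estimate: (1/N) * sum((X_i - mu*T_i)^2 / T_i)."""
    mu = sum(results) / sum(durations)
    return sum((x - mu * t) ** 2 / t for x, t in zip(results, durations)) / len(results)

def log_likelihood(var, results, durations):
    """Log of the product of normal densities with mean mu*T_i and variance T_i*var."""
    mu = sum(results) / sum(durations)
    return sum(
        -0.5 * math.log(2 * math.pi * t * var) - (x - mu * t) ** 2 / (2 * t * var)
        for x, t in zip(results, durations)
    )

# Hypothetical sessions of unequal length.
results = [120.0, 50.0, -80.0, 200.0]
durations = [6.0, 2.0, 4.0, 5.0]
var_hat = ml_variance(results, durations)
best = log_likelihood(var_hat, results, durations)
# Scaling the estimate up or down should only lower the log-likelihood.
is_max = all(
    best > log_likelihood(var_hat * f, results, durations)
    for f in (0.5, 0.9, 1.1, 2.0)
)

# Hypothetical sessions of equal length T.
equal_results = [30.0, -10.0, 55.0]
T = 2.0
n = len(equal_results)
mu_eq = sum(equal_results) / (n * T)
simple = sum((x - mu_eq * T) ** 2 for x in equal_results) / (n * T)
matches = math.isclose(ml_variance(equal_results, [T] * n), simple)

print(is_max, matches)  # True True
```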
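To put this in the form found in Mason’s essay, expand the square, and break this into 3 sums:

\[ \sigma^2 = \frac{1}{N} \left[ \sum_{i=1}^{N} \frac{X_i^2}{T_i} - 2 \sum_{i=1}^{N} \frac{X_i U_i}{T_i} + \sum_{i=1}^{N} \frac{U_i^2}{T_i} \right] \]

Since \(U_i = \mu T_i\),

\[ \sigma^2 = \frac{1}{N} \left[ \sum_{i=1}^{N} \frac{X_i^2}{T_i} - 2 \mu \sum_{i=1}^{N} X_i + \mu^2 \sum_{i=1}^{N} T_i \right] \]

Now since \(\sum X_i\) is the sum of the session results, this is the same as the hourly rate µ times the total hours, or \(\mu \sum T_i\), so the second term is \(-2 \mu^2 \sum T_i\). This can be combined with the final term to give Mason’s form:

\[ \sigma^2 = \frac{1}{N} \left[ \sum_{i=1}^{N} \frac{X_i^2}{T_i} - \mu^2 \sum_{i=1}^{N} T_i \right] \]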
Caution: This form may be highly susceptible to round off error.
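To illustrate the caution, here is a small Python sketch comparing the per-session form with Mason's form in ordinary double-precision arithmetic. The function names and the large-offset data are made up for illustration. On ordinary session results the two agree; when the results are huge relative to their spread, Mason's form subtracts two nearly equal large numbers and loses the answer.

```python
def direct_variance(results, durations):
    """Per-session form: (1/N) * sum((X_i - mu*T_i)^2 / T_i)."""
    n = len(results)
    mu = sum(results) / sum(durations)
    return sum((x - mu * t) * (x - mu * t) / t for x, t in zip(results, durations)) / n

def mason_variance(results, durations):
    """Mason's form: (1/N) * (sum(X_i^2 / T_i) - mu^2 * sum(T_i))."""
    n = len(results)
    total_time = sum(durations)
    mu = sum(results) / total_time
    return (sum(x * x / t for x, t in zip(results, durations)) - mu * mu * total_time) / n

# Ordinary data: both forms give 18.75.
small_direct = direct_variance([120.0, 50.0], [6.0, 2.0])
small_mason = mason_variance([120.0, 50.0], [6.0, 2.0])

# Artificial data with a huge offset: the true variance is 1.25.
big_results = [1e8, 1e8 + 1, 1e8 + 2, 1e8 + 3]
big_durations = [1.0, 1.0, 1.0, 1.0]
big_direct = direct_variance(big_results, big_durations)  # 1.25
big_mason = mason_variance(big_results, big_durations)    # not 1.25 in doubles

print(small_direct, small_mason)
print(big_direct, big_mason)
```

In the second case the squared results exceed 2^53, so they can no longer be stored exactly and the small true variance is lost in the subtraction. Subtracting the expected result before squaring, as in the per-session form, avoids this.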
Last edited by BruceZ; 02-13-2010 at 03:00 AM.