TLDR, unless you are interested in evaluators programming, and are familiar with the coding works of Andrew Prock, Andrzej Nironen, Cactus Kev, Paul Senzee, Ray Wooten, Steve Brecher, mikey1961 and those kind of guys.
BEGIN Cliff Notes
I explain here how I've made a fully working evaluator for pre-flop holdem (enumeration, not Monte Carlo) for exact cards vs exact cards for multiple players, that it's currently performing at my machine even faster than PokerStove.
END Cliff Notes
I will neve make it to a Poo-Bah post by post counting, but I think this could in some way be considered my non-Poo-Bah Poo-Bah post.
Excepting Andrew Prock (and I'm 99.99% sure UofA too), efficient evaluation for the specialist poker programmer has been so far only about programming a lower level abstraction function, that is being called by a higher level function that enumerates the boards (and this one is called by a higher level function that enumerates the hand ranges when the program supports hand ranges).
This single approach explains why people like mikechops, that has put a very nice evaluator like Holdem Ranger, with a couple of new features like weights for ranges and an extended hand range language, have failed misserably to perform efficiently in the same way PokerStove does.
Simply, because their approach is missing the most importants parts of the performance improvements. It misses the optimizations that come from the enumerations (that can be divided in the enumerations of the boards, and the enumerations of the players hands when we are evaluating ranges against ranges). Andrew Prock speaks a bit about this in his post
Agnostic versus Zealous, but is too elyptical and does not explain the approaches that can be made in any way.
I've been programming also my "proof of concept" of holdem pre-flop evaluations for exact hands vs exact hands, and I've achieved results that are way under PokerStove's performance for exact hand vs exact hands (
here is the link to my code).
What is currently considered as "state of the art" evaluation and was considered a huge break thru is RayW's evaluator, that was developed by Ray Wooten following an idea of mykey1961.
But, it only addresses a part of the problem, the evaluation, but when we move to higher level functions like Holdem Pre Flop evaluation, I think it has no use, and I'll try to explain why.
Mainly, the exact hand vs exact hands problems has to exploit boards isomorphisms in order to be efficient. This is what mikechops refers when speaking about PokerStove's performance as "The programmer did some pretty sick pre-flop equity optimizations". It comes from two parts as I said, the boards enumerations that I'll explain in detail (I did wrote my proof of concept for this that exactly evaluates pre-flop equity for any cards, for any number of players until 10), and the hand ranges enumeration that I'll probably explain later (I didn't wrote a proof of concept for this).
So, here starts the explanation of the optimizations for exact hand vs exact hands, considering first only the boards and without considering the players cards.
Let's say the board comes XXXYY, and no player has either X or Y in his pocket hole cards. All of these XXXYY boards are isomorphic. So a non enumeration optimized evaluator will traverse all this boards, as there are 4 Xs on the deck 2 Ys on the deck, it will call 4*6 times the 7 cards evaluator (like RayW). But we can group all this enumerations together, and ask the evaluator who wins (or ties) in this case, but we can update the 24 boards at once.
Now let's say the flop comes XYZTU, and the 4 X, the 4 Y, the 4 Z, the 4 T, and the 4 U are on the deck. So we have 4^5 = 1024 boards of this type. Some of them will be isomorphic and some will not. From these 1024 boards, there are 4 boards that are all of the same suit, it leaves us with 1020 boards. From these 1020 boards, now let's analyze the boards that have 4 suits. These are C(4;1) * C(5;4) * C(3;1) = 60 (4 suits, we choose four cards of the same suit, and can chose another one of a different suit). So this leaves us with 1020 - 60 = 960 boards that remain (plus, the 60 suited boards can also be reduced to 15 isomorphic boards). And now we analyze the 3 suited boards. We have C(4;1)*C(5;3)*C(3;1)*C(3;1)=360 boards (also, some of this boards are isomorphic too). So from the 960 boards, it leaves us with 600 boards. This 600 boards are the rainbow boards. And they are all treated (and enumerated) separately by an evaluator like the one of Steve Brecher's intead of being grouped together by the higher level function and sent to the 7 cards evaluator once.
So the first thing that needs to be noticed when evaluating is that isomorphic boars need to be grouped for faster performance. And in order to group this boards, the higher level function will be perfectly knowing which kinds of boards it will be sending to the low level function (let's say the RayW's function).
It will be sending either boards to be evaluated for flushes (specifying exactly which cards are suited), or boards to be evaluated for ranks only. Like here, I want you to evaluate this JJJTT board for 24 combos for ranks. Or like, I want you to evaluate this KQJT9 board with all suits spades.
So, here the RayW's evaluator (that I call it a LUT Cards) dissapears when we are evaluating with a "more intelligent" upper level function, and it can be replaced by what I call a LUT Ranks, that is a LUT that can be queried either to perform ranks evaluations or suits evaluations. Like RankHandValue7LUTRanks(p,r1,r2,r3,r4,r5,r6,r7) when we are evaluating a board for ranks (adding the players cards), and like SuitHandValue5LUTRanks(p,r1,r2,r3,r4,r5) when we are evaluating a board with 3 suited cards, and a players suited hole cards.
So, the LUT Cards that was 123MB, now gets replaced by a 1.53MB LUT Ranks (the LUT Ranks gets calculated at program startup up, in the same way that RayW's LUT Cards. By the way, I wrote a LUT Cards too, that I'm generating in less than 1 second with pointers inside instead of indexes, but I don't use it cause it does not make any sense, so this LUT Ranks is calculated in a blast).
It also has the advantage that the LUT Ranks can fit in a processors cache, so we will not be experiencing cache's misses.
But in order to do this, we need to discard what I call "the linear deck", that is a deck with all the cards ordered, and no other information. So I substitute this by what I call "a structured deck", that has a lot of information.
So let's say we want to evaluate a JJJTT board. We look from the deck how many Js we have and how many Ts we have, and if Js > 3 and Ts >2, we know that we will have C(J;3) * C(T;2) combos that will be isomorphic, and we group them and send them together to be evaluated for ranks only.
There is also another important point. At boards that have exactly 3, 4 and 5 different ranks, we will be starting with a number of isomorphic boards that are all the combos, including the suits. Then will be adjusting this for flushes. So as we will be adjusting for flushes, it helps to pre-calculate all the suits results (result in term of "hand value of a 7 cards evaluator", and also winners), of all the flushes at this particular holdem table. Like any board with 7s6s4s and not any other spade will be won by player X, with a hand value of HV. So here I introduce the concept of the LUT Suits Evals, that is used to pre-calculate all the winners of flushes for this particular holdem table (the holdem table has the hole cards of all players), so this flushes can be calculated only once.
This is how the "old way" works, in the code of Steve Brecher:
Code:
for (deckIx0 = 0; deckIx0 <= limitIx0; ++deckIx0) {
board[0] = deck[deckIx0];
for (deckIx1 = deckIx0 + 1; deckIx1 <= limitIx1; ++deckIx1) {
board[1] = board[0] | deck[deckIx1];
for (deckIx2 = deckIx1 + 1; deckIx2 <= limitIx2; ++deckIx2) {
board[2] = board[1] | deck[deckIx2];
for (deckIx3 = deckIx2 + 1; deckIx3 <= limitIx3; ++deckIx3) {
board[3] = board[2] | deck[deckIx3];
for (deckIx4 = deckIx3 + 1; deckIx4 <= limitIx4; ++deckIx4) {
board[4] = board[3] | deck[deckIx4];
mPotResults
}
}
}
}
}
This is how the efficient HoldemPreFlop eval function skeleton looks like, without all the code to adjust for flushes, and to call the evaluation functions:
HTML Code:
for (X=RANK_A; X>=RANK_2; X--) {
if (NX>=1) {
for (Y=X-1; Y>=RANK_2; Y--) {
if (NY>=1) {
// Board Quads XXXXY
if (NX==4) {
// lIsomorphicBoards=C(NX,4)*C(NY,1);
// As NX==4 then C(NX,4)==1
lIsomorphicBoards=C(NY,1);
}
// Board Quads YYYYX
if (NY==4) {
// lIsomorphicBoards=C(NX,1)*C(NY,4);
// As NY==4 then C(NY,4)==1
lIsomorphicBoards=C(NX,1);
}
// Board Full House XXXYY
if (NX>=3 && NY>=2) {
lIsomorphicBoards=C(NX,3)*C(NY,2);
}
// Board Full House YYYXX
if (NX>=2 && NY>=3) {
lIsomorphicBoards=C(NX,2)*C(NY,3);
}
for (Z=Y-1; Z>=RANK_2; Z--) {
if (NZ>=1) {
// Board Trips XXXYZ
if (NX>=3) {
lIsomorphicBoards=C(NX,3)*C(NY,1)*C(NZ,1);
}
// Board Trips YYYXZ
if (NY>=3) {
lIsomorphicBoards=C(NX,1)*C(NY,3)*C(NZ,1);
}
// Board Trips ZZZXY
if (NZ>=3) {
lIsomorphicBoards=C(NX,1)*C(NY,1)*C(NZ,3);
}
// Board Two Pair XXYYZ
if (NX>=2 && NY>=2) {
lIsomorphicBoards=C(NX,2)*C(NY,2)*C(NZ,1);
}
// Board Two Pair XXZZY
if (NX>=2 && NZ>=2) {
lIsomorphicBoards=C(NX,2)*C(NY,1)*C(NZ,2);
}
// Board Two Pair YYZZX
if (NY>=2 && NZ>=2) {
lIsomorphicBoards=C(NX,1)*C(NY,2)*C(NZ,2);
}
for (T=Z-1; T>=RANK_2; T--) {
if (NT>=1) {
// Board Two Pair XXYZT
if (NX>=2) {
lIsomorphicBoards=C(NX,2)*C(NY,1)*C(NZ,1)*C(NT,1);
}
// Board Two Pair YYXZT
if (NY>=2) {
lIsomorphicBoards=C(NX,1)*C(NY,2)*C(NZ,1)*C(NT,1);
}
// Board Two Pair ZZXYT
if (NZ>=2) {
lIsomorphicBoards=C(NX,1)*C(NY,1)*C(NZ,2)*C(NT,1);
}
// Board Two Pair TTXYZ
if (NT>=2) {
lIsomorphicBoards=C(NX,1)*C(NY,1)*C(NZ,1)*C(NT,2);
}
for (U=T-1; U>=RANK_2; U--) {
if (NU>=1) {
// Board High Card XYZTU (may also be straight)
lIsomorphicBoards=C(NX,1)*C(NY,1)*C(NZ,1)*C(NT,1)*C(NU,1);
}
}
}
}
}
}
}
}
}
}
There are also a lot of minor performance improvements that can be made and I'm making, and they might sound strange but they help to improve performance, like sorting the players by suits so we have all the suited players together and traverse them faster, remapping suits of cards to the first suits (like AcAd vs. 6s5s gets mapped to AdAh vs 6c5c) so we can avoid to traverse a suit when this suit has no player with cards of this suit, like not calculating the partial pots during the calculation and doing it at the end by having the details of how many times each players ties2, ties3, ties4 or tiesAll, like having the player with the biggest pair at player 1 so we avoid to update player 1 wins and complete them at the end of the evaluation, and a zillion minor performance improvements.
BTW, there is another improvement I can add regarding the flushes calculation, it should improve them considerably (in the same way that with the specialized functions for two, three and four players), may be I'll put add it there soon.
OK, I think this is enough for a start, so if anyone has interest I can expand on a particular point.