fedden / poker_ai

🤖 An Open Source Texas Hold'em AI


The rationality of 52-card clustering

pzyqwe opened this issue

There is a problem with how the deuces package estimates hand strength: it does not rank hands based on the current win rate, so it cannot judge drawing hands (e.g., a flush draw). It also has no way to distinguish community cards from hole cards. For example, it treats hole cards of A, A with community cards 2, 7, 8 the same as hole cards of 2, 7 with community cards A, A, 8.
It is obviously unreasonable for those two situations to fall into the same category.

https://github.com/simoncai519/texaspoker

In the repo above, we generate clusters from a dynamic win rate computed over the 52 cards, taking into account the other players' behavior and the number of players still at the table. This may be more reasonable.

commented

Hello, thanks for using our code base. Can you post an example behavior or a code snippet that reproduces the issue? I need a little more detail about your point on drawing hands, as I am not understanding it. We do use a vector of length three for the win rate in order to take the possibility of draws into account, if that is what you're referring to (i.e. [win, loss, tie]).
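For illustration, here is a minimal sketch of how such a [win, loss, tie] vector could be estimated by Monte Carlo rollouts against one random opponent. This is not our actual implementation, just the general idea; note that deuces targets Python 2, and its Python 3 fork treys exposes the same API.

from deuces import Card, Deck, Evaluator  # on Python 3, the `treys` fork has the same API

def estimate_wlt(hole, board, n_sims=1000):
    """Estimate [win, loss, tie] for `hole` on a 3-card `board`
    against a single opponent holding random cards."""
    evaluator = Evaluator()
    wins = losses = ties = 0
    for _ in range(n_sims):
        deck = Deck()
        # Remove the known cards before dealing out the rest.
        deck.cards = [c for c in deck.cards if c not in hole + board]
        opp = deck.draw(2)             # opponent's random hole cards
        runout = board + deck.draw(2)  # turn and river
        ours = evaluator.evaluate(hole, runout)  # lower rank = stronger hand
        theirs = evaluator.evaluate(opp, runout)
        if ours < theirs:
            wins += 1
        elif ours > theirs:
            losses += 1
        else:
            ties += 1
    return [wins / n_sims, losses / n_sims, ties / n_sims]

print(estimate_wlt([Card.new('As'), Card.new('Ah')],
                   [Card.new('8h'), Card.new('3h'), Card.new('2s')]))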

With regard to treating hole cards as distinct from public cards: if I'm understanding correctly, we take this into account by considering distinct hole card + public card groups, i.e. (52 choose 2) * (50 choose 3) combinations as opposed to (52 choose 5).
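For concreteness, the two spaces differ in size by exactly a factor of ten (math.comb needs Python 3.8+):

from math import comb

print(comb(52, 5))                # 2,598,960 unordered five-card sets
print(comb(52, 2) * comb(50, 3))  # 25,989,600 (hole cards, board) combinations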

And that's not to say there aren't methods superior to the one we used.

from deuces import Card
from deuces import Evaluator

evaluator = Evaluator()
board = [Card.new('8h'),
         Card.new('3h'),
         Card.new('2s')]
hand = [Card.new('As'),
        Card.new('Ah')]
print(evaluator.evaluate(board, hand))

board1 = [Card.new('As'),
          Card.new('Ah'),
          Card.new('2s')]
hand1 = [Card.new('3h'),
         Card.new('8h')]
print(evaluator.evaluate(board1, hand1))

Both calls output 3525.

If you use the other evaluator instead:
# State, print_card, and ai are from the simoncai519/texaspoker repo linked above.
state = State(logger=log, totalPlayer=6, initMoney=10000, bigBlind=100, button=0)
cardset = list(range(0, 52))
state.player[0].cards = [50, 51]   # hole cards: pocket aces
state.sharedcards = [5, 10, 20]    # flop: 3, 4, 7
state.currpos = 0
print('my cards')
print(print_card(state.player[0].cards[0]))
print(print_card(state.player[0].cards[1]))
print('shared cards')
print(print_card(state.sharedcards[0]))
print(print_card(state.sharedcards[1]))
print(print_card(state.sharedcards[2]))
simon_ai = ai()
simon_ai.make_decision(state=state)

my cards
diamond, A
club, A
shared cards
heart, 3
diamond, 4
spade, 7
simulate guess book finish --- 0.5524904727935791 seconds ---
estimated win rate, 0.4166666666666667, target, 362.2222222222223

# Same five cards, but with the aces moved to the board and the low cards in hand.
state = State(logger=log, totalPlayer=6, initMoney=10000, bigBlind=100, button=0)
cardset = list(range(0, 52))
state.player[0].cards = [10, 5]    # hole cards: 4, 3
state.sharedcards = [50, 51, 20]   # flop: A, A, 7
state.currpos = 0
print('my cards')
print(print_card(state.player[0].cards[0]))
print(print_card(state.player[0].cards[1]))
print('shared cards')
print(print_card(state.sharedcards[0]))
print(print_card(state.sharedcards[1]))
print(print_card(state.sharedcards[2]))
simon_ai = ai()
simon_ai.make_decision(state=state)

my cards
diamond, 4
heart, 3
shared cards
diamond, A
club, A
spade, 7
simulate guess book finish --- 0.5086402893066406 seconds ---
estimated win rate, 0.03799999999999999, target, 0

We can see that the same five cards are mapped to completely different categories depending on which of them are hole cards.

# Same hand as the first example, but now two opponents have put money in.
state.player[0].cards = [50, 51]   # pocket aces again
state.sharedcards = [5, 10, 20]    # same flop
state.currpos = 0
state.player[1].totalbet = 1000
state.player[2].totalbet = 2000

my cards
diamond, A
club, A
shared cards
heart, 3
diamond, 4
spade, 7
number of simulations: 1029
estimated win rate, 0.3449999999999999, target, 253.04999999999987

Here the same hand is clustered based on the opponents' behavior, and the result differs noticeably from the estimate above.
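For readers comparing the two approaches, here is a hedged sketch of how a behavior-conditioned win rate like the one above might be computed. This is a hypothetical reconstruction, not the texaspoker repo's actual code: the mapping from bet size to a hand-strength threshold is made up, and the deuces evaluator (treys on Python 3) stands in for the repo's internal one.

import random
from deuces import Card, Deck, Evaluator

evaluator = Evaluator()

def behavior_aware_win_rate(hole, board, opp_bets, big_blind, n_sims=1000):
    """Monte Carlo equity where opponents who have bet more are assumed to
    hold stronger hands: their sampled hole cards must clear a strength
    threshold on the current board."""
    wins = 0.0
    for _ in range(n_sims):
        deck = Deck()
        deck.cards = [c for c in deck.cards if c not in hole + board]
        opp_hands = []
        for bet in opp_bets:
            # Hypothetical mapping from bet size to a maximum deuces rank
            # (ranks run from 1, best, to 7462, worst).
            max_rank = max(2000, 7462 - 300 * int(bet / big_blind))
            opp = deck.draw(2)
            for _ in range(20):                # bounded rejection sampling
                if evaluator.evaluate(opp, board) <= max_rank:
                    break
                deck.cards.extend(opp)         # put rejects back and resample
                random.shuffle(deck.cards)
                opp = deck.draw(2)
            opp_hands.append(opp)
        runout = board + deck.draw(5 - len(board))
        ours = evaluator.evaluate(hole, runout)
        best_opp = min(evaluator.evaluate(h, runout) for h in opp_hands)
        if ours < best_opp:
            wins += 1.0
        elif ours == best_opp:
            wins += 0.5                        # count split pots as half a win
    return wins / n_sims

Bounding the rejection sampling keeps the sketch from stalling on boards where no hand can clear the threshold; a real implementation would calibrate the bet-to-strength mapping rather than hard-code it.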

commented

Thanks for providing the code. We don't outright cluster by expected win rate on the flop.

We do on the river, yes, but then we move back to the turn and cluster the turn cards based on their distribution of transitioning into a certain river cluster. And for the flop, we cluster based on their distribution of transitioning into a certain turn cluster.

This allows situations like the one you describe to be accounted for.
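As a rough illustration of that bootstrapped scheme, here is a toy sketch. The data are random stand-ins (real rows would come from enumerating card rollouts), the cluster counts are arbitrary, and a distance such as earth mover's may be preferable to the plain Euclidean k-means shown.

import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)

# River: cluster hands directly on their expected [win, loss, tie] vector.
river_wlt = rng.dirichlet([2.0, 2.0, 1.0], size=10_000)   # toy stand-in
river_labels = KMeans(n_clusters=50, n_init=10).fit_predict(river_wlt)

# Turn: describe each turn situation by the histogram, over all possible
# river cards, of which river cluster it transitions into; cluster those.
turn_hist = rng.dirichlet(np.ones(50), size=5_000)         # toy stand-in
turn_labels = KMeans(n_clusters=50, n_init=10).fit_predict(turn_hist)

# Flop: same idea one street earlier, using turn-cluster histograms.
flop_hist = rng.dirichlet(np.ones(50), size=5_000)         # toy stand-in
flop_labels = KMeans(n_clusters=50, n_init=10).fit_predict(flop_hist)

Because a flop hand is summarized by where it can end up rather than by its current equity alone, a strong draw and a weak made hand with similar current win rates can land in different clusters, which addresses the drawing-hand concern.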

Let me continue to think about this in case I'm missing something, as my full attention is not on this problem right now. I can try to explain better when I can think about this more clearly.

commented

So in this example:

board = [Card.new('8h'),
         Card.new('3h'),
         Card.new('2s')]
hand = [Card.new('As'),
        Card.new('Ah')]
print(evaluator.evaluate(board, hand))

board1 = [Card.new('As'),
          Card.new('Ah'),
          Card.new('2s')]
hand1 = [Card.new('3h'),
         Card.new('8h')]

Most importantly, I don't think these two would wind up in the same cluster, given the reasoning above. The win rate by the river on the board where the AA is shared would be very different from the one where the AA is not shared.

If they did wind up in the same cluster (I don't think this would happen often): the bot would use the trained probabilities assigned to the cluster of her private cards for the preflop betting round. (We use lossless compression there, so hands in the same cluster are strategically identical without additional information from the board.)
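For context on the lossless preflop step: before any board cards are dealt, only ranks and suitedness matter, so the 1,326 two-card combos collapse to 169 strategically distinct classes (13 pairs + 78 suited + 78 offsuit). A minimal sketch of that canonical mapping follows; the helper name is mine, not this repo's.

RANKS = '23456789TJQKA'

def preflop_class(card1, card2):
    """Map two card strings to their canonical preflop class,
    e.g. ('As', 'Kd') -> 'AKo', ('8h', '3h') -> '83s', ('Ah', 'Ad') -> 'AA'."""
    r1, s1 = card1[0], card1[1]
    r2, s2 = card2[0], card2[1]
    hi, lo = sorted([r1, r2], key=RANKS.index, reverse=True)
    if r1 == r2:
        return hi + lo                           # pocket pair, e.g. 'AA'
    return hi + lo + ('s' if s1 == s2 else 'o')

# All 52*51 ordered two-card combos collapse to 169 classes.
deck = [r + s for r in RANKS for s in 'shdc']
classes = {preflop_class(a, b) for a in deck for b in deck if a != b}
print(len(classes))  # 169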

I can't speak for exactly what the bot would learn, but I think it's safe to assume she would learn different strategies for AA and 83s. (We're talking about the strategy after counterfactual regret minimization here.)

So, even if your two flop scenarios are in the same cluster (it's possible, but won't always happen, per my reasoning above), she will have different probabilities of getting there based on the preflop strategy for her particular hand.

By the flop, she has that new 5-card hand, regardless of what came before it (I believe this is called imperfect recall in the literature, but I could be wrong).

So the idea is: if the bot happens to reach that stage with either private card + public card combo, she could play the same strategy, and that might be OK, because her preflop play adjusted the strategy to approximate a Nash equilibrium at that decision point. Now we're at a new decision point and need to make the best possible decision on the flop.

I can see some instances where that might not be ideal (like when it matters which specific cards you are hiding), but we're collapsing a space of ~26 million flop combinations down to between 20 and 200 clusters, so I think it's fine and strategically similar enough.