Probability distribution of patterns and entropy calculation
jameelhassan opened this issue · comments
As per the source code below, the pattern distribution is calculated using possible_words. Shouldn't we use allowed_words instead, since any of the allowed words could generate a given pattern?
That is, shouldn't the function call be get_pattern_matrix(allowed_words, allowed_words)?
def get_pattern_distributions(allowed_words, possible_words, weights):
    """
    For each possible guess in allowed_words, this finds the probability
    distribution across all of the 3^5 wordle patterns you could see, assuming
    the possible answers are in possible_words with associated probabilities
    in weights.

    It considers the pattern hash grid between the two lists of words, and uses
    that to bucket together words from possible_words which would produce
    the same pattern, adding together their corresponding probabilities.
    """
    pattern_matrix = get_pattern_matrix(allowed_words, possible_words)
Got this sorted. The codebase is correct as it is: the guess can be any word in allowed_words, but the pattern you see is determined by the answer, and the answer can only come from possible_words.
Let's say I choose the word CRANE.
To get all greys, the ANSWER must not contain any of the letters in "crane". So the probability of getting all greys is the number of ANSWER WORDS without any of the letters in "crane", divided by the total number of words in the answer list (assuming every answer is equally likely).
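For anyone who lands here with the same question, this is a rough sketch of the idea, not the repo's actual implementation (which works on a precomputed pattern matrix). The helper names `pattern`, `pattern_distribution` and `entropy`, the four-word answer list and the uniform weights are all made up for illustration. The guess CRANE is deliberately not in the answer list, to show that the guess and the answers come from different lists.

```python
from collections import Counter
from math import log2


def pattern(guess, answer):
    """Hypothetical helper: 0 = grey, 1 = yellow, 2 = green for each letter."""
    result = [0] * len(guess)
    remaining = Counter()  # answer letters not matched by a green
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = 2
        else:
            remaining[a] += 1
    # Second pass: hand out yellows while unmatched answer letters remain.
    for i, g in enumerate(guess):
        if result[i] == 0 and remaining[g] > 0:
            result[i] = 1
            remaining[g] -= 1
    return tuple(result)


def pattern_distribution(guess, possible_words, weights):
    """Bucket the possible answers by the pattern they would produce."""
    dist = Counter()
    for word, weight in zip(possible_words, weights):
        dist[pattern(guess, word)] += weight
    return dist


def entropy(dist):
    """Shannon entropy in bits of a pattern distribution."""
    return -sum(p * log2(p) for p in dist.values() if p > 0)


# Toy answer list (an assumption, not the real Wordle list), uniform weights.
possible_words = ["moist", "dully", "crate", "nymph"]
weights = [0.25] * len(possible_words)

dist = pattern_distribution("crane", possible_words, weights)
all_grey = (0, 0, 0, 0, 0)
print(dist[all_grey])  # probability of seeing all greys
print(entropy(dist))   # expected information from guessing CRANE, in bits
```

Here "moist" and "dully" share no letters with CRANE, so they fall into the all-grey bucket together. The probabilities sum over answers only; adding more allowed guesses would add more rows to score, not more mass to any bucket.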