Are `predict_win` and `predict_draw` functions accidentally using Thurstone-Mosteller specific calculations?
asyncth opened this issue
If I understand it correctly, those two functions seem to perform calculations using the equations numbered (65) in the paper. However, those equations appear to be specific to the Thurstone-Mosteller model, and as far as I can tell, the proper way to calculate probabilities for the Bradley-Terry model would be to use equations (48) and (51) (also seen as p_iq in equation (49)). Is this intended? Or am I misunderstanding either the paper or the code of these functions?
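For readers comparing the two models, the pairwise forms at issue look roughly like this (sketched from my reading of the paper; treat the exact definition of c_iq as approximate):

```latex
% Bradley-Terry (logistic) pairwise form, cf. eq. (48):
P(i \text{ beats } q) = \frac{e^{\theta_i/c_{iq}}}{e^{\theta_i/c_{iq}} + e^{\theta_q/c_{iq}}}
                      = \frac{1}{1 + e^{(\theta_q - \theta_i)/c_{iq}}}

% Thurstone-Mosteller (probit) pairwise form, cf. eq. (65):
P(i \text{ beats } q) = \Phi\!\left(\frac{\theta_i - \theta_q}{c_{iq}}\right)
```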
The prediction functions are not derived solely from (65), but rather from its combination with (72). AFAIK there are no papers or articles that describe how to generalize these pairwise predictions to matches with more than two teams.
But when I tried to apply the modified prediction function (it's easy enough to alter it), it produced virtually the same results.
Example
Using the current generalized formula as implemented, `BradleyTerryFull` produces this result in the benchmarks:
Enter Model: BradleyTerryFull
Benchmark Processor: Win
Enter Random Seed: 1
----------------------------------------
Confident Matches: 5661
Predictions Made with OpenSkill's BradleyTerryFull Model:
Correct: 583 | Incorrect: 52
Accuracy: 91.81%
Process Duration: 0.8336913585662842
----------------------------------------
Predictions Made with TrueSkill Model:
Correct: 593 | Incorrect: 42
Accuracy: 93.39%
Process Duration: 2.950780153274536
Mean Matches: 2.3195027353377617
Here is a benchmark with equation (48) implemented into `predict_win`:
```python
import itertools
import math
from collections import deque
from typing import List, Union

# Rating, team_rating, and beta come from the surrounding OpenSkill module.


def predict_win(teams: List[List[Rating]], **options) -> List[Union[int, float]]:
    if len(teams) < 2:
        raise ValueError("Expected at least two teams.")
    n = len(teams)
    pairwise_probabilities = []
    for pairwise_subset in itertools.permutations(teams, 2):
        current_team_a_rating = team_rating([pairwise_subset[0]])
        current_team_b_rating = team_rating([pairwise_subset[1]])
        mu_a = current_team_a_rating[0][0]
        sigma_a = current_team_a_rating[0][1]
        mu_b = current_team_b_rating[0][0]
        sigma_b = current_team_b_rating[0][1]
        ciq = math.sqrt(n * beta(**options) ** 2 + sigma_a**2 + sigma_b**2)
        probability_iq = 1 / (1 + math.exp((mu_a - mu_b) / ciq))
        pairwise_probabilities.append(1 - probability_iq)
    if n > 2:
        cache = deque(pairwise_probabilities)
        probabilities = []
        partial = len(pairwise_probabilities) / n
        while len(cache) > 0:
            aggregate = []
            for length in range(int(partial)):
                aggregate.append(cache.popleft())
            aggregate_sum = sum(aggregate)
            aggregate_multiple = n
            for length in range(1, n - 2):
                aggregate_multiple *= n - length
            probabilities.append(1 - (aggregate_sum / aggregate_multiple))
        return probabilities
    else:
        return pairwise_probabilities
```
This code is a bit inefficient and has worse time complexity than the current implementation, but it is generally what (48) would look like translated to code. Here are the benchmark results:
Enter Model: BradleyTerryFull
Benchmark Processor: Win
Enter Random Seed: 1
----------------------------------------
Confident Matches: 5661
Predictions Made with OpenSkill's BradleyTerryFull Model:
Correct: 583 | Incorrect: 52
Accuracy: 91.81%
Process Duration: 0.8177695274353027
----------------------------------------
Predictions Made with TrueSkill Model:
Correct: 593 | Incorrect: 42
Accuracy: 93.39%
Process Duration: 3.0598957538604736
Mean Matches: 2.3195027353377617
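That the numbers match is not too surprising: in the two-team case both versions reduce to a single logistic pairwise term. A self-contained sketch of just that term, with the `beta` default of 25/6 and the `n * beta**2` term used as illustrative assumptions mirroring the code above (not the library's exact API):

```python
import math


def pairwise_win_probability(mu_a, sigma_a, mu_b, sigma_b, beta=25 / 6, n=2):
    """Logistic (Bradley-Terry style) probability that team A beats team B.

    The beta default and the n * beta**2 term are assumptions for
    illustration, mirroring the snippet above.
    """
    c_iq = math.sqrt(n * beta**2 + sigma_a**2 + sigma_b**2)
    # 1 / (1 + exp((mu_a - mu_b) / c_iq)) is P(B beats A); take the complement.
    return 1 - 1 / (1 + math.exp((mu_a - mu_b) / c_iq))


# Evenly matched teams sit at 0.5; a higher-mu team climbs above 0.5,
# and the two orderings of the same pair are complementary.
p_even = pairwise_win_probability(25, 25 / 3, 25, 25 / 3)
p_strong = pairwise_win_probability(30, 25 / 3, 20, 25 / 3)
```
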
As you can see, in practical terms the results are virtually the same. It might lend some credence to having custom prediction functions per model if there were data showing they are more effective. Perhaps they are, perhaps they aren't. But without n-team match data it's not worth having such a piecewise function.
If time allowed and there were some evidence, I would be willing to implement or merge such code. If this answers your question, feel free to close this issue.
Thanks for replying. There is no need to change it if it's intentional; I just thought it might be a mistake.