vivekjoshy / openskill.py

Multiplayer Rating System. No Friction.

Home Page: https://openskill.me

Possibility for parameter for how ratings and win chances adjust for uneven teams

spookybear0 opened this issue · comments

I'm using openskill for a game where we sometimes have uneven teams, for example 6 vs 7. When making teams, we put the better players on the team with fewer players. Openskill's estimates are way off from the actual results when dealing with uneven teams. It seems to value the extra player much more than the specific game I'm using it for does.

Does anyone have any insight on how to tune a parameter that makes team disparity less important?

Thanks!

Duplicate #29

It's not a duplicate; this issue is about teams of different sizes, not about player performance during a game.

I decided to multiply the mu of each player on the team with fewer players by a configurable factor (I used 1.1) before calling the win_percent function. It seems to be working fine.
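
As a minimal sketch, that workaround looks roughly like this (assuming two teams and the stock PlackettLuce model; boosted_win_percent and the BOOST value are illustrative names, not library API):

from openskill.models import PlackettLuce

model = PlackettLuce()
BOOST = 1.1  # the factor described above


def boosted_win_percent(team_a, team_b):
    """Inflate mu on whichever team has fewer players, then predict."""

    def boost(team):
        # Work on copies so the stored ratings are not mutated.
        return [model.rating(mu=r.mu * BOOST, sigma=r.sigma) for r in team]

    if len(team_a) < len(team_b):
        team_a = boost(team_a)
    elif len(team_b) < len(team_a):
        team_b = boost(team_b)
    return model.predict_win([team_a, team_b])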

Is your issue solved?

Not exactly; I used a temporary fix that isn't very effective. If it's not possible (or no one is willing to implement it), this issue can be closed.

Openskill's estimates are way off from the actual results when dealing with uneven teams. It seems to value the extra player much more than the specific game I'm using it for does.

Can you provide a reproducible example?

Yeah, a concrete example would be helpful here. If you don't like the result of one of the functions, then what should the result be, and why should it be that?

I apologize, I've been a little busy. I'll get some examples ready. Thanks for the help.

# SM5Game, Player, Team, and get_win_chance come from the reporter's own codebase
from openskill.models import PlackettLuce

model = PlackettLuce()

async def get_games_with_unbalanced_teams() -> None:
    games = await SM5Game.filter(ranked=True).all()

    print("Getting win chances for unbalanced games. Close game defined as: difference <= 5000 points")

    for game in games:
        red_entity_starts = await game.get_team_entity_starts(Team.RED)
        green_entity_starts = await game.get_team_entity_starts(Team.GREEN)
        red_players = []
        green_players = []

        # skip games where both teams have the same number of players
        if len(red_entity_starts) == len(green_entity_starts):
            continue

        for player in red_entity_starts:
            red_players.append(await Player.filter(entity_id=player.entity_id).first())

        for player in green_entity_starts:
            green_players.append(await Player.filter(entity_id=player.entity_id).first())

        win_chance = get_win_chance(red_players, green_players) # wraps PlackettLuce.predict_win([team1, team2])
        
        score_diff = abs(await game.get_team_score(Team.RED) - await game.get_team_score(Team.GREEN))

        #if score_diff > 5000:
        #    continue

        print(
f"""Win chance for {game.id}: ({(win_chance[0]*100):.2f}%, {(win_chance[1]*100):.2f}%) \
red: {await game.get_team_score(Team.RED)}, \
green: {await game.get_team_score(Team.GREEN)}, \
difference: {score_diff}, \
close: {score_diff <= 5000}, \
team_lengths: {len(red_players)}, {len(green_players)}\
"""
)

Here's an example from my code grabbing all games with uneven teams and displaying their win chances, scores, and team sizes.

Output for close games only. The predicted win chances are way off. These are only a few examples, but even the games that weren't close still have wildly inaccurate win chances. It seems that once team sizes differ, the model can no longer predict the outcome. The amount each extra player adds to a team also varies by game, so if a solution is implemented, it should probably make a player's contribution to a team tunable (possibly exponentially).

Getting win chances for unbalanced games. Close game defined as: difference <= 5000 points
Win chance for 68: (5.29%, 94.71%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 114: (88.29%, 11.71%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 138: (10.62%, 89.38%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6
Win chance for 139: (89.38%, 10.62%) red: 30512, green: 29690, difference: 822, close: True, team_lengths: 6, 5
Win chance for 142: (78.77%, 21.23%) red: 36393, green: 41052, difference: 4659, close: True, team_lengths: 7, 6

Here's an example that includes all games, not only the close ones.

Getting win chances for unbalanced games. Close game defined as: difference <= 5000 points
Win chance for 20: (11.24%, 88.76%) red: 43232, green: 33092, difference: 10140, close: False, team_lengths: 6, 7
Win chance for 21: (78.39%, 21.61%) red: 21950, green: 35752, difference: 13802, close: False, team_lengths: 7, 6
Win chance for 35: (15.78%, 84.22%) red: 37389, green: 27611, difference: 9778, close: False, team_lengths: 5, 6
Win chance for 39: (5.30%, 94.70%) red: 25529, green: 36191, difference: 10662, close: False, team_lengths: 5, 6
Win chance for 40: (5.30%, 94.70%) red: 35550, green: 21289, difference: 14261, close: False, team_lengths: 5, 6
Win chance for 49: (4.73%, 95.27%) red: 15169, green: 30812, difference: 15643, close: False, team_lengths: 5, 6
Win chance for 68: (5.29%, 94.71%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 84: (14.19%, 85.81%) red: 42752, green: 37593, difference: 5159, close: False, team_lengths: 6, 7
Win chance for 85: (16.37%, 83.63%) red: 36072, green: 23250, difference: 12822, close: False, team_lengths: 6, 7
Win chance for 86: (78.54%, 21.46%) red: 25251, green: 31810, difference: 6559, close: False, team_lengths: 6, 5
Win chance for 114: (88.29%, 11.71%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 128: (90.52%, 9.48%) red: 19208, green: 32490, difference: 13282, close: False, team_lengths: 6, 5
Win chance for 132: (95.74%, 4.26%) red: 48572, green: 31368, difference: 17204, close: False, team_lengths: 7, 6
Win chance for 138: (10.62%, 89.38%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6
Win chance for 139: (89.38%, 10.62%) red: 30512, green: 29690, difference: 822, close: True, team_lengths: 6, 5
Win chance for 140: (94.89%, 5.11%) red: 32794, green: 18447, difference: 14347, close: False, team_lengths: 7, 6
Win chance for 141: (26.01%, 73.99%) red: 34812, green: 23032, difference: 11780, close: False, team_lengths: 6, 7
Win chance for 142: (78.77%, 21.23%) red: 36393, green: 41052, difference: 4659, close: True, team_lengths: 7, 6

My ratings are well defined with small sigma values due to the number of games played, and the system has been proven to work well with evenly sized teams.

Let me know if more info is needed. Thanks guys.

The win percent chance is way off

What should the win percent chance be, and why should it be that and not anything else?

Since these games were so close, the teams were more even, and that should be reflected in the win chances. Predictions should be much closer to 50:50 for games that close (the score doesn't always reflect the win chance, but it often does).

I've provided some control data below showing what the win chances look like for close, evenly matched games. I can also provide data for evenly matched games that weren't close, if needed. As you can see, the win chances are much more reasonable given how close the scores are (outliers aside), so it only makes sense for the same to hold for teams of uneven sizes. This is a pretty big difference compared to the unevenly matched teams.

Getting win chances for balanced games. Close game defined as: difference <= 5000 points
Win chance for 10: (39.97%, 60.03%) red: 34892, green: 33791, difference: 1101, close: True, team_lengths: 7, 7
Win chance for 12: (60.03%, 39.97%) red: 37213, green: 35872, difference: 1341, close: True, team_lengths: 7, 7
Win chance for 13: (34.57%, 65.43%) red: 30909, green: 29929, difference: 980, close: True, team_lengths: 5, 5
Win chance for 19: (68.61%, 31.39%) red: 33131, green: 31551, difference: 1580, close: True, team_lengths: 6, 6
Win chance for 25: (62.64%, 37.36%) red: 28510, green: 26110, difference: 2400, close: True, team_lengths: 5, 5
Win chance for 30: (42.55%, 57.45%) red: 36732, green: 35091, difference: 1641, close: True, team_lengths: 6, 6
Win chance for 32: (55.20%, 44.80%) red: 30290, green: 30270, difference: 20, close: True, team_lengths: 6, 6
Win chance for 41: (28.73%, 71.27%) red: 30170, green: 33270, difference: 3100, close: True, team_lengths: 5, 5
Win chance for 47: (73.28%, 26.72%) red: 28110, green: 25130, difference: 2980, close: True, team_lengths: 5, 5
Win chance for 48: (56.88%, 43.12%) red: 25248, green: 27170, difference: 1922, close: True, team_lengths: 5, 5
Win chance for 53: (49.90%, 50.10%) red: 36651, green: 37532, difference: 881, close: True, team_lengths: 6, 6
Win chance for 55: (29.17%, 70.83%) red: 33772, green: 31232, difference: 2540, close: True, team_lengths: 6, 6
Win chance for 64: (68.01%, 31.99%) red: 36932, green: 34854, difference: 2078, close: True, team_lengths: 7, 7
Win chance for 76: (49.53%, 50.47%) red: 23510, green: 27610, difference: 4100, close: True, team_lengths: 5, 5
Win chance for 77: (47.55%, 52.45%) red: 37652, green: 33592, difference: 4060, close: True, team_lengths: 6, 6
Win chance for 92: (73.08%, 26.92%) red: 38894, green: 34652, difference: 4242, close: True, team_lengths: 7, 7
Win chance for 95: (44.52%, 55.48%) red: 40593, green: 40434, difference: 159, close: True, team_lengths: 7, 7
Win chance for 100: (47.35%, 52.65%) red: 40792, green: 36472, difference: 4320, close: True, team_lengths: 6, 6
Win chance for 107: (46.89%, 53.11%) red: 24510, green: 26009, difference: 1499, close: True, team_lengths: 5, 5
Win chance for 110: (47.75%, 52.25%) red: 37152, green: 35832, difference: 1320, close: True, team_lengths: 6, 6
Win chance for 111: (47.75%, 52.25%) red: 36692, green: 33312, difference: 3380, close: True, team_lengths: 6, 6
Win chance for 112: (62.95%, 37.05%) red: 28329, green: 24929, difference: 3400, close: True, team_lengths: 5, 5
Win chance for 113: (33.16%, 66.84%) red: 23689, green: 27289, difference: 3600, close: True, team_lengths: 5, 5
Win chance for 116: (50.05%, 49.95%) red: 40192, green: 38654, difference: 1538, close: True, team_lengths: 7, 7
Win chance for 123: (43.48%, 56.52%) red: 31290, green: 31012, difference: 278, close: True, team_lengths: 6, 6
Win chance for 126: (56.52%, 43.48%) red: 37651, green: 32849, difference: 4802, close: True, team_lengths: 6, 6
Win chance for 130: (61.16%, 38.84%) red: 28871, green: 31930, difference: 3059, close: True, team_lengths: 6, 6

Thanks again.

I can see what you're saying, but I feel like the best path forward here would be for you to write your own predict_win function, and then we can look at how to generalize it with a parameter. It's all open source; there are no secrets, everything's there for you to fork 😅

Of course, I just don't have any of the required experience in this field of math.

This is the solution I'm using now (definitely not mathematically rigorous, but it works somewhat decently). It's built for my use case, since it only supports two teams, but it could(?) be a good place to start. I do want to point out that this isn't a great solution, as it doesn't solve the problem entirely, though I'm sure there's a better one than what I'm doing.

import logging
import math
from typing import List, Union

from openskill.models import PlackettLuce, PlackettLuceRating

# phi_major is the standard normal CDF helper the library itself uses;
# the import path below assumes openskill.py v5's module layout.
from openskill.models.weng_lin.common import phi_major

logger = logging.getLogger(__name__)

UNEVEN_TEAM_FACTOR = 0.09

class CustomPlackettLuce(PlackettLuce):
    def predict_win(self, teams: List[List[PlackettLuceRating]]) -> List[Union[int, float]]:
        # Check Arguments
        self._check_teams(teams)

        n = len(teams)

        # The uneven team adjustment is only implemented for two teams.

        # 2 Player Case
        if n == 2:
            # CUSTOM ADDITION: inflate mu for the short-handed team.
            # Note: this mutates player.mu in place, so the stored ratings are
            # permanently changed every time predict_win is called.
            size_difference = abs(len(teams[0]) - len(teams[1]))
            if len(teams[0]) > len(teams[1]):
                logger.debug("Adjusting team ratings for uneven team count (team 1 has more players)")
                # team 1 has more players than team 2
                for player in teams[1]:
                    # multiply by 1 + UNEVEN_TEAM_FACTOR * the difference in player count
                    player.mu *= 1 + UNEVEN_TEAM_FACTOR * size_difference
            elif len(teams[0]) < len(teams[1]):
                logger.debug("Adjusting team ratings for uneven team count (team 2 has more players)")
                # team 2 has more players than team 1
                for player in teams[0]:
                    # multiply by 1 + UNEVEN_TEAM_FACTOR * the difference in player count
                    player.mu *= 1 + UNEVEN_TEAM_FACTOR * size_difference

            # The rest mirrors the stock two-team branch of PlackettLuce.predict_win.
            total_player_count = len(teams[0]) + len(teams[1])
            teams_ratings = self._calculate_team_ratings(teams)
            a = teams_ratings[0]
            b = teams_ratings[1]

            result = phi_major(
                (a.mu - b.mu)
                / math.sqrt(
                    total_player_count * self.beta**2
                    + a.sigma_squared
                    + b.sigma_squared
                )
            )

            return [result, 1 - result]

        return PlackettLuce.predict_win(self, teams)
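
For illustration, here's a hypothetical call site for the class above (the rating values are made up). One caveat worth flagging: because predict_win mutates player.mu in place, calling it twice with the same rating objects compounds the boost.

model = CustomPlackettLuce()
red = [model.rating(mu=26.5, sigma=1.2) for _ in range(6)]    # illustrative ratings
green = [model.rating(mu=24.8, sigma=1.1) for _ in range(7)]
print(model.predict_win([red, green]))  # two probabilities summing to 1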

Is this implementation actually able to predict the outcome of real match data? I can see how, in some games, teams with fewer players might still win, unlike, say, traditional games where an extra player counts for a lot. If you could provide some match data where the predict_win function is failing for close matches, that would be very helpful. It would also aid in making a new parameter, let's call it team_parity, that controls how strongly uneven team sizes are weighted. A priori, this seems feasible, but in reality we won't know until we test it on real data.
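
Purely as an untested sketch of what such a knob could look like (adjusted_team_mu and team_parity are hypothetical names, not library API), one option is to rescale each team's summed mu toward a common reference size:

def adjusted_team_mu(team_mus, reference_size, team_parity):
    # team_parity = 0.0 keeps the plain sum (current behaviour);
    # team_parity = 1.0 rescales the sum as if the team had reference_size players.
    return sum(team_mus) * (reference_size / len(team_mus)) ** team_parity

With reference_size set to the mean of the two team sizes, values of team_parity between 0 and 1 would interpolate between the current behaviour and fully size-normalized teams.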

The implementation in my previous comment is shaky at predicting the outcome for uneven teams; that's why I'm looking for a better solution. It's more of a hacky fix than a real one. The normal implementation, on the other hand, works great for matches with even teams but fails on games with uneven player counts.

Here's additional data for close games with a team size difference, including only the games where the model failed to predict the winner. I don't have many games with uneven teams, so there isn't much data.

Here's the data with the CustomPlackettLuce class.

Win chance for 68: (24.69%, 75.31%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 114: (56.36%, 43.64%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 138: (38.55%, 61.45%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6

And the same data using the normal PlackettLuce model

Win chance for 68: (5.29%, 94.71%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 114: (88.29%, 11.71%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 138: (10.62%, 89.38%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6
Win chance for 142: (78.77%, 21.23%) red: 36393, green: 41052, difference: 4659, close: True, team_lengths: 7, 6

And here's the data for even teams, for comparison (these games are not affected by the custom model).

Win chance for 10: (39.97%, 60.03%) red: 34892, green: 33791, difference: 1101, close: True, team_lengths: 7, 7
Win chance for 13: (34.57%, 65.43%) red: 30909, green: 29929, difference: 980, close: True, team_lengths: 5, 5
Win chance for 30: (42.55%, 57.45%) red: 36732, green: 35091, difference: 1641, close: True, team_lengths: 6, 6
Win chance for 48: (56.88%, 43.12%) red: 25248, green: 27170, difference: 1922, close: True, team_lengths: 5, 5
Win chance for 55: (29.17%, 70.83%) red: 33772, green: 31232, difference: 2540, close: True, team_lengths: 6, 6
Win chance for 77: (47.55%, 52.45%) red: 37652, green: 33592, difference: 4060, close: True, team_lengths: 6, 6
Win chance for 95: (44.52%, 55.48%) red: 40593, green: 40434, difference: 159, close: True, team_lengths: 7, 7
Win chance for 100: (47.35%, 52.65%) red: 40792, green: 36472, difference: 4320, close: True, team_lengths: 6, 6
Win chance for 110: (47.75%, 52.25%) red: 37152, green: 35832, difference: 1320, close: True, team_lengths: 6, 6
Win chance for 111: (47.75%, 52.25%) red: 36692, green: 33312, difference: 3380, close: True, team_lengths: 6, 6
Win chance for 123: (43.48%, 56.52%) red: 31290, green: 31012, difference: 278, close: True, team_lengths: 6, 6
Win chance for 130: (61.16%, 38.84%) red: 28871, green: 31930, difference: 3059, close: True, team_lengths: 6, 6

I have lots of data for this and can provide more if needed (or display it differently). I'm also available if you'd like to test any additions against my codebase; I'd be happy to help with that part.

A few data points are unfortunately not sufficient. We need match counts in the tens of thousands (the bigger the better) from real, actually played games to verify the effectiveness of such changes. To show that a change works, we run data analysis on open-source datasets and measure performance; see the datasets used in /benchmark as a starting point.
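
As a starting point for that kind of analysis, here's a small sketch (the record format is hypothetical) that scores predict_win on historical matches by favourite accuracy and Brier score:

from openskill.models import PlackettLuce

model = PlackettLuce()

def evaluate(records):
    """records: iterable of (team_a_ratings, team_b_ratings, a_won) tuples."""
    correct, brier, n = 0, 0.0, 0
    for team_a, team_b, a_won in records:
        p_a, _ = model.predict_win([team_a, team_b])
        correct += (p_a >= 0.5) == a_won  # did the favoured team win?
        brier += (p_a - a_won) ** 2       # calibration error (lower is better)
        n += 1
    return correct / n, brier / n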