vivekjoshy / openskill.py

Multiplayer Rating System. No Friction.

Home Page: https://openskill.me

Possibility for parameter for how ratings and win chances adjust for uneven teams

spookybear0 opened this issue · comments

I'm using openskill for a game where we sometimes have uneven teams, for example 6 vs 7. When making teams, we put the better players on the team with fewer players. Openskill's estimates are way off from the actual results when dealing with uneven teams. It seems to value the extra player much more than the specific game I'm using it for does.

Does anyone have any insight on how to tune a parameter that makes team disparity less important?

Thanks!

Duplicate #29

It's not a duplicate; this issue is about teams of different sizes, not about player performance during a game.

I decided to multiply the mu of each player on the team with fewer players by a configurable factor (I used 1.1) before calling the win_percent function. It seems to be working fine.
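
As a minimal sketch, that workaround looks roughly like this (assuming two teams and the stock PlackettLuce model; boosted_win_percent and the BOOST value are illustrative names, not library API):

from openskill.models import PlackettLuce

model = PlackettLuce()
BOOST = 1.1  # the factor described above


def boosted_win_percent(team_a, team_b):
    """Inflate mu on whichever team has fewer players, then predict."""

    def boost(team):
        # Work on copies so the stored ratings are not mutated.
        return [model.rating(mu=r.mu * BOOST, sigma=r.sigma) for r in team]

    if len(team_a) < len(team_b):
        team_a = boost(team_a)
    elif len(team_b) < len(team_a):
        team_b = boost(team_b)
    return model.predict_win([team_a, team_b])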

Is your issue solved?

Not exactly; I used a temporary fix that isn't very effective. If it's not possible (or no one is willing to implement it), this issue can be closed.

Openskill's estimates are way off from the actual results when dealing with uneven teams. It seems to value the extra player much more than the specific game I'm using it for does.

Can you provide a reproducible example?

Yeah, a concrete example would be helpful here. If you don't like the result of one of the functions, then what should the result be, and why should it be that?

I apologize, I've been a little busy. I'll get some examples ready. Thanks for the help.

# SM5Game, Player, Team, and get_win_chance come from the reporter's own codebase
from openskill.models import PlackettLuce

model = PlackettLuce()

async def get_games_with_unbalanced_teams() -> None:
    games = await SM5Game.filter(ranked=True).all()

    print("Getting win chances for unbalanced games. Close game defined as: difference <= 5000 points")

    for game in games:
        red_entity_starts = await game.get_team_entity_starts(Team.RED)
        green_entity_starts = await game.get_team_entity_starts(Team.GREEN)
        red_players = []
        green_players = []

        # skip games where both teams have the same number of players
        if len(red_entity_starts) == len(green_entity_starts):
            continue

        for player in red_entity_starts:
            red_players.append(await Player.filter(entity_id=player.entity_id).first())

        for player in green_entity_starts:
            green_players.append(await Player.filter(entity_id=player.entity_id).first())

        win_chance = get_win_chance(red_players, green_players) # wraps PlackettLuce.predict_win([team1, team2])
        
        score_diff = abs(await game.get_team_score(Team.RED) - await game.get_team_score(Team.GREEN))

        #if score_diff > 5000:
        #    continue

        print(
f"""Win chance for {game.id}: ({(win_chance[0]*100):.2f}%, {(win_chance[1]*100):.2f}%) \
red: {await game.get_team_score(Team.RED)}, \
green: {await game.get_team_score(Team.GREEN)}, \
difference: {score_diff}, \
close: {score_diff <= 5000}, \
team_lengths: {len(red_players)}, {len(green_players)}\
"""
)

Here's an example from my code grabbing all games with uneven teams and displaying their win chances, scores, and team sizes.

Output for close games only. The predicted win chances are way off. These are only a few examples, but even the games that weren't close still have wildly inaccurate win chances. It seems that once team sizes differ, the model can no longer predict the outcome. The amount each extra player adds to a team also varies by game, so if a solution is implemented, it should probably make a player's contribution to a team tunable (possibly exponentially).

Getting win chances for unbalanced games. Close game defined as: difference <= 5000 points
Win chance for 68: (5.29%, 94.71%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 114: (88.29%, 11.71%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 138: (10.62%, 89.38%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6
Win chance for 139: (89.38%, 10.62%) red: 30512, green: 29690, difference: 822, close: True, team_lengths: 6, 5
Win chance for 142: (78.77%, 21.23%) red: 36393, green: 41052, difference: 4659, close: True, team_lengths: 7, 6

Here's an example that includes all games, not only the close ones.

Getting win chances for unbalanced games. Close game defined as: difference <= 5000 points
Win chance for 20: (11.24%, 88.76%) red: 43232, green: 33092, difference: 10140, close: False, team_lengths: 6, 7
Win chance for 21: (78.39%, 21.61%) red: 21950, green: 35752, difference: 13802, close: False, team_lengths: 7, 6
Win chance for 35: (15.78%, 84.22%) red: 37389, green: 27611, difference: 9778, close: False, team_lengths: 5, 6
Win chance for 39: (5.30%, 94.70%) red: 25529, green: 36191, difference: 10662, close: False, team_lengths: 5, 6
Win chance for 40: (5.30%, 94.70%) red: 35550, green: 21289, difference: 14261, close: False, team_lengths: 5, 6
Win chance for 49: (4.73%, 95.27%) red: 15169, green: 30812, difference: 15643, close: False, team_lengths: 5, 6
Win chance for 68: (5.29%, 94.71%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 84: (14.19%, 85.81%) red: 42752, green: 37593, difference: 5159, close: False, team_lengths: 6, 7
Win chance for 85: (16.37%, 83.63%) red: 36072, green: 23250, difference: 12822, close: False, team_lengths: 6, 7
Win chance for 86: (78.54%, 21.46%) red: 25251, green: 31810, difference: 6559, close: False, team_lengths: 6, 5
Win chance for 114: (88.29%, 11.71%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 128: (90.52%, 9.48%) red: 19208, green: 32490, difference: 13282, close: False, team_lengths: 6, 5
Win chance for 132: (95.74%, 4.26%) red: 48572, green: 31368, difference: 17204, close: False, team_lengths: 7, 6
Win chance for 138: (10.62%, 89.38%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6
Win chance for 139: (89.38%, 10.62%) red: 30512, green: 29690, difference: 822, close: True, team_lengths: 6, 5
Win chance for 140: (94.89%, 5.11%) red: 32794, green: 18447, difference: 14347, close: False, team_lengths: 7, 6
Win chance for 141: (26.01%, 73.99%) red: 34812, green: 23032, difference: 11780, close: False, team_lengths: 6, 7
Win chance for 142: (78.77%, 21.23%) red: 36393, green: 41052, difference: 4659, close: True, team_lengths: 7, 6

My ratings are well defined with small sigma values due to the number of games played, and the system has been proven to work well with evenly sized teams.

Let me know if more info is needed. Thanks guys.

The win percent chance is way off

What should the win percent chance be, and why should it be that and not anything else?

Since these games were so close, the teams were more even, and that should be reflected in the win chances. Predictions should be much closer to 50:50 for games that close (the score doesn't always reflect the win chance, but it often does).

I've provided some control data below showing what the win chances look like for close, evenly matched games. I can also provide data for evenly matched games that weren't close, if needed. As you can see, the win chances are much more reasonable given how close the scores are (outliers aside), so it only makes sense for the same to hold for teams of uneven sizes. This is a pretty big difference compared to the unevenly matched teams.

Getting win chances for balanced games. Close game defined as: difference <= 5000 points
Win chance for 10: (39.97%, 60.03%) red: 34892, green: 33791, difference: 1101, close: True, team_lengths: 7, 7
Win chance for 12: (60.03%, 39.97%) red: 37213, green: 35872, difference: 1341, close: True, team_lengths: 7, 7
Win chance for 13: (34.57%, 65.43%) red: 30909, green: 29929, difference: 980, close: True, team_lengths: 5, 5
Win chance for 19: (68.61%, 31.39%) red: 33131, green: 31551, difference: 1580, close: True, team_lengths: 6, 6
Win chance for 25: (62.64%, 37.36%) red: 28510, green: 26110, difference: 2400, close: True, team_lengths: 5, 5
Win chance for 30: (42.55%, 57.45%) red: 36732, green: 35091, difference: 1641, close: True, team_lengths: 6, 6
Win chance for 32: (55.20%, 44.80%) red: 30290, green: 30270, difference: 20, close: True, team_lengths: 6, 6
Win chance for 41: (28.73%, 71.27%) red: 30170, green: 33270, difference: 3100, close: True, team_lengths: 5, 5
Win chance for 47: (73.28%, 26.72%) red: 28110, green: 25130, difference: 2980, close: True, team_lengths: 5, 5
Win chance for 48: (56.88%, 43.12%) red: 25248, green: 27170, difference: 1922, close: True, team_lengths: 5, 5
Win chance for 53: (49.90%, 50.10%) red: 36651, green: 37532, difference: 881, close: True, team_lengths: 6, 6
Win chance for 55: (29.17%, 70.83%) red: 33772, green: 31232, difference: 2540, close: True, team_lengths: 6, 6
Win chance for 64: (68.01%, 31.99%) red: 36932, green: 34854, difference: 2078, close: True, team_lengths: 7, 7
Win chance for 76: (49.53%, 50.47%) red: 23510, green: 27610, difference: 4100, close: True, team_lengths: 5, 5
Win chance for 77: (47.55%, 52.45%) red: 37652, green: 33592, difference: 4060, close: True, team_lengths: 6, 6
Win chance for 92: (73.08%, 26.92%) red: 38894, green: 34652, difference: 4242, close: True, team_lengths: 7, 7
Win chance for 95: (44.52%, 55.48%) red: 40593, green: 40434, difference: 159, close: True, team_lengths: 7, 7
Win chance for 100: (47.35%, 52.65%) red: 40792, green: 36472, difference: 4320, close: True, team_lengths: 6, 6
Win chance for 107: (46.89%, 53.11%) red: 24510, green: 26009, difference: 1499, close: True, team_lengths: 5, 5
Win chance for 110: (47.75%, 52.25%) red: 37152, green: 35832, difference: 1320, close: True, team_lengths: 6, 6
Win chance for 111: (47.75%, 52.25%) red: 36692, green: 33312, difference: 3380, close: True, team_lengths: 6, 6
Win chance for 112: (62.95%, 37.05%) red: 28329, green: 24929, difference: 3400, close: True, team_lengths: 5, 5
Win chance for 113: (33.16%, 66.84%) red: 23689, green: 27289, difference: 3600, close: True, team_lengths: 5, 5
Win chance for 116: (50.05%, 49.95%) red: 40192, green: 38654, difference: 1538, close: True, team_lengths: 7, 7
Win chance for 123: (43.48%, 56.52%) red: 31290, green: 31012, difference: 278, close: True, team_lengths: 6, 6
Win chance for 126: (56.52%, 43.48%) red: 37651, green: 32849, difference: 4802, close: True, team_lengths: 6, 6
Win chance for 130: (61.16%, 38.84%) red: 28871, green: 31930, difference: 3059, close: True, team_lengths: 6, 6

Thanks again.

I can see what you're saying, but I feel like the best path forward here would be for you to write your own predict_win function, and then we can look at how to generalize it with a parameter. It's all open source; there are no secrets, everything's there for you to fork 😅

Of course, I just don't have any of the required experience in this field of math.

This is the solution I'm using now (definitely not mathematically rigorous, but it works somewhat decently). It's built for my use case, since it only supports two teams, but it could(?) be a good place to start. I do want to point out that this isn't a great solution, as it doesn't solve the problem entirely, though I'm sure there's a better one than what I'm doing.

import logging
import math
from typing import List, Union

from openskill.models import PlackettLuce, PlackettLuceRating

# phi_major is the standard normal CDF helper the library itself uses;
# the import path below assumes openskill.py v5's module layout.
from openskill.models.weng_lin.common import phi_major

logger = logging.getLogger(__name__)

UNEVEN_TEAM_FACTOR = 0.09

class CustomPlackettLuce(PlackettLuce):
    def predict_win(self, teams: List[List[PlackettLuceRating]]) -> List[Union[int, float]]:
        # Check Arguments
        self._check_teams(teams)

        n = len(teams)

        # The uneven team adjustment is only implemented for two teams.

        # 2 Player Case
        if n == 2:
            # CUSTOM ADDITION: inflate mu for the short-handed team.
            # Note: this mutates player.mu in place, so the stored ratings are
            # permanently changed every time predict_win is called.
            size_difference = abs(len(teams[0]) - len(teams[1]))
            if len(teams[0]) > len(teams[1]):
                logger.debug("Adjusting team ratings for uneven team count (team 1 has more players)")
                # team 1 has more players than team 2
                for player in teams[1]:
                    # multiply by 1 + UNEVEN_TEAM_FACTOR * the difference in player count
                    player.mu *= 1 + UNEVEN_TEAM_FACTOR * size_difference
            elif len(teams[0]) < len(teams[1]):
                logger.debug("Adjusting team ratings for uneven team count (team 2 has more players)")
                # team 2 has more players than team 1
                for player in teams[0]:
                    # multiply by 1 + UNEVEN_TEAM_FACTOR * the difference in player count
                    player.mu *= 1 + UNEVEN_TEAM_FACTOR * size_difference

            # The rest mirrors the stock two-team branch of PlackettLuce.predict_win.
            total_player_count = len(teams[0]) + len(teams[1])
            teams_ratings = self._calculate_team_ratings(teams)
            a = teams_ratings[0]
            b = teams_ratings[1]

            result = phi_major(
                (a.mu - b.mu)
                / math.sqrt(
                    total_player_count * self.beta**2
                    + a.sigma_squared
                    + b.sigma_squared
                )
            )

            return [result, 1 - result]

        return PlackettLuce.predict_win(self, teams)
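
For illustration, here's a hypothetical call site for the class above (the rating values are made up). One caveat worth flagging: because predict_win mutates player.mu in place, calling it twice with the same rating objects compounds the boost.

model = CustomPlackettLuce()
red = [model.rating(mu=26.5, sigma=1.2) for _ in range(6)]    # illustrative ratings
green = [model.rating(mu=24.8, sigma=1.1) for _ in range(7)]
print(model.predict_win([red, green]))  # two probabilities summing to 1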

Is this implementation actually able to predict the outcome of real match data? I can see how, in some games, teams with fewer players might still win, unlike, say, traditional games where an extra player counts for a lot. If you could provide some match data where the predict_win function is failing for close matches, that would be very helpful. It would also aid in making a new parameter, let's call it team_parity, that controls how strongly uneven team sizes are weighted. A priori, this seems feasible, but in reality we won't know until we test it on real data.
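
Purely as an untested sketch of what such a knob could look like (adjusted_team_mu and team_parity are hypothetical names, not library API), one option is to rescale each team's summed mu toward a common reference size:

def adjusted_team_mu(team_mus, reference_size, team_parity):
    # team_parity = 0.0 keeps the plain sum (current behaviour);
    # team_parity = 1.0 rescales the sum as if the team had reference_size players.
    return sum(team_mus) * (reference_size / len(team_mus)) ** team_parity

With reference_size set to the mean of the two team sizes, values of team_parity between 0 and 1 would interpolate between the current behaviour and fully size-normalized teams.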

The implementation in my previous comment is shaky at predicting the outcome for uneven teams; that's why I'm looking for a better solution. It's more of a hacky fix than a real one. The normal implementation, on the other hand, works great for matches with even teams but fails on games with uneven player counts.

Here's additional data for close games with a team size difference, including only the games where the model failed to predict the winner. I don't have many games with uneven teams, so there isn't much data.

Here's the data with the CustomPlackettLuce class.

Win chance for 68: (24.69%, 75.31%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 114: (56.36%, 43.64%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 138: (38.55%, 61.45%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6

And the same data using the normal PlackettLuce model

Win chance for 68: (5.29%, 94.71%) red: 35792, green: 34511, difference: 1281, close: True, team_lengths: 6, 7
Win chance for 114: (88.29%, 11.71%) red: 34134, green: 35412, difference: 1278, close: True, team_lengths: 7, 6
Win chance for 138: (10.62%, 89.38%) red: 31350, green: 27411, difference: 3939, close: True, team_lengths: 5, 6
Win chance for 142: (78.77%, 21.23%) red: 36393, green: 41052, difference: 4659, close: True, team_lengths: 7, 6

And here's the data for even teams, for comparison (these games are not affected by the custom model).

Win chance for 10: (39.97%, 60.03%) red: 34892, green: 33791, difference: 1101, close: True, team_lengths: 7, 7
Win chance for 13: (34.57%, 65.43%) red: 30909, green: 29929, difference: 980, close: True, team_lengths: 5, 5
Win chance for 30: (42.55%, 57.45%) red: 36732, green: 35091, difference: 1641, close: True, team_lengths: 6, 6
Win chance for 48: (56.88%, 43.12%) red: 25248, green: 27170, difference: 1922, close: True, team_lengths: 5, 5
Win chance for 55: (29.17%, 70.83%) red: 33772, green: 31232, difference: 2540, close: True, team_lengths: 6, 6
Win chance for 77: (47.55%, 52.45%) red: 37652, green: 33592, difference: 4060, close: True, team_lengths: 6, 6
Win chance for 95: (44.52%, 55.48%) red: 40593, green: 40434, difference: 159, close: True, team_lengths: 7, 7
Win chance for 100: (47.35%, 52.65%) red: 40792, green: 36472, difference: 4320, close: True, team_lengths: 6, 6
Win chance for 110: (47.75%, 52.25%) red: 37152, green: 35832, difference: 1320, close: True, team_lengths: 6, 6
Win chance for 111: (47.75%, 52.25%) red: 36692, green: 33312, difference: 3380, close: True, team_lengths: 6, 6
Win chance for 123: (43.48%, 56.52%) red: 31290, green: 31012, difference: 278, close: True, team_lengths: 6, 6
Win chance for 130: (61.16%, 38.84%) red: 28871, green: 31930, difference: 3059, close: True, team_lengths: 6, 6

I have lots of data for this and can provide more if needed (or display it differently). I'm also available if you'd like to test any additions against my codebase; I'd be happy to help with that part.

A few data points are unfortunately not sufficient. We need match counts in the tens of thousands (the bigger the better) from real, actually played games to verify the effectiveness of such changes. To show that a change works, we run data analysis on open-source datasets and measure performance; see the datasets used in /benchmark as a starting point.
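
As a starting point for that kind of analysis, here's a small sketch (the record format is hypothetical) that scores predict_win on historical matches by favourite accuracy and Brier score:

from openskill.models import PlackettLuce

model = PlackettLuce()

def evaluate(records):
    """records: iterable of (team_a_ratings, team_b_ratings, a_won) tuples."""
    correct, brier, n = 0, 0.0, 0
    for team_a, team_b, a_won in records:
        p_a, _ = model.predict_win([team_a, team_b])
        correct += (p_a >= 0.5) == a_won  # did the favoured team win?
        brier += (p_a - a_won) ** 2       # calibration error (lower is better)
        n += 1
    return correct / n, brier / n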