First of all, this project is heavily inspired on this research. So shoutout to them.
I implemented a two Machine Learning algorithms to determinate the outcome of a League of Legends game based on Player-Champion Experience. I used more than 16k matches from two different servers, in order to make the training and predictions. My best result was 91% of accuracy using GBOOST.
League of Legends (LoL), a popular computer game developed by Riot Games, is currently the most widely played Multiplayer Online Battle Arena (MOBA) game in the world. In 2019, there were eight million concurrent players daily, and the player base has continued to grow since its release in 2009. A core aspect of LoL is competitive ranked gameplay. In typical ranked gameplay, ten human players are matched together to form two teams of approximately equal skill. These two teams, consisting of five players each, battle against each other to destroy the opposing team’s base.
To the matchmaker, a “fair” match can be loosely defined as a match in which each team has a 50% +/-1% chance of winning. In a perfect match, ten individuals with identical MMRs queue at the same time, each having selected a unique position that they’re well-suited for. That situation is incredibly rare depending on who is queueing at the time, so sometimes teams can have very slight skill differences.
Even though Riot has been trying to make the matchmaking as fair as poosible the fact that LoL has 150 characters with different abilities and positions can heavily impact on a match. For instance if player plays a champion who hasn't played before against someone with a close skill this person is more likely to lose.
Based on these final preposition I decided to create an machine learning algorithm to predict the outcome of a League of Legends game based on Player-Champion Experience.
The most important part of your algorithm are your dataset. So I needed a good amount of matches(samples) to train my algorithm. These are the steps I followed:
From LAN server:
- First I got around 875 random summoners from each ranking from Iron to Diamond using Riot's API. Making a total of 5250 summoners.
- Using these summoners I got their last 3 Solo-Ranked matches from Mobalytics API. Using the match ID I disregarded repeated matches. Making a total of 14k unique matches.
From NA server:
- Did the same thing to get the summoners
- Did the same thing to get the past games but in this case I only got their last SoloQ match.
For each player in each match I got:
- Their champion-masteries from championmastery.gg via web scraping.
- Their winrates with each champion in SoloQ games of season 11 and 12 (a combination of both seasons). From u.gg API. (A number from 0 to 1).
Finally I saved everything in my mongodb database to later process the data and train the algorithm.
This made a total of 12458 unique SoloQ games from LAN server and 4552 SoloQ matches from NA server.
The functions to get the save the data can be found in the file pull_data_scripts.py. While the functions to call the APIs can be found in the file api_calls.py
I processed the data in the format to pass it into the algorithm.
For both teams in each math I added the following features:
- Mastery of each summoner with their selected champion. (5 features per team).
- Average, median, coefficient of kurtosis, coefficient of skewness, standard deviation, and variance of the masteries of the team. (6 features per team).
- Winrate of each summoner with their selected champion. (5 features per team).
- Average, median, coefficient of kurtosis, coefficient of skewness, standard deviation, and variance of the winrates of the team. (6 features per team).
Making a total of 22 features per team or, 44 features per match.
Finally at the end of the sample we add the label (output). 1 if the blue team won or 0 otherwise.
Here is an example of the some finished samples:
Blue Mastery 1 | Blue Mastery 2 | Blue Mastery 3 | Blue Mastery 4 | Blue Mastery 5 | Blue Masteries Avg | Blue Masteries Median | Blue Masteries Kurtorsis | Blue Masteries Skewness | Blue Masteries Std | Blue Masteries Variance | Blue Winrate 1 | Blue Winrate 2 | Blue Winrate 3 | Blue Winrate 4 | Blue Winrate 5 | Blue Winrates Avg | Blue Winrates Median | Blue Winrates Kurtorsis | Blue Winrates Skewness | Blue Winrates Std | Blue Winrates Variance | Red Mastery 1 | Red Mastery 2 | Red Mastery 3 | Red Mastery 4 | Red Mastery 5 | Red Masteries Avg | Red Masteries Median | Red Masteries Kurtorsis | Red Masteries Skewness | Red Masteries Std | Red Masteries Variance | Red Winrate 1 | Red Winrate 2 | Red Winrate 3 | Red Winrate 4 | Red Winrate 5 | Red Winrates Avg | Red Winrates Median | Red Winrates Kurtorsis | Red Winrates Skewness | Red Winrates Std | Red Winrates Variance | Blue Won |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
302361 | 32548 | 137831 | 42344 | 2594552 | 621927.2 | 137831.0 | 4.804286377468026 | 2.1839427059899292 | 991060.5244612258 | 982200963145.36 | 0.44642857142857145 | 0.5882352941176471 | 0.23076923076923075 | 0.42857142857142855 | 0.5064102564102564 | 0.4400829562594269 | 0.44642857142857145 | 1.7867390843738296 | -1.0057794278406758 | 0.11860307225221957 | 0.014066688747665215 | 161323 | 69486 | 860782 | 5651 | 760456 | 371539.6 | 161323.0 | -3.0230679355840793 | 0.5602290310596598 | 363294.90939048404 | 131983191189.04 | 0.5522388059701493 | 0.6 | 0.4857142857142857 | 0.33333333333333326 | 0.5883838383838383 | 0.5119340526803213 | 0.5522388059701493 | 1.7550266381659405 | -1.4411615960973314 | 0.09778583443443616 | 0.00956206941603896 | 0 |
244724 | 55894 | 166393 | 151398 | 17928 | 127267.4 | 151398.0 | -1.3369649217205293 | 0.01316455063445954 | 81189.18392889536 | 6591683587.040001 | 0.5454545454545454 | 0.625 | 0.39893617021276595 | 0.5 | 0.0 | 0.4138781431334623 | 0.5 | 2.9264103385945406 | -1.6520677083219906 | 0.21946304574713268 | 0.04816402844860805 | 33130 | 9301 | 415872 | 82639 | 15557 | 111299.8 | 33130.0 | 4.422881427922964 | 2.08903337480402 | 154445.2155651317 | 23853324610.96 | 0.7142857142857143 | 0.5384615384615384 | 0.5511363636363636 | 0.6415094339622641 | 0.5 | 0.5890786100691761 | 0.5511363636363636 | -0.9509293445462141 | 0.7660317416649389 | 0.07792626131767952 | 0.006072502202951276 | 0 |
1370461 | 165699 | 328554 | 11922 | 64623 | 388251.8 | 165699.0 | 4.0663657209767 | 1.9913240765628601 | 502829.6346149061 | 252837641446.95996 | 0.5311284046692607 | 0.543859649122807 | 0.4878048780487805 | 0.4444444444444444 | 0.43333333333333335 | 0.48811414192372526 | 0.4878048780487805 | -2.714615983530692 | 0.01750089360642681 | 0.044420408562547566 | 0.0019731726968636493 | 24674 | 69412 | 24297 | 197578 | 1344271 | 332046.4 | 69412.0 | 4.679175538952802 | 2.153192672198477 | 510067.58473543485 | 260168940997.84 | 0.7333333333333333 | 0.4705882352941176 | 0.631578947368421 | 0.5357142857142857 | 0.5294117647058824 | 0.5801253132832079 | 0.5357142857142857 | -0.1714387569448257 | 0.8322535703305709 | 0.09237180165145097 | 0.008532549740335 | 0 |
859153 | 8207 | 152833 | 30736 | 94462 | 229078.2 | 94462.0 | 4.494974145065102 | 2.0987197424240636 | 319077.67728589225 | 101810564142.16 | 0.594 | 0.5833333333333334 | 0.5833333333333334 | 0.5 | 0.6111111111111112 | 0.5743555555555556 | 0.5833333333333334 | 3.8040034295105913 | -1.835360837059765 | 0.03854043251277294 | 0.0014853649382716057 | 7376 | 19807 | 249551 | 1914694 | 2021545 | 842594.6 | 249551.0 | -3.24972252904008 | 0.5761330822350282 | 923644.0250595681 | 853118285028.24 | 0.0 | 0.5 | 0.14285714285714285 | 0.5157894736842106 | 0.5236318407960199 | 0.33645569146747467 | 0.5 | -2.252574646473776 | -0.7891605499808829 | 0.22119000317369086 | 0.048925017503977375 | 1 |
The function for processing the data can be found in the file process_data.py. Note that this is processing matches with the data gathered from the APIs that is stored in my Mongo database.
The model architecture uses keras and following the structure as described in the research paper:
• Alternating dropout, normalization, and dense layers for a total of 15 layers (5 dropout, 5 normalization, and 5 dense layers). Each group of alternating layers had 160, 128, 64, 32, and 16 neurons, in that order.
- Each dropout layer had a dropout rate of 0.69%.
- Each normalization layer utilized batch normalization.
- Each dense layer used Exponential Linear Unit (ELU) activation, He initialization.
- A 1 × 1 dense layer with Sigmoid activation
Then we fit our model using the following parameters epochs=49
and batch_size=256
.
Note that we use the LAN matches for training and the NA matches for testing and validation.
Finally we evauluate the model with the train samples and validation samples. In this case we use the LAN matches for training and NA matches for testing.
The jupiter notebook can be found here
-
The model is written using the sklearn implementation of GBOOST, which by default is a Decision Tree, with the following parameters:
n_estimators=55
andlearning_rate=0.14
NOTE: I used a Stratified K Fold to test the algorithm in order to get a more accurate result, with
k=10
.
The jupiter notebook can be found here
The DNN model performed better than expected with an accuracy of 82% in the testing dataset (more than 4552 matches).
In the other hand the GBOOST showed an average accuracy of 89.42%!!!, a minimum of 88.48% and a maximum of 90.48%.
In future work I would like to add the role experience as a feature.
I did a simple UI using streamlit under the src/app.py you can run it by installing streamlit package and with the command streamlit run "your_path_to_app.py"
. I get the last match from u.gg API.