This repository contains the code used for my MSc Project: Predicting the outcome of Dota 2 matches given hero selection using graph neural networks. Instructions are provided below for running different sections of the code.
Link to repo data: https://drive.google.com/drive/folders/1a4KU4zgnDIfKRaa82IXpaDjN2WFEGH-l?usp=sharing
notebooks/
- `data_acquisition.ipynb`: Scripting to acquire the list of matches and their respective match picks, and to combine them
- `dataset.py`: Defines `DotaV1` and `DotaV2`, subclasses of the Spektral `Dataset` class, including initialisation methods
- `exploratory.ipynb`: Data quality checks and insights
- `filtering.ipynb`: Creates and saves dataframes for the standard filter, MMR group filters and duration group filters
- `graph_data_creation.ipynb`: Takes the combined matches/picks csv, generates the DotaV1 and DotaV2 datasets, and scales features
- `modelling.ipynb`: Models graph data using the single-team perspective
- `modelling_2.ipynb`: Models graph data using the multi-team perspective
- `modelling_3.ipynb`: Models match data using the single-team perspective and logistic regression
- `plotting.ipynb`: Creates plots used in the report

Repository root:
- `.gitignore`: File types and folders not to be tracked with Git
- `Pipfile` / `Pipfile.lock`: Information for pipenv to create and maintain the Python virtual environment
- `README.md`: This readme
The address for this repository is: https://github.com/nick-hunt/dotaprediction.git
Clone it onto your local machine. It is lightweight; it does not contain any data.
Open the cloned repository and run `pipenv install` in the terminal. This creates a pipenv virtual environment with the necessary libraries and versions.
Ensure the virtual environment is activated before running any further code.
Alternatively, the code can be run in the global environment, provided all necessary packages are installed. Refer to the Pipfile for the required libraries.
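The setup steps above can be sketched as the following command sequence (this assumes git and pipenv are already installed):

```shell
# Clone the repository (it is lightweight; no data is included)
git clone https://github.com/nick-hunt/dotaprediction.git
cd dotaprediction

# Create the virtual environment from the Pipfile / Pipfile.lock
pipenv install

# Activate the environment before running any notebooks
pipenv shell
```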
The repo's cloud data storage location, linked at the top of this readme, mirrors the directory structure the code expects to read from and write to.
If you wish to run the script which extracts Dota data from the OpenDota API, open `data_acquisition.ipynb`. Running all cells acquires a list of matches for the hard-coded date range, queries the API to fetch the hero picks for each match, and then combines the two dataframes into `combined.csv`.
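The combining step can be sketched as an inner join on match id. This is a minimal illustration, not the notebook's actual code; the field names (`match_id`, `radiant_win`, per-hero pick columns) are assumptions:

```python
import csv
from typing import Dict, List


def combine_matches_and_picks(matches: List[Dict], picks: List[Dict]) -> List[Dict]:
    """Inner-join the match list with per-match hero picks on match_id.

    Matches without pick data are dropped, mirroring the idea of only
    keeping rows that appear in both dataframes.
    """
    picks_by_match: Dict[int, Dict] = {p["match_id"]: p for p in picks}
    combined = []
    for m in matches:
        p = picks_by_match.get(m["match_id"])
        if p is not None:
            combined.append({**m, **p})
    return combined


def save_combined(rows: List[Dict], path: str = "data/combined.csv") -> None:
    """Write the joined rows out as combined.csv."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```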
Generating graph data requires the extracted Dota data above (`combined.csv`) and access to the hero attributes table (`features.csv`). It is quicker to download these from the repo data URL: create a directory called `data` within the repo's main directory and place the two .csv files inside it. Then open `graph_data_creation.ipynb` and run all cells. The first half of the notebook generates the graph data and saves it; the second half scales the graph data and saves it as scaled pickle files.
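The core of the graph construction can be sketched as follows. This is an illustrative simplification, assuming a team's five picked heroes form the nodes of a fully connected graph, with each node's features drawn from the hero attributes table; the notebook's actual encoding may differ:

```python
import numpy as np


def build_team_graph(hero_ids, hero_features):
    """Build (x, a) for one team's draft.

    hero_ids: the 5 hero ids picked by the team
    hero_features: mapping of hero id -> 1-D attribute vector (from features.csv)
    Returns node features x of shape (5, F) and a dense adjacency matrix a
    for a complete graph on 5 nodes with no self-loops.
    """
    x = np.stack([np.asarray(hero_features[h], dtype=float) for h in hero_ids])
    a = np.ones((5, 5)) - np.eye(5)  # every hero connected to every other
    return x, a
```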
Both extracting the Dota data and generating the graph data take a significant amount of time. It is therefore advisable to download the `graphs_v1_scaled` and `graphs_v2_scaled` folders from the repo data URL and place them in your local repo's `data` folder. The models live in `modelling.ipynb` (single-team perspective), `modelling_2.ipynb` (multi-team perspective) and `modelling_3.ipynb` (logistic regression model). The logistic model requires the `standard_v1` folder downloaded to your local `data` folder. To make things easier, a mock notebook, `modelling_sample.ipynb`, has been created which runs the single-team perspective GNN training on a much smaller dataset. To run this, download the `graphs_v1_scaled` folder with the first file only (0-49999) to your local repo's `data` folder, then run all cells within the notebook to train the model. This model is trained in much the same way as the main models, albeit on far less data.
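For intuition on the logistic-regression baseline in `modelling_3.ipynb`, a minimal sketch is shown below. It assumes a simple numeric feature vector per match (for example, a multi-hot encoding of hero picks) with a win/loss label; the notebook's actual features and use of an established library may differ:

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Plain stochastic-gradient-descent logistic regression.

    Returns weights with the bias term at index 0.
    """
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi  # gradient of the log-loss w.r.t. z
            w[0] -= lr * err
            for j in range(n_feat):
                w[j + 1] -= lr * err * xi[j]
    return w


def predict(w: List[float], xi: List[float]) -> int:
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if sigmoid(z) >= 0.5 else 0
```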
All training and validation accuracies are saved as .csv files and can be downloaded from the repo's cloud data location under `models/fit_records`. Once these are in the equivalent location in the local repo, plots can be generated by running the cells in `plotting.ipynb`.
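As an illustration of working with these fit records, the snippet below finds the best-validation epoch in one record. The column names (`epoch`, `val_accuracy`) are assumptions; adjust them to match the actual .csv files:

```python
import csv
import io


def best_epoch(fit_record_csv: str, val_col: str = "val_accuracy"):
    """Return (epoch, best validation accuracy) from a fit-record CSV string."""
    rows = list(csv.DictReader(io.StringIO(fit_record_csv)))
    best = max(rows, key=lambda r: float(r[val_col]))
    return int(best["epoch"]), float(best[val_col])
```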