This repository contains the code used for my MSc Project: Predicting the outcome of Dota 2 matches given hero selection using graph neural networks. Instructions are provided below for running different sections of the code.
Link to repo data: https://drive.google.com/drive/folders/1a4KU4zgnDIfKRaa82IXpaDjN2WFEGH-l?usp=sharing
notebooks/
- `data_acquisition.ipynb`: Scripting to acquire the list of matches and their respective match picks, and to combine them
- `dataset.py`: Defines `DotaV1` and `DotaV2`, subclasses of the Spektral `Dataset` class, including initialisation methods
- `exploratory.ipynb`: Data quality checks and insights
- `filtering.ipynb`: Creates and saves dataframes for the standard filter, MMR group filters and duration group filters
- `graph_data_creation.ipynb`: Takes the combined matches/picks csv, generates the DotaV1 and DotaV2 datasets, and scales features
- `modelling.ipynb`: Models graph data using the single-team perspective
- `modelling_2.ipynb`: Models graph data using the multi-team perspective
- `modelling_3.ipynb`: Models match data using the single-team perspective and logistic regression
- `plotting.ipynb`: Creates plots used in the report

Repository root:
- `.gitignore`: File types and folders not to be tracked with Git
- `Pipfile` / `Pipfile.lock`: Information for pipenv to create and maintain the Python virtual environment
- `README.md`: This readme
The address for this repository is: https://github.com/nick-hunt/dotaprediction.git
Clone it onto your local machine. It is lightweight; it does not contain any data.
Open the cloned repository and run `pipenv install` in the terminal. This creates a pipenv virtual environment with the necessary libraries and versions.
Ensure the virtual environment is activated before running any further code.
Alternatively, the code can be run in the global environment, provided all necessary packages are installed. Refer to the Pipfile for the required libraries.
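The setup steps above can be sketched as the following command sequence (this assumes git and pipenv are already installed):

```shell
# Clone the repository (it is lightweight; no data is included)
git clone https://github.com/nick-hunt/dotaprediction.git
cd dotaprediction

# Create the virtual environment from the Pipfile / Pipfile.lock
pipenv install

# Activate the environment before running any notebooks
pipenv shell
```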
The repo's cloud data storage location, linked at the top of this readme, mirrors the directory structure the code expects to read from and write to.
If you wish to run the script which extracts Dota data from the OpenDota API, open `data_acquisition.ipynb`. Running all cells acquires a list of matches for the hard-coded date range, queries the API to fetch the hero picks for each match, and then combines the two dataframes into `combined.csv`.
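The combining step can be sketched as an inner join on match id. This is a minimal illustration, not the notebook's actual code; the field names (`match_id`, `radiant_win`, per-hero pick columns) are assumptions:

```python
import csv
from typing import Dict, List


def combine_matches_and_picks(matches: List[Dict], picks: List[Dict]) -> List[Dict]:
    """Inner-join the match list with per-match hero picks on match_id.

    Matches without pick data are dropped, mirroring the idea of only
    keeping rows that appear in both dataframes.
    """
    picks_by_match: Dict[int, Dict] = {p["match_id"]: p for p in picks}
    combined = []
    for m in matches:
        p = picks_by_match.get(m["match_id"])
        if p is not None:
            combined.append({**m, **p})
    return combined


def save_combined(rows: List[Dict], path: str = "data/combined.csv") -> None:
    """Write the joined rows out as combined.csv."""
    with open(path, "w", newline="") as f:
        writer = csv.DictWriter(f, fieldnames=list(rows[0].keys()))
        writer.writeheader()
        writer.writerows(rows)
```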
Generating graph data requires the extracted Dota data above (`combined.csv`) and access to the hero attributes table (`features.csv`). It is quicker to download these from the repo data URL: create a directory called `data` within the repo's main directory and place the two .csv files inside it. Then open `graph_data_creation.ipynb` and run all cells. The first half of the notebook generates the graph data and saves it; the second half scales the graph data and saves it as scaled pickle files.
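The core of the graph construction can be sketched as follows. This is an illustrative simplification, assuming a team's five picked heroes form the nodes of a fully connected graph, with each node's features drawn from the hero attributes table; the notebook's actual encoding may differ:

```python
import numpy as np


def build_team_graph(hero_ids, hero_features):
    """Build (x, a) for one team's draft.

    hero_ids: the 5 hero ids picked by the team
    hero_features: mapping of hero id -> 1-D attribute vector (from features.csv)
    Returns node features x of shape (5, F) and a dense adjacency matrix a
    for a complete graph on 5 nodes with no self-loops.
    """
    x = np.stack([np.asarray(hero_features[h], dtype=float) for h in hero_ids])
    a = np.ones((5, 5)) - np.eye(5)  # every hero connected to every other
    return x, a
```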
Both extracting the Dota data and generating the graph data take a significant amount of time. It is therefore advisable to download the `graphs_v1_scaled` and `graphs_v2_scaled` folders from the repo data URL and place them in your local repo's `data` folder. The models live in `modelling.ipynb` (single-team perspective), `modelling_2.ipynb` (multi-team perspective) and `modelling_3.ipynb` (logistic regression model). The logistic model requires the `standard_v1` folder downloaded to your local `data` folder. To make things easier, a mock notebook, `modelling_sample.ipynb`, has been created which runs the single-team perspective GNN training on a much smaller dataset. To run this, download the `graphs_v1_scaled` folder with the first file only (0-49999) to your local repo's `data` folder, then run all cells within the notebook to train the model. This model is trained in much the same way as the main models, albeit on far less data.
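For intuition on the logistic-regression baseline in `modelling_3.ipynb`, a minimal sketch is shown below. It assumes a simple numeric feature vector per match (for example, a multi-hot encoding of hero picks) with a win/loss label; the notebook's actual features and use of an established library may differ:

```python
import math
from typing import List


def sigmoid(z: float) -> float:
    return 1.0 / (1.0 + math.exp(-z))


def train_logreg(X: List[List[float]], y: List[int],
                 lr: float = 0.1, epochs: int = 200) -> List[float]:
    """Plain stochastic-gradient-descent logistic regression.

    Returns weights with the bias term at index 0.
    """
    n_feat = len(X[0])
    w = [0.0] * (n_feat + 1)
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi  # gradient of the log-loss w.r.t. z
            w[0] -= lr * err
            for j in range(n_feat):
                w[j + 1] -= lr * err * xi[j]
    return w


def predict(w: List[float], xi: List[float]) -> int:
    z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
    return 1 if sigmoid(z) >= 0.5 else 0
```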
All training and validation accuracies are saved as .csv files and can be downloaded from the repo's cloud data location under `models/fit_records`. Once these are in the equivalent location in the local repo, plots can be generated by running the cells in `plotting.ipynb`.
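As an illustration of working with these fit records, the snippet below finds the best-validation epoch in one record. The column names (`epoch`, `val_accuracy`) are assumptions; adjust them to match the actual .csv files:

```python
import csv
import io


def best_epoch(fit_record_csv: str, val_col: str = "val_accuracy"):
    """Return (epoch, best validation accuracy) from a fit-record CSV string."""
    rows = list(csv.DictReader(io.StringIO(fit_record_csv)))
    best = max(rows, key=lambda r: float(r[val_col]))
    return int(best["epoch"]), float(best[val_col])
```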