ocrim1996 / VolleyballPlayoffPrediction

Machine learning methods for predictive analysis of team performance in sports.

Introduction

The objective of this study is to use data on the past performance of individual players in the Volleyball Serie A to predict, at the beginning of the championship, which teams will reach the Playoffs, the final phase that follows the Regular Season. The data cover the men's volleyball Serie A seasons from 2001/02 to 2017/18 and record the performance of each individual athlete season by season. Each team is therefore represented by the set of players who make up its squad at the beginning of the season. The aim of the work was to identify supervised learning models capable of predicting future events using information on past events.

Dataset Fragment

[Figure: fragment of the dataset used]

Analysis of the Results

The evaluation metrics used for this project, all derived from the Confusion Matrix, are the following:

  • Accuracy;
  • Balanced_Accuracy;
  • Precision;
  • Recall;
  • F1_score;
  • Error (a custom metric introduced specifically for this project).
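
As a minimal sketch of how the first five metrics can be derived from a set of predictions, the following uses scikit-learn's metric functions on made-up labels. The labels are hypothetical and are not taken from the dataset, and the project-specific Error metric is left out because its definition is not given here.

```python
from sklearn.metrics import (
    accuracy_score,
    balanced_accuracy_score,
    confusion_matrix,
    f1_score,
    precision_score,
    recall_score,
)

# Hypothetical labels: 1 = team reached the Playoffs, 0 = it did not.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

# For binary labels, ravel() yields the counts in the order tn, fp, fn, tp.
tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()

metrics = {
    "Accuracy": accuracy_score(y_true, y_pred),
    "Balanced_Accuracy": balanced_accuracy_score(y_true, y_pred),
    "Precision": precision_score(y_true, y_pred),
    "Recall": recall_score(y_true, y_pred),
    "F1_score": f1_score(y_true, y_pred),
}

for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Balanced_Accuracy is the mean of the recall on each class, which makes it more informative than Accuracy when the two classes have different sizes.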

Supervised Learning Models Used

The supervised learning models that were used for this project are as follows:

  • Logistic Regression;
  • SVC with linear kernel (Support Vector Classification - an extension of SVM);
  • SVC with RBF (Radial Basis Function) kernel (Support Vector Classification - an extension of SVM).

In particular, four different implementations of the SVC model with RBF kernel were developed (for more information, see report.pdf).

Logistic Regression

Various parameters are available for this model. The ones we focused on most are:

  • C is the penalty parameter of the error term. In our case it takes value

  • solver indicates the algorithm to be used in the optimization problem. In our case it is "lbfgs".

  • max_iter indicates the maximum number of iterations for the solver to converge. In our case it was assigned a value of 200.

  • The other parameters keep their default values.
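
The parameters listed above map onto scikit-learn's LogisticRegression roughly as in the sketch below. It is not the content of LogisticRegression.py: the data are synthetic, and C=1.0 is only scikit-learn's default used as a placeholder, since the value chosen for the project is not stated here.

```python
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

# Synthetic stand-in for the team features; the real dataset is not used here.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# C=1.0 is a placeholder (the library default), not the project's value.
# solver and max_iter follow the values given in the list above.
log_reg = LogisticRegression(C=1.0, solver="lbfgs", max_iter=200)
log_reg.fit(X, y)

predictions = log_reg.predict(X)
print(predictions[:10])
```

In scikit-learn, C is the inverse of the regularization strength, so smaller values mean stronger regularization.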

To Run this Model

$ python3 LogisticRegression.py

SVC with Linear Kernel

Various parameters are available for this model. The ones we focused on most are:

  • C is the penalty parameter of the error term. In our case it takes value

  • max_iter indicates the maximum number of iterations for the solver to converge. In our case it was assigned a value of 20000.

  • The other parameters keep their default values.
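
A linear-kernel classifier with these settings could be set up along the following lines. This is a sketch under assumptions: the script name suggests scikit-learn's LinearSVC class, but SVC(kernel="linear") would be an equally plausible reading; the data are synthetic; and C=1.0 is a placeholder rather than the project's value.

```python
from sklearn.datasets import make_classification
from sklearn.svm import LinearSVC

# Synthetic stand-in for the team features; the real dataset is not used here.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)

# C=1.0 is a placeholder; max_iter follows the value given in the list above.
linear_svc = LinearSVC(C=1.0, max_iter=20000)
linear_svc.fit(X, y)

predictions = linear_svc.predict(X)
print(predictions[:10])
```

A large max_iter gives the solver enough iterations to converge when the features are not scaled.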

To Run this Model

$ python3 LinearSVC.py

SVC with RBF Kernel

Various parameters are available for this model. The ones we focused on most are:

  • C is the penalty parameter of the error term. In our case it takes value

  • kernel specifies the type of kernel to be used in the algorithm. It can be "linear", "poly", "rbf" and "sigmoid". In our case it has value "rbf" or Gaussian kernel.

  • gamma γ is a kernel coefficient for "rbf" types. Possible values for this variable are

  • The other parameters keep their default values.
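
The sketch below shows how these three parameters are passed to scikit-learn's SVC and how the resulting model can be scored with F1_score on held-out data. It does not reproduce any of the four implementations: the data are synthetic, and C=1.0 and gamma="scale" are scikit-learn's defaults used as placeholders for the values chosen in the project.

```python
from sklearn.datasets import make_classification
from sklearn.metrics import f1_score
from sklearn.model_selection import train_test_split
from sklearn.svm import SVC

# Synthetic stand-in for the team features; the real dataset is not used here.
X, y = make_classification(n_samples=200, n_features=8, random_state=0)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=0, stratify=y
)

# C and gamma are placeholders (library defaults), not the project's values.
rbf_svc = SVC(C=1.0, kernel="rbf", gamma="scale")
rbf_svc.fit(X_train, y_train)

score = f1_score(y_test, rbf_svc.predict(X_test))
print(f"F1_score on the held-out split: {score:.3f}")
```

Larger gamma values make each training point influence a smaller region, which produces a more flexible decision boundary and increases the risk of overfitting.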

To Run this Model

Four different implementations of this model were created (for more information, see report.pdf).

  • To run the third implementation
$ python3 NoLinearSVC.py
  • To run the fourth implementation
$ python3 NoLinearSVC_with_Probability.py

Example of Output

[Figure: example of output for the 2008 test with this implementation]

Comparison of the Results

The comparison of the results obtained with the various models was made in terms of the F1_score metric. Below is the table summarizing the results obtained:

MODEL                                      | F1_score
Logistic Regression                        | 75.9%
SVC with Linear Kernel                     | 73.3%
SVC with RBF Kernel, first implementation  | 82.5%
SVC with RBF Kernel, second implementation | 80.9%
SVC with RBF Kernel, third implementation  | 81.4%
SVC with RBF Kernel, fourth implementation | 81.0%

Libraries Needed

To run the code you need the following libraries:

Library      | Version
numpy        | >= 1.19.4
pandas       | >= 1.1.5
scikit-learn | >= 0.24.0
scipy        | >= 1.3.1

The code has been tested on macOS Catalina (version 10.15.2).

License

MIT License. See LICENSE file for further information.
