yusufakajose / Football-Match-Outcome-Prediction

This is a not-so-short notebook about the Spanish football league in the 2011-2012 season.

Football-Match-Outcome-Prediction

Table Of Contents

  • Introduction
  • Data Loading and Preparation
    • Exploratory Data Analysis
    • Creating New Variables
      • Possession Ratio
      • Goals Per Game
      • Shots On Target Ratio
      • Passing Accuracy
      • Number Of Fouls Per Game
      • Yellow Card Ratio
      • Red Card Ratio
      • Home Advantage
      • Winning Ratio
  • Predicting Match Outcomes using Machine Learning
    • Data Preprocessing
    • Decision Trees
    • Random Forest
    • Support Vector Machines
    • Naive Bayes
    • K-Nearest Neighbors
    • Model Comparison and Diagnostics
  • Conclusion

Project Overview

This project aims to predict the outcome of football matches using machine learning techniques. The dataset used in this project is from the La Liga 2011-2012 season, which includes data from all 20 teams that competed that year.

The project starts with data loading and preparation, followed by exploratory data analysis to identify trends and patterns in the data. New variables are then created: possession ratio, goals per game, shots on target ratio, passing accuracy, number of fouls per game, yellow card ratio, red card ratio, home advantage, winning ratio, and form.
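
As an illustration of how ratio-style variables like these can be derived, here is a minimal sketch in plain Python. The `TeamSeason` record, its field names, and the numbers are hypothetical and are not taken from the notebook or the dataset; the notebook's actual definitions may differ.

```python
from dataclasses import dataclass


@dataclass
class TeamSeason:
    """Season totals for one team (hypothetical field names)."""
    matches: int
    wins: int
    goals: int
    shots: int
    shots_on_target: int
    fouls: int


def safe_ratio(numerator, denominator):
    """Return numerator / denominator, or 0.0 when the denominator is zero."""
    return numerator / denominator if denominator else 0.0


def derive_features(team):
    """Compute a few per-game and ratio features from season totals."""
    return {
        "goals_per_game": safe_ratio(team.goals, team.matches),
        "shots_on_target_ratio": safe_ratio(team.shots_on_target, team.shots),
        "fouls_per_game": safe_ratio(team.fouls, team.matches),
        "winning_ratio": safe_ratio(team.wins, team.matches),
    }


# Made-up totals for illustration only, not real La Liga figures.
example = TeamSeason(matches=38, wins=19, goals=57,
                     shots=400, shots_on_target=160, fouls=570)
features = derive_features(example)
```

Guarding the division keeps a team with zero recorded shots or matches from raising `ZeroDivisionError`; whether 0.0 is the right fallback depends on how the model treats missing values.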

A head-to-head analysis was performed between Barcelona and Real Madrid, two of the biggest rivals in La Liga. The analysis included a rivalry timeline that displayed the number of matches played, the number of goals scored, and the number of yellow and red cards given in each match between the two teams.

Afterwards, five machine learning models were used to predict the outcomes of football matches: Decision Trees, Random Forest, Support Vector Machines, Naive Bayes, and K-Nearest Neighbors.
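
To make one of these models concrete, the following is a from-scratch sketch of K-Nearest Neighbors classification in plain Python. It is not the notebook's implementation, which is not shown here. The two features (home and away winning ratio), the labels `"H"`, `"D"`, `"A"` for home win, draw, and away win, and all values are assumptions for illustration.

```python
import math
from collections import Counter


def knn_predict(train_X, train_y, query, k=3):
    """Predict a label for `query` by majority vote among the k nearest rows."""
    # Pair each training row's Euclidean distance to the query with its label.
    distances = sorted(
        (math.dist(row, query), label) for row, label in zip(train_X, train_y)
    )
    # Count the labels of the k closest rows and return the most common one.
    votes = Counter(label for _, label in distances[:k])
    return votes.most_common(1)[0][0]


# Toy training set: (home winning ratio, away winning ratio) -> outcome.
train_X = [(0.80, 0.20), (0.70, 0.30), (0.75, 0.25),
           (0.20, 0.80), (0.30, 0.70),
           (0.50, 0.50), (0.45, 0.50)]
train_y = ["H", "H", "H", "A", "A", "D", "D"]

strong_home = knn_predict(train_X, train_y, (0.72, 0.28))
strong_away = knn_predict(train_X, train_y, (0.25, 0.75))
```

Because the vote relies on raw Euclidean distance, features measured on larger scales dominate the result, which is one reason scaling matters for this model.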

The models were compared on accuracy: Support Vector Machines scored highest (55.26%) and Naive Bayes lowest (45.61%).
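
Accuracy on a three-outcome problem (home win, draw, away win) is easier to judge next to a simple baseline that always predicts the most frequent outcome. The sketch below shows both calculations in plain Python. The labels are made up for illustration, and no baseline figure is reported in the source.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def majority_baseline_accuracy(y_true):
    """Accuracy obtained by always predicting the most frequent label."""
    most_common_count = Counter(y_true).most_common(1)[0][1]
    return most_common_count / len(y_true)


# Hypothetical outcomes: "H" = home win, "D" = draw, "A" = away win.
y_true = ["H", "H", "D", "A", "H", "A", "D", "H"]
y_pred = ["H", "H", "H", "A", "H", "H", "D", "A"]

model_acc = accuracy(y_true, y_pred)
baseline_acc = majority_baseline_accuracy(y_true)
```

A model is only informative to the extent that `model_acc` exceeds `baseline_acc` on held-out matches.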

The low accuracy scores are likely due to the small size of the dataset (a single season) and the limited number of features available. Even so, the project offers insight into the factors that contribute to match outcomes and suggests areas for further exploration.

To improve the accuracy of the models, additional data such as player statistics, team formations, and weather conditions could be collected. In addition, preprocessing techniques such as scaling, normalization, and dimensionality reduction could be applied to the existing features to enhance their predictive power.
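
As one example of the scaling mentioned above, here is a minimal sketch of z-score standardization for a single feature column, using only the standard library. The column values are hypothetical. In practice the mean and standard deviation should be computed on the training split only and then reused for the test split.

```python
import statistics


def standardize(column):
    """Rescale a column to zero mean and unit (population) standard deviation."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    if std == 0:
        # A constant column carries no information; map it to zeros.
        return [0.0 for _ in column]
    return [(x - mean) / std for x in column]


# Hypothetical fouls-per-game values for three teams.
fouls_per_game = [12.0, 15.0, 18.0]
scaled = standardize(fouls_per_game)
```

Distance-based and margin-based models such as K-Nearest Neighbors and Support Vector Machines are generally sensitive to feature scale, while tree-based models are much less so.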

In conclusion, this project demonstrates the use of machine learning techniques to predict the outcomes of football matches. Although the dataset size and feature availability constrain model accuracy, the analysis highlights factors that contribute to match outcomes and suggests avenues for further research.
