robbiejdunne / cfg_submission

⚽📊 All code and analysis by Edd Webster (@eddwebster) as part of the submission for the City Football Group Junior Data Scientist role. The GitHub repo also includes links to publicly available resources from the football analytics community that are relevant to this project.

Edd Webster CFG Junior Data Scientist Data Challenge Submission

This repository contains the code and analysis submitted by Edd Webster for the CFG Junior Data Scientist Data Challenge, along with a list of related, publicly available resources published by the football analytics community. For a summary pack in Google Slides, please see the following [link], or the PowerPoint version [link] or PDF version [link] saved locally in this repository.

👋 About this Repository and Author

For more information about this repository, I am available through all the following channels:

📋 Contents:

📔 Notebooks

For the code used to produce the Chance Quality Models and the engineered dataset for game two of the Metrica Sports sample data, see the notebooks subfolder, in which the workflow is divided into the following:

  • Metrica Sports: analysis of the major chances that each team created during game 2, with an exported dataset of shots on which the Chance Quality Models below, developed from the shots data, can make predictions (static version).
  • Chance Quality Models from Shots Data: models that calculate the probability of a shot resulting in a goal. Two models have been created (so far): the first is a Logistic Regression model and the second uses the Gradient Boosting algorithm XGBoost.
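To make the idea of a Chance Quality Model concrete, the following is a minimal sketch of a logistic regression that maps a single shot feature to a goal probability. It is not the code from the notebooks: the feature (shot distance in tens of metres), the synthetic data, the coefficients used to generate it, and all function names are assumptions chosen for illustration, and the model is fitted with plain gradient descent using only the standard library.

```python
import math
import random


def sigmoid(z):
    """Map a real-valued score to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


def predict_proba(w, x):
    """Goal probability for one shot; w[0] is the bias term."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], x)))


def fit_logistic(X, y, lr=0.1, epochs=1000):
    """Fit weights by batch gradient descent on the mean log loss."""
    w = [0.0] * (len(X[0]) + 1)
    n = len(X)
    for _ in range(epochs):
        grad = [0.0] * len(w)
        for xi, yi in zip(X, y):
            err = predict_proba(w, xi) - yi
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        w = [wj - lr * gj / n for wj, gj in zip(w, grad)]
    return w


def log_loss(y_true, p_pred, eps=1e-15):
    """Mean negative log-likelihood of the observed outcomes."""
    total = 0.0
    for yi, pi in zip(y_true, p_pred):
        pi = min(max(pi, eps), 1.0 - eps)
        total += -(yi * math.log(pi) + (1 - yi) * math.log(1.0 - pi))
    return total / len(y_true)


# Synthetic shots (assumption): goals become less likely with distance.
random.seed(0)
X, y = [], []
for _ in range(300):
    distance = random.uniform(5.0, 35.0) / 10.0  # tens of metres
    p_goal = sigmoid(1.0 - 1.5 * distance)       # hypothetical "true" model
    X.append([distance])
    y.append(1 if random.random() < p_goal else 0)

weights = fit_logistic(X, y)
model_loss = log_loss(y, [predict_proba(weights, xi) for xi in X])
close_shot = predict_proba(weights, [0.6])   # 6 m from goal
long_shot = predict_proba(weights, [3.0])    # 30 m from goal
```

The same evaluation idea (log loss on predicted probabilities) applies to a gradient-boosted model; only the way the probabilities are produced changes.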

💾 Data

ℹ️ Data Sources

Due to the 100 MB file size limit in GitHub, all datasets used in this analysis, including the Tracking data, have been exported and made publicly available to view and download in Google Drive at the following [link]. However, the most important files are all below 100 MB and are available in the data folder of this repository: the shots data provided, and the exported Metrica Sports shot data, both raw and engineered, with calculated features including Interfering and Intervening players.
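As an illustration of how a feature such as Intervening players could be derived from tracking data, the sketch below counts players inside the triangle formed by the shot location and the two goalposts. The exact definition used in this repository is not stated here, so the triangle rule is an assumption, as are the function names and the coordinate system (metres, pitch centred on the origin, goal line at x = 53). The only fixed fact used is the goal width of 7.32 m.

```python
def _cross(o, a, b):
    """2D cross product of vectors o->a and o->b."""
    return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])


def in_triangle(p, a, b, c):
    """True if point p lies inside or on the edge of triangle a, b, c."""
    d1, d2, d3 = _cross(a, b, p), _cross(b, c, p), _cross(c, a, p)
    has_neg = d1 < 0 or d2 < 0 or d3 < 0
    has_pos = d1 > 0 or d2 > 0 or d3 > 0
    # Mixed signs mean p is on different sides of different edges.
    return not (has_neg and has_pos)


def count_intervening(shot_xy, players_xy, goal_x=53.0, post_y=3.66):
    """Count players inside the shot-to-goalposts triangle.

    goal_x is an assumed goal-line position; post_y is half the
    7.32 m goal width.
    """
    left_post = (goal_x, -post_y)
    right_post = (goal_x, post_y)
    return sum(
        in_triangle(p, shot_xy, left_post, right_post) for p in players_xy
    )


# Hypothetical frame: a shot from 13 m out, central, with four players nearby.
shot = (40.0, 0.0)
players = [(45.0, 0.0), (45.0, 10.0), (52.0, 1.0), (30.0, 0.0)]
n_intervening = count_intervening(shot, players)
```

In this made-up frame, two players stand between the shooter and the goal, one is wide of the shooting lane, and one is behind the shooter.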

📄 Documentation

All documentation is saved locally in the documentation subfolder, including:

🏛️ Libraries

The Python libraries used in this notebook include:

📑 Resources and Further Reading

Please see my football_analytics repository for my attempt to create as concise a list as possible of publicly available resources published by the football analytics community.

The following resources are those that were used to inform and create my submission for the CFG Junior Data Scientist Data Challenge, focusing on Expected Goals and Tracking data. I have also included links to other topics related to the role, such as the application of Reinforcement Learning in football. Credits to all those cited below.

⚽ Football Analytics

👨‍🎓 Tutorials

🏛️ Libraries and GitHub Repos

✒️ Written Pieces

For a full list of Expected Goals literature, see the following [link].

Papers

The following Shiny App from Lars Maurath is a great tool for looking up publications [link].

Blogs
News Articles
Books

📼 Videos

For a YouTube playlist of videos collated around the topic of Expected Goals, see [link]. For a playlist specific to Tracking data in football, see [link].

Webinars and Lectures
Miscellaneous
YouTube Channels

🔊 Podcasts

List of notable episodes:

🐦 Tweets

  • The benefits of including fake data in an Expected Goals model by David Sumpter [link].

🧪 Data Science

Mathematics

Classification Metrics

General
Overview
ROC AUC
Log Loss

Modeling

Logistic Regression
XGBoost

Feature Interpretation

SHAP


Languages

Jupyter Notebook 99.3%, Python 0.7%