CS-LEE2022 / Investigate_the_Soccer_Database_Dataset

Analyze soccer data and unveil the relationships between multiple variables

Project: Investigate the Soccer Database Dataset

Introduction

This soccer database comes from Kaggle and is well suited for data analysis and machine learning. It contains data on soccer matches, players, and teams from several European countries, covering 2008 to 2016. The dataset is quite extensive; more information can be found here.

(Image is from a copyright-free website: https://www.pexels.com/royalty-free-images/.)

  • The data is stored in a SQLite database. We can open the database file with software such as DB Browser for SQLite;
  • This dataset will help you practice SQL joins. Make sure to look at how the different tables relate to each other;
  • Some column titles should be self-explanatory, and others we’ll have to look up on Kaggle.
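
As a rough illustration of the points above, the sketch below lists the tables in a SQLite database and joins two of them using Python's built-in sqlite3 module. It runs against a tiny in-memory database rather than the real file. The table and column names (League, Match, league_id, home_team_goal, away_team_goal) are assumptions about the Kaggle schema and should be checked in DB Browser before reuse, and the rows are made up.

```python
import sqlite3


def list_tables(conn):
    """Return the names of all tables in the database, sorted by name."""
    rows = conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    ).fetchall()
    return [name for (name,) in rows]


def matches_with_league(conn):
    """Join each match to its league name through the league_id foreign key."""
    # "Match" is quoted because MATCH is also an SQL keyword in SQLite.
    query = """
        SELECT l.name, m.season, m.home_team_goal, m.away_team_goal
        FROM "Match" AS m
        JOIN League AS l ON l.id = m.league_id
        ORDER BY m.id
    """
    return conn.execute(query).fetchall()


def build_demo_db():
    """Create a small in-memory database with an assumed, simplified schema."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(
        """
        CREATE TABLE League (id INTEGER PRIMARY KEY, name TEXT);
        CREATE TABLE "Match" (
            id INTEGER PRIMARY KEY,
            league_id INTEGER REFERENCES League(id),
            season TEXT,
            home_team_goal INTEGER,
            away_team_goal INTEGER
        );
        -- Made-up rows for illustration only.
        INSERT INTO League VALUES (1, 'Demo League A'), (2, 'Demo League B');
        INSERT INTO "Match" VALUES
            (1, 1, '2008/2009', 2, 1),
            (2, 2, '2008/2009', 0, 0);
        """
    )
    return conn


if __name__ == "__main__":
    demo = build_demo_db()
    print(list_tables(demo))
    for row in matches_with_league(demo):
        print(row)
```

To try this on the downloaded data, replace build_demo_db() with sqlite3.connect() pointed at the local database file, and adjust the query to the table and column names actually present.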

Table of Contents

  • Prerequisites
  • Design
  • Conclusions
  • License

Prerequisites

  • Python 3.6.3
  • Jupyter Notebook
  • Anaconda-Navigator
  • SQLite database
  • DB Browser for SQLite

Design

Step One - Choose Data Set

Click this link to download the corresponding data.

Step Two - Get Organized

This project will eventually contain:

  • The report communicating any findings;
  • Any Python code used during the analysis;
  • The data set.

Step Three - Analyze

Brainstorm some questions that could be answered using the data set, then start answering them. We mainly focus on the relationships between multiple variables.
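
One simple way to quantify the relationship between two numeric variables is the Pearson correlation coefficient. The sketch below is a minimal, standard-library-only version and is not necessarily the method used in the project's notebook. The function name pearson and the sample values are hypothetical, chosen only for illustration.

```python
import math


def pearson(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length with at least 2 values")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Sum of products of deviations, and sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a variable is constant")
    return cov / math.sqrt(var_x * var_y)


if __name__ == "__main__":
    # Made-up values for illustration only; not taken from the dataset.
    heights_cm = [170.0, 175.0, 180.0, 185.0, 190.0]
    ratings = [68.0, 70.0, 69.0, 73.0, 72.0]
    print(round(pearson(heights_cm, ratings), 3))
```

A value near 1 or -1 suggests a strong linear relationship, while a value near 0 suggests little linear relationship. It says nothing about causation, so it works best as a starting point for the questions brainstormed in this step.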

Conclusions

In the current study, a substantial amount of in-depth analysis has been carried out. Detailed instructions were given before each step, and interpretations were provided afterwards. The two datasets included 115347 and 183978 records of European soccer match information, respectively, ranging from 2008 to 2016. With this much data, the analysis should be more reliable than a small-scale analysis. One limitation of the current study was that the original data from the website had not been organized well, as many tables were connected via foreign-key-to-foreign-key relations. More importantly, there was no key pairing match and player information. As a result, deeper analysis, such as the impact of player attributes on matches, was not feasible.

License

MIT License

Languages

  • HTML 53.4%
  • Jupyter Notebook 46.6%