RemedyData / Dahel_Tour_de_France_Analysis_Internship

Dahel Consultant and Techies Internship: This is a project that entails the analysis of a Tour de France (Tourism)

Geek Repo:Geek Repo

Github PK Tool:Github PK Tool

Dahel_Tour_de_France_Analysis_Internship

Dahel Consultant and Techies Internship: This is a project that entails the analysis of a Tour de France (Tourism) (The picture below is gotten from Chicagoland Epic Game Website).

image

Introduction

This is an analysis of a bike racing event. It is done by analyzing data from 5 different tables namely tdf_finishers, tdf_winners, tdf_stages, tdf_tours, including an attached data dictionary. The project was carried out entirely using Microsoft Excel.

Problem Statement

The goal of this analysis is to:

  • Calculate the average speed of the riders in a specific year or across multiple years.

  • Determine which rider has won the most stages in a given year or across multiple years.

  • Calculate the time margin between the winner and other riders for each stage or overall in a race.

  • Analyze the performance of different teams by calculating the total time taken by all riders from the same team in a race.

  • Determine the birthplaces of the riders and their frequency.

Skills and Concepts Demonstrated:

  • Microsoft Excel concepts like:

1 Data Importing and Exporting:

  • Importing data from external sources such as text files, CSV files, databases, or other Excel workbooks.
  • Exporting data to different formats for sharing or further analysis.

2 Data Cleaning and Transformation:

  • Removing duplicate records.
  • Handling missing values (e.g., filling in missing values, deleting rows with missing data).
  • Converting data types (e.g., converting text to numbers, dates).
  • Splitting, merging, or rearranging data across columns.

3 Formulas and Functions:

  • Utilizing built-in functions such as SUM, AVERAGE, COUNT, MAX, MIN for basic calculations.
  • Performing more complex calculations using functions like IF, VLOOKUP, INDEX-MATCH, SUMIFS, COUNTIFS, etc.
  • Creating custom functions using Excel's built-in programming language, Visual Basic for Applications (VBA).

4 Data Analysis Tools:

  • PivotTables: Summarizing and analyzing large datasets by creating dynamic tables.
  • Data Tables: Conducting what-if analysis by exploring different scenarios.
  • Solver: Optimizing solutions by finding the best possible outcome based on defined constraints.

5 Statistical Analysis:

  • Descriptive statistics: Calculating measures such as mean, median, mode, standard deviation, variance, etc.
  • Correlation analysis: Determining relationships between variables using correlation coefficients.
  • Regression analysis: Analyzing the relationship between dependent and independent variables.

6 Data Validation and Error Checking:

  • Implementing data validation rules to ensure data accuracy and consistency.
  • Using Excel's auditing tools to trace precedents and dependents, detect errors, and troubleshoot formulas.

7 Data Presentation and Reporting:

  • Formatting worksheets and cells for improved readability.

  • Creating dashboards and reports summarizing key insights.

  • Adding data labels, titles, and annotations to charts and graphs.

  • Inserting images, shapes, and text boxes to enhance presentation.


    Data Source:

The dataset for the work is gotten from Dahel Consultant Techies. It consist of 5 different tables namely tdf_finishers, tdf_winners, tdf_stages, tdf_tours, including an attached data dictionary. I studied the dataset well and its attached dictionary to gain proper insight into the dataset. You can find a link to download the dataset here:


Data Cleaning and Transformation:

After downloading the dataset, I opened the dataset as CSV files in Microsoft Excel.

  • I removed duplicate records.
  • I handled missing values (e.g., filling in missing values, deleting rows with missing data).
  • I converted data types (e.g., converting text to numbers, dates).
  • I splitted, merged, and rearranged data across columns.

The transformation view and results are displayed below:

Dahel_Tour_de_France

Dahel_Tour_de_France-1

Dahel_Tour_de_France-2

Dahel_Tour_de_France-3

Dahel_Tour_de_France-4

Dahel_Tour_de_France-5


Data Analysis:

  • Several expressions and built-in functions such as SUM, AVERAGE, COUNT, MAX, MIN and many more were made to arrive at the desired results.
  • I performed more complex calculations using functions like IF, VLOOKUP, INDEX-MATCH, SUMIFS, COUNTIFS, etc.

Features of the Report:

The analysis conveys information about the following key areas:

  • average speed of the riders in a specific year or across multiple years.

  • rider who has won the most stages in a given year or across multiple years.

  • time margin between the winner and other riders for each stage or overall in a race.

  • performance of different teams by calculating the total time taken by all riders from the same team in a race.

  • birthplaces of the riders and their frequency of wins.

The overall analysis of the dataset can be checked out here

Analysis and Observation:

Summary of the insights gained into the bike race performance:

  • The average speed of the riders in the year 1903 is 24.2km/hr

  • The Average Speed of riders across all the years is 3247.8km/hr

  • The Average Speed of riders for multiple years(1903,1922and1935) is 80.6km/hr

  • Eddy Merckx, Bernard Hinault, Jacques Anquetil had the highest wins with the sum of wins of 32, 21, 16 respectively.

  • Rider who won the most stages in a given year(1948) happens to be Gino Bartali while the rider who won the most stages across multiple year happens to be Eddy Merckx.

  • There is no correlation between Riders place and year of birth Frequency of Wins

Thank you for reading.

I am open for entry-level to mid-level data anlalyst role.

Let us have discussion about your company and industry now!

About

Dahel Consultant and Techies Internship: This is a project that entails the analysis of a Tour de France (Tourism)