RemedyData / Dahel_Techies_F1_Results_Analysis_Internship

Dahel Consultant Techies Internship: This is a project that entails the analysis of a car racing event.

Dahel_Techies_F1_Results_Analysis_Internship

Dahel Consultant Techies Internship: This project analyzes a car racing event. (The picture below is from the Chicagoland Speedway website.)

image

Introduction

This project analyzes a car racing event using the F1 Results data table, which comprises the following fields: resultId, year, round, Event name, Event date, circuit name, lat, lng, code, Driver's forename, Driver's surname, dob, Driver's Age as at year of the event, Driver's nationality, Constructor name, constructor's nationality, Driver's number, grid, position, positionOrder, points, laps, time, milliseconds, fastestLap, rank, fastestLapTime, fastestLapSpeed, status, Driver's Full name, and Age. The project was carried out entirely in Microsoft Excel.

Problem Statement

This analysis sets out to answer the following questions:

  • Who were the top three drivers with the most points in a specific year?
  • What is the average time it took for the fastest lap in each race?
  • Which circuit had the highest number of races conducted?
  • What is the distribution of race winners by nationality for a given year?
  • How does a driver's age relate to their performance (points) in a specific year?

Skills and Concepts Demonstrated:

  • Microsoft Excel concepts like:

1 Data Importing and Exporting:

  • Importing data from external sources such as text files, CSV files, databases, or other Excel workbooks.
  • Exporting data to different formats for sharing or further analysis.

2 Data Cleaning and Transformation:

  • Removing duplicate records.
  • Handling missing values (e.g., filling in missing values, deleting rows with missing data).
  • Converting data types (e.g., converting text to numbers, dates).
  • Splitting, merging, or rearranging data across columns.

3 Formulas and Functions:

  • Utilizing built-in functions such as SUM, AVERAGE, COUNT, MAX, MIN for basic calculations.
  • Performing more complex calculations using functions like IF, VLOOKUP, INDEX-MATCH, SUMIFS, COUNTIFS, etc.
  • Creating custom functions using Excel's built-in programming language, Visual Basic for Applications (VBA).

4 Data Analysis Tools:

  • PivotTables: Summarizing and analyzing large datasets by creating dynamic tables.
  • Data Tables: Conducting what-if analysis by exploring different scenarios.
  • Solver: Optimizing solutions by finding the best possible outcome based on defined constraints.

5 Statistical Analysis:

  • Descriptive statistics: Calculating measures such as mean, median, mode, standard deviation, variance, etc.
  • Correlation analysis: Determining relationships between variables using correlation coefficients.
  • Regression analysis: Analyzing the relationship between dependent and independent variables.

6 Data Validation and Error Checking:

  • Implementing data validation rules to ensure data accuracy and consistency.
  • Using Excel's auditing tools to trace precedents and dependents, detect errors, and troubleshoot formulas.

7 Data Presentation and Reporting:

  • Formatting worksheets and cells for improved readability.

  • Creating dashboards and reports summarizing key insights.

  • Adding data labels, titles, and annotations to charts and graphs.

  • Inserting images, shapes, and text boxes to enhance presentation.


Data Source:

The dataset was provided by Dahel Consultant Techies. It consists of 25,401 records and 31 fields. I studied the dataset and its accompanying data dictionary closely to understand it. You can find a link to download the dataset here:


Data Cleaning and Transformation:

After downloading the dataset, I opened it in Microsoft Excel as a CSV file.

  • I removed duplicate records.
  • I handled missing values (e.g., filling in missing values, deleting rows with missing data).
  • I converted data types (e.g., converting text to numbers, dates).
  • I split, merged, and rearranged data across columns.
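
For readers who prefer code to spreadsheet steps, the cleaning steps above can be sketched in Python. This is only an illustration and not what I ran (the project itself was done entirely in Excel). The sample rows are made up, the column names are taken from the field list, and filling a blank points cell with 0 is an assumption rather than a rule from the data dictionary.

```python
from datetime import date, datetime

# Illustrative rows only: the values are made up, not taken from the dataset.
RAW_ROWS = [
    {"resultId": "1", "year": "2021", "points": "25", "dob": "1997-09-30"},
    {"resultId": "1", "year": "2021", "points": "25", "dob": "1997-09-30"},  # duplicate
    {"resultId": "2", "year": "2021", "points": "", "dob": "1985-01-07"},    # missing points
]


def clean(rows):
    """Drop repeated resultIds, fill blank points with 0, and convert types."""
    seen = set()
    cleaned = []
    for row in rows:
        if row["resultId"] in seen:
            continue  # same idea as Excel's Remove Duplicates on resultId
        seen.add(row["resultId"])
        cleaned.append({
            "resultId": int(row["resultId"]),          # text to number
            "year": int(row["year"]),                  # text to number
            "points": float(row["points"] or 0),       # blank cell becomes 0.0 (assumption)
            "dob": datetime.strptime(row["dob"], "%Y-%m-%d").date(),  # text to date
        })
    return cleaned


CLEANED = clean(RAW_ROWS)
print(CLEANED)
```

Whether a blank should be filled or the row deleted depends on the field, so the fill rule here would need to be decided per column.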

The transformation view and results are displayed below:

F1-Results

image

F1-Results-5


Data Analysis:

  • I used several expressions and built-in functions such as SUM, AVERAGE, COUNT, MAX, and MIN to arrive at the desired results.
  • I performed more complex calculations using functions like IF, VLOOKUP, INDEX-MATCH, SUMIFS, COUNTIFS, etc.
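
As an example of the conditional calculations above, a SUMIFS-style total of points per driver for one year, followed by a sort, gives the top three drivers. The Python sketch below mirrors that idea; it is not the workbook's formula. The rows are made up for illustration, and the keys `driver`, `year`, and `points` are simplified stand-ins for the dataset's columns.

```python
from collections import defaultdict

# Illustrative rows only: one row per driver per race, with made-up values.
ROWS = [
    {"year": 2021, "driver": "Driver A", "points": 25.0},
    {"year": 2021, "driver": "Driver A", "points": 18.0},
    {"year": 2021, "driver": "Driver B", "points": 18.0},
    {"year": 2021, "driver": "Driver B", "points": 15.0},
    {"year": 2021, "driver": "Driver C", "points": 15.0},
    {"year": 2021, "driver": "Driver C", "points": 25.0},
    {"year": 2021, "driver": "Driver D", "points": 12.0},
    {"year": 2020, "driver": "Driver D", "points": 25.0},  # other year, ignored
]


def top_drivers(rows, year, n=3):
    """Sum points per driver for one year and return the n highest totals."""
    totals = defaultdict(float)
    for row in rows:
        if row["year"] == year:  # the SUMIFS criterion
            totals[row["driver"]] += row["points"]
    ranked = sorted(totals.items(), key=lambda item: item[1], reverse=True)
    return ranked[:n]


print(top_drivers(ROWS, 2021))
```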

Features of the Report:

The dashboard conveys information about the following key areas:

  • The top three drivers with the most points in a specific year
  • The average time it took for the fastest lap in each race
  • The circuit with the highest number of races conducted
  • The distribution of race winners by nationality for a given year
  • Correlation of driver's age to their performance (points) in a specific year

The overall analysis of the dataset can be checked out here

Analysis and Observation:

Summary of the insights gained into race performance:

▪︎ The top three drivers with the most points in 2021 are Max Verstappen, Lewis Hamilton, and Valtteri Bottas, with 388.5, 385.5, and 219 points respectively.

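▪︎ The average time of the fastest lap in each race is displayed as 12:01:00 AM, which is how Excel shows a duration of roughly one minute when the cell uses a time-of-day format.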
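
One way to avoid the time-of-day display is to work in seconds. The sketch below is a hedged illustration, not the workbook's method: it assumes fastestLapTime is stored as text in the form m:ss.fff, that a race is identified by year and round, and that "fastest lap in each race" means the quickest lap set by any driver in that race. The sample rows are made up.

```python
from statistics import mean

# Illustrative rows only: values are made up; a blank means no lap time recorded.
LAP_ROWS = [
    {"year": 2021, "round": 1, "fastestLapTime": "1:32.090"},
    {"year": 2021, "round": 1, "fastestLapTime": "1:33.000"},
    {"year": 2021, "round": 2, "fastestLapTime": "1:16.702"},
    {"year": 2021, "round": 2, "fastestLapTime": ""},
]


def lap_seconds(text):
    """Convert a lap time written as 'm:ss.fff' to seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def format_lap(seconds):
    """Format seconds back to 'm:ss.fff'."""
    minutes, rest = divmod(seconds, 60)
    return f"{int(minutes)}:{rest:06.3f}"


def fastest_lap_per_race(rows):
    """Return the quickest recorded lap, in seconds, for each (year, round)."""
    best = {}
    for row in rows:
        if not row["fastestLapTime"]:
            continue  # skip rows without a recorded lap time
        seconds = lap_seconds(row["fastestLapTime"])
        key = (row["year"], row["round"])
        best[key] = min(seconds, best.get(key, seconds))
    return best


BEST = fastest_lap_per_race(LAP_ROWS)
AVERAGE = mean(BEST.values())
print(format_lap(AVERAGE))
```

In Excel itself, a custom number format such as m:ss.000 on the result cell shows the duration directly instead of a clock time.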

▪︎ The circuit with the highest count is Autodromo Nazionale di Monza, with 1776 records. Each record in the table is one driver's result, so this figure counts result rows at the circuit rather than distinct race events.
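
A count like this depends on what is being counted. The sketch below contrasts the two options under the assumption that each row is one driver's result and that a race is identified by year and round; the circuit names and rows are placeholders, not data from the file.

```python
from collections import Counter

# Illustrative rows only: one row per driver per race, with placeholder circuits.
CIRCUIT_ROWS = [
    {"circuit": "Circuit X", "year": 2020, "round": 8},
    {"circuit": "Circuit X", "year": 2020, "round": 8},
    {"circuit": "Circuit X", "year": 2020, "round": 8},
    {"circuit": "Circuit X", "year": 2021, "round": 14},
    {"circuit": "Circuit X", "year": 2021, "round": 14},
    {"circuit": "Circuit Y", "year": 2021, "round": 1},
    {"circuit": "Circuit Y", "year": 2021, "round": 1},
]


def rows_per_circuit(rows):
    """Count result rows per circuit (what a plain COUNTIF on circuit name gives)."""
    return Counter(row["circuit"] for row in rows)


def races_per_circuit(rows):
    """Count distinct (year, round) events per circuit."""
    events = {(row["circuit"], row["year"], row["round"]) for row in rows}
    return Counter(circuit for circuit, _, _ in events)


print(rows_per_circuit(CIRCUIT_ROWS))
print(races_per_circuit(CIRCUIT_ROWS))
```

In a PivotTable, the second count corresponds to a distinct count of events per circuit instead of a plain count of rows.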

▪︎ Distribution of race winners by nationality for the year 2021:

Nationality    Number of wins
Australian     1
British        8
Dutch          10
Finnish        1
French         1
Mexican        1
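
A tally like the one above can be reproduced with a COUNTIFS-style filter. The Python sketch below assumes a winner is a row with positionOrder equal to 1 and uses a simplified `nationality` key for the driver's nationality; the rows are made up and do not reproduce the 2021 figures.

```python
from collections import Counter

# Illustrative rows only: made-up results, not the 2021 data.
RESULT_ROWS = [
    {"year": 2021, "positionOrder": 1, "nationality": "Dutch"},
    {"year": 2021, "positionOrder": 2, "nationality": "British"},
    {"year": 2021, "positionOrder": 1, "nationality": "British"},
    {"year": 2021, "positionOrder": 1, "nationality": "Dutch"},
    {"year": 2020, "positionOrder": 1, "nationality": "Finnish"},  # other year, ignored
]


def winners_by_nationality(rows, year):
    """Count race wins per driver nationality for one year."""
    return Counter(
        row["nationality"]
        for row in rows
        if row["year"] == year and row["positionOrder"] == 1
    )


print(winners_by_nationality(RESULT_ROWS, 2021))
```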

▪︎ There is no clear correlation between a driver's age and their performance. On a scatter plot of age against points, the points show no discernible trend.
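
A scatter plot gives a visual impression; a correlation coefficient puts a number on it. The sketch below computes the Pearson correlation coefficient, which is what Excel's CORREL function returns. The inputs in the usage lines are toy numbers chosen to show the extremes, not ages and points from the dataset.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two sequences of equal length, at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    covariance = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when a sequence is constant")
    return covariance / sqrt(var_x * var_y)


# Toy inputs: a perfect positive line, a perfect negative line, and no linear trend.
print(pearson([1, 2, 3], [2, 4, 6]))
print(pearson([1, 2, 3], [6, 4, 2]))
print(pearson([1, 2, 3, 4], [1, -1, -1, 1]))
```

A coefficient near 0 would support the observation above, while values near 1 or -1 would indicate a strong linear relationship.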


Thank you for reading.

I am open to entry-level and mid-level data analyst roles.

Let us discuss your company and industry!
