Pyber_Analysis

Background

Create an overall snapshot of the ride-sharing data: a summary table of key metrics by city type, and a multiple-line graph that shows the total fares for each week for each city type.

Objectives:

Use Pandas functions like groupby, pivot, resample, and reset_index on a DataFrame.
Use Pandas methods and attributes on a DataFrame or Series.
Create a new DataFrame from multiple groupby() Series.
Format columns of a DataFrame.
Create a multiple-line graph.
Annotate and apply styling to the chart.

Part 1 Instructions

Create a PyBer Summary DataFrame

Create a summary DataFrame that shows the total rides, total drivers, total fares, average fare per ride, and average fare per driver for each city type.

Instruction

Get the total rides, total drivers, and total fares for each city type by applying groupby() on the city type to the merged DataFrame or to the separate DataFrames.
Calculate the average fare per ride and the average fare per driver by city type.
Delete the index name.
Create the summary DataFrame with the appropriate columns and apply formatting where appropriate.
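
The steps above can be sketched roughly as follows. This is a minimal illustration, not the notebook's actual code: the frames `ride_data` and `city_data` and all of their values are made up, and only the column names (`city`, `fare`, `ride_id`, `driver_count`, `type`) are taken from this README. The sketch assumes drivers are counted from the city-level table, since summing `driver_count` on the merged frame would count each city's drivers once per ride.

```python
import pandas as pd

# Hypothetical stand-ins for the ride and city data (values are made up).
ride_data = pd.DataFrame({
    "city": ["A", "A", "B", "C", "C", "C"],
    "fare": [10.0, 20.0, 30.0, 40.0, 50.0, 60.0],
    "ride_id": [1, 2, 3, 4, 5, 6],
})
city_data = pd.DataFrame({
    "city": ["A", "B", "C"],
    "driver_count": [5, 2, 1],
    "type": ["Urban", "Suburban", "Rural"],
})

# One row per ride, with the city's type and driver count attached.
merged = pd.merge(ride_data, city_data, how="left", on="city")

# Totals per city type; each groupby() result is a Series indexed by type.
total_rides = merged.groupby("type")["ride_id"].count()
total_drivers = city_data.groupby("type")["driver_count"].sum()
total_fares = merged.groupby("type")["fare"].sum()

# Build the summary DataFrame from the Series; they align on the shared index.
summary = pd.DataFrame({
    "Total Rides": total_rides,
    "Total Drivers": total_drivers,
    "Total Fares": total_fares,
    "Average Fare per Ride": total_fares / total_rides,
    "Average Fare per Driver": total_fares / total_drivers,
})

# Delete the index name ("type") so it does not show above the row labels.
summary.index.name = None

# Format the currency columns on a copy so the numeric values stay usable.
formatted = summary.copy()
for col in ["Total Fares", "Average Fare per Ride", "Average Fare per Driver"]:
    formatted[col] = formatted[col].map("${:,.2f}".format)

print(formatted)
```

Formatting a copy is a design choice in this sketch: once the values become strings such as "$15.00" they can no longer be used in further calculations.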

Finding

The summary DataFrame gives an overview of fares and rides in the three city types: urban, suburban, and rural.

According to the data, rural cities have:

The highest average fare per ride
The highest average fare per driver
The lowest total fares, total drivers, and total rides

According to the data, urban cities have:

The lowest average fare per ride
The lowest average fare per driver
The highest total fares, total drivers, and total rides

We can conclude that urban cities have cheaper average fares along with a higher number of drivers and rides.

Final Summary Table

Multiple-Line Plot for the Sum of the Fares for Each City Type

Instruction

Rename columns {'city': 'City', 'date':'Date','fare':'Fare', 'ride_id': 'Ride Id','driver_count': 'No. Drivers', 'type':'City Type'}.
Set the index to the Date column.
Create a new DataFrame for fares and include only the Date, City Type, and Fare columns using the copy() method on the merged DataFrame.
Drop the extra Date column.
Convert the index to the datetime data type.
Check the DataFrame using the info() method to make sure the index is a datetime data type.
Calculate the sum() of fares by the type of city and date using groupby() to create a new DataFrame.
Reset the index.
Create a pivot table DataFrame with the Date as the index and columns = 'City Type' with the Fare for each Date in each row.
Create a new DataFrame from the pivot table DataFrame for the date range '2019-01-01':'2019-04-28' using loc.
Create a new DataFrame by applying resample() in weekly bins to the DataFrame from the previous step, and calculate the sum() of the fares for each week.
Using the object-oriented interface method, plot the weekly DataFrame from the previous step using the df.plot() function.
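
The steps above can be sketched roughly as follows. This is an illustration under stated assumptions rather than the notebook's actual code: the `merged` frame and its values are made up, the chart title and axis label are placeholders, and the `Agg` backend is selected only so the example runs without a display. The rename mapping and the date range come from the steps listed above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical merged ride data (values are made up).
merged = pd.DataFrame({
    "city": ["A", "B", "C", "A", "B", "C"],
    "date": ["2019-01-01 08:00:00", "2019-01-02 09:00:00", "2019-01-03 10:00:00",
             "2019-01-08 08:00:00", "2019-01-09 09:00:00", "2019-05-01 10:00:00"],
    "fare": [10.0, 20.0, 30.0, 15.0, 25.0, 35.0],
    "ride_id": [1, 2, 3, 4, 5, 6],
    "driver_count": [5, 2, 1, 5, 2, 1],
    "type": ["Urban", "Suburban", "Rural", "Urban", "Suburban", "Rural"],
})

# Rename the columns, then set the index to Date while keeping the column.
renamed = merged.rename(columns={
    "city": "City", "date": "Date", "fare": "Fare", "ride_id": "Ride Id",
    "driver_count": "No. Drivers", "type": "City Type",
})
renamed = renamed.set_index("Date", drop=False)

# Copy only the needed columns, then drop the now-redundant Date column.
fares = renamed[["Date", "City Type", "Fare"]].copy()
fares = fares.drop(columns="Date")

# Convert the index to datetime and confirm it with info().
fares.index = pd.to_datetime(fares.index)
fares.info()

# Sum fares by city type and date, then reset the index for pivoting.
fare_sums = fares.groupby(["City Type", "Date"])[["Fare"]].sum().reset_index()

# One row per date, one column per city type.
pivot = fare_sums.pivot(index="Date", columns="City Type", values="Fare")

# Keep the requested date range, then total the fares in weekly bins.
in_range = pivot.loc["2019-01-01":"2019-04-28"]
weekly = in_range.resample("W").sum()

# Object-oriented plotting: create the Axes first and pass it to df.plot().
fig, ax = plt.subplots(figsize=(12, 5))
weekly.plot(ax=ax)
ax.set_title("Total Fare by City Type")  # placeholder title
ax.set_ylabel("Fare ($USD)")             # placeholder label
ax.set_xlabel("")
ax.legend(title="City Type")
fig.tight_layout()
```

In this sketch the ride dated 2019-05-01 falls outside the loc range and is excluded, and a week with no rides for a city type sums to 0 rather than NaN. Calling `fig.savefig(...)` with a file name of your choice would write the chart to disk.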

Finding

According to the plot, urban cities have the highest total fares, and all city types show a peak just before March. Suburban cities have lower fares than urban cities but higher fares than rural cities, which have the lowest.

About

Analyzed all the rideshare data from January to early May of 2019 and created a compelling visualization for the CEO, V. Isualize. Created an overall snapshot of the ride-sharing data, presented as a summary table of key metrics by city type and a multiple-line graph that shows the total fares for each week for each city type.
