Create an overall snapshot of the ride-sharing data: a summary table of key metrics by city type, and a multiple-line graph that shows the total fare for each week by city type.
- Use Pandas functions like groupby, pivot, resample, and reset_index on a DataFrame.
- Use Pandas methods and attributes on a DataFrame or Series.
- Create a new DataFrame from multiple groupby() Series.
- Format columns of a DataFrame.
- Create a multiple-line graph.
- Annotate and apply styling to the chart.
- Part 1 Instructions
- Create a PyBer Summary DataFrame
- Create a summary DataFrame that shows the total rides, total drivers, total fares, average fare per ride, and average fare per driver for each city type.
- Get the total rides, total drivers, and total fares for each city type using the groupby() function on the city type using the merged DataFrame or separate DataFrames.
- Calculate the average fare per ride and the average fare per driver by city type.
- Delete the index name.
- Create the summary DataFrame with the appropriate columns and apply formatting where appropriate.
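The Part 1 steps above can be sketched roughly as follows. This is a minimal illustration, not the original notebook: the sample rows, the names `city_df`, `ride_df`, `merged_df`, `summary_df`, and `formatted_df`, and the summary column labels are all assumptions made for the example. Total drivers are taken from the city-level data so that a city's drivers are not counted once per ride.

```python
import pandas as pd

# Hypothetical sample data standing in for the real city and ride files.
city_df = pd.DataFrame({
    "city": ["A", "B", "C"],
    "driver_count": [10, 4, 2],
    "type": ["Urban", "Suburban", "Rural"],
})
ride_df = pd.DataFrame({
    "city": ["A", "A", "B", "C"],
    "fare": [10.0, 20.0, 30.0, 40.0],
    "ride_id": [1, 2, 3, 4],
})
merged_df = pd.merge(ride_df, city_df, how="left", on="city")

# Totals per city type. Drivers come from city_df (one row per city),
# because merged_df repeats driver_count on every ride.
total_rides = merged_df.groupby("type")["ride_id"].count()
total_drivers = city_df.groupby("type")["driver_count"].sum()
total_fares = merged_df.groupby("type")["fare"].sum()

# Averages per city type.
avg_fare_per_ride = total_fares / total_rides
avg_fare_per_driver = total_fares / total_drivers

# Build the summary DataFrame from the groupby() Series.
summary_df = pd.DataFrame({
    "Total Rides": total_rides,
    "Total Drivers": total_drivers,
    "Total Fares": total_fares,
    "Average Fare per Ride": avg_fare_per_ride,
    "Average Fare per Driver": avg_fare_per_driver,
})

# Delete the index name ("type").
summary_df.index.name = None

# Format a copy for display so the numeric version stays usable.
formatted_df = summary_df.copy()
for col in ["Total Rides", "Total Drivers"]:
    formatted_df[col] = formatted_df[col].map("{:,}".format)
for col in ["Total Fares", "Average Fare per Ride", "Average Fare per Driver"]:
    formatted_df[col] = formatted_df[col].map("${:,.2f}".format)

print(formatted_df)
```

Formatting converts the columns to strings, so any further arithmetic should use the unformatted `summary_df`.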
The summary DataFrame shows an overview of fares and rides in the three city types: urban, suburban, and rural.
- Rural cities have the highest fare per ride and the highest fare per driver; their total fares, total drivers, and total rides are the lowest.
- Urban cities have the lowest fare per ride and the lowest fare per driver; their total fares, total drivers, and total rides are the highest. We can conclude that urban cities have cheaper fares, with a high number of drivers and rides.
- Rename columns {'city': 'City', 'date': 'Date', 'fare': 'Fare', 'ride_id': 'Ride Id', 'driver_count': 'No. Drivers', 'type': 'City Type'}.
- Set the index to the Date column.
- Create a new DataFrame for fares and include only the Date, City Type, and Fare columns using the copy() method on the merged DataFrame.
- Drop the extra Date column.
- Set the index to the datetime data type.
- Check the DataFrame using the info() method to make sure the index is a datetime data type.
- Calculate the sum() of fares by the type of city and date using groupby() to create a new DataFrame.
- Reset the index.
- Create a pivot table DataFrame with the Date as the index and columns = 'City Type' with the Fare for each Date in each row.
- Create a new DataFrame from the pivot table DataFrame on the given dates '2019-01-01':'2019-04-28' using loc.
- Create a new DataFrame by applying resample() in weekly bins to the DataFrame from the previous step, and calculate the sum() of the fares for each week.
- Using the object-oriented interface method, plot the resampled DataFrame from the previous step using the df.plot() function.
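The steps above can be sketched end to end as follows. This is an illustrative sketch under stated assumptions rather than the original notebook: the sample rows, the variable names, the chart title and axis label, and the use of `set_index("Date", drop=False)` (which is what leaves an extra Date column to drop later) are all assumptions. The `Agg` backend is selected only so the example runs without a display.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; an assumption for headless runs
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample standing in for the merged ride/city DataFrame.
merged_df = pd.DataFrame({
    "city": ["A", "B", "C", "A", "B", "A"],
    "date": ["2019-01-01 08:00:00", "2019-01-02 09:30:00", "2019-01-03 10:00:00",
             "2019-01-08 11:00:00", "2019-01-09 12:00:00", "2019-05-01 13:00:00"],
    "fare": [10.0, 20.0, 30.0, 15.0, 25.0, 99.0],
    "ride_id": [1, 2, 3, 4, 5, 6],
    "driver_count": [10, 4, 2, 10, 4, 10],
    "type": ["Urban", "Suburban", "Rural", "Urban", "Suburban", "Urban"],
})

# Rename the columns, then set the index to Date (keeping the column for now).
renamed_df = merged_df.rename(columns={
    "city": "City", "date": "Date", "fare": "Fare", "ride_id": "Ride Id",
    "driver_count": "No. Drivers", "type": "City Type",
})
renamed_df = renamed_df.set_index("Date", drop=False)

# Keep only Date, City Type, and Fare, then drop the extra Date column.
fares_df = renamed_df[["Date", "City Type", "Fare"]].copy()
fares_df = fares_df.drop(columns=["Date"])

# Convert the index to datetime and confirm with info().
fares_df.index = pd.to_datetime(fares_df.index)
fares_df.info()

# Sum fares by city type and date, then reset the index.
fares_by_date = fares_df.groupby(["City Type", "Date"])["Fare"].sum().reset_index()

# Pivot: one row per Date, one column per City Type.
pivot_df = fares_by_date.pivot(index="Date", columns="City Type", values="Fare")

# Restrict to the given date range, then resample into weekly bins.
range_df = pivot_df.loc["2019-01-01":"2019-04-28"]
weekly_fares_df = range_df.resample("W").sum()

# Plot with the object-oriented interface and annotate the chart.
fig, ax = plt.subplots(figsize=(12, 4))
weekly_fares_df.plot(ax=ax)
ax.set_title("Total Fare by City Type")
ax.set_ylabel("Fare ($USD)")
ax.set_xlabel("")
ax.legend(title="City Type")
```

With `resample("W")`, each bin is labeled by the Sunday that ends the week, and weeks in which a city type has no rides sum to 0 rather than appearing as missing values.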
According to the plot, urban cities have the highest fares, and all city types show a peak just before March. Suburban cities have lower fares than urban cities but higher fares than rural cities, which have the lowest fares.