wyt420 / LA-Metro-Bike-Share-Analysis

Data exploration and analysis based on the Kaggle dataset "Los Angeles Metro Bike Share Trip Data".

Load the dataset 😄

import pandas as pd
import warnings

warnings.filterwarnings('ignore')
data = pd.read_csv('./dataset/metro-bike-share-trip-data.csv')
data.head(1)
   Trip ID  Duration           Start Time             End Time
0  1912818       180  2016-07-07T04:17:00  2016-07-07T04:20:00

   Starting Station ID  Starting Station Latitude  Starting Station Longitude
0               3014.0                   34.05661                  -118.23721

   Ending Station ID  Ending Station Latitude  Ending Station Longitude
0             3014.0                 34.05661                -118.23721

   Bike ID  Plan Duration  Trip Route Category  Passholder Type
0   6281.0           30.0           Round Trip     Monthly Pass

                                   Starting Lat-Long                                    Ending Lat-Long
0  {'longitude': '-118.23721', 'latitude': '34.05...  {'longitude': '-118.23721', 'latitude': '34.05...

Analysis approach 😜

  • Distribution of popular stations
  • Rush hour analysis
  • Round Trip vs. One Way comparison
  • Relationship between Duration and Passholder Type

Data Analysis 😁

1) Visualizing the distribution of popular starting stations
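
The plotting code in this section reads from a frame called st_info whose construction is not shown. A minimal sketch of how such a frame might be built, assuming it holds one row per starting station plus a 'Counts' column with the number of trips started there. The helper name build_station_counts and the sample rows are hypothetical.

```python
import pandas as pd

def build_station_counts(trips: pd.DataFrame) -> pd.DataFrame:
    """Count trips per starting station (hypothetical helper, not from the repo)."""
    cols = ["Starting Station ID", "Starting Station Latitude", "Starting Station Longitude"]
    counts = (
        trips.dropna(subset=cols)      # skip trips with a missing station or coordinates
        .groupby(cols)
        .size()
        .reset_index(name="Counts")
    )
    return counts.sort_values("Counts", ascending=False).reset_index(drop=True)

# Tiny made-up sample so the sketch runs without the CSV
sample = pd.DataFrame({
    "Starting Station ID": [3014.0, 3014.0, 3016.0],
    "Starting Station Latitude": [34.05661, 34.05661, 34.05290],
    "Starting Station Longitude": [-118.23721, -118.23721, -118.24156],
})
st_info = build_station_counts(sample)
print(st_info)
```

On the real data, the same call would be made on the frame loaded with pd.read_csv instead of the sample.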

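import matplotlib.pyplot as plt

# st_info is assumed to hold one row per starting station, with its coordinates and trip count ('Counts')
pic = st_info.plot(kind="scatter", x="Starting Station Longitude", y="Starting Station Latitude",
                   s=st_info['Counts'] / 10, alpha=1)
# Restrict the axes to the longitude/latitude window of interest
plt.axis([-118.28, -118.22, 34.02, 34.07])
plt.show()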

[Figure: matplotlib scatter plot of starting stations, marker size scaled by trip count]

Conclusion

  • Overlaying these points on a map of L.A. would make the distribution easier to analyze.

2) Rush Hour Analysis

[Figure: RushHourAnalysis]

Conclusion

  • Shared-bike riders usually start their trips between 7 a.m. and 8 p.m.

  • Usage rises noticeably in three periods: the morning peak (6 a.m. to 8 a.m.), the midday peak (10 a.m. to 12 p.m.), and the evening peak (3 p.m. to 5 p.m.).

  • Usage peaks at 5 p.m., presumably because of the after-work rush.

  • Usage is lowest at 4 a.m.; very few bikes are used in the early hours of the morning.

  • From the peak to the lowest point, usage declines steadily.
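
The hourly counts behind these conclusions are not shown. A minimal sketch of one way to compute them, assuming the 'Start Time' column holds ISO-style timestamps as in the sample row; the helper name trips_per_hour and the sample timestamps are hypothetical.

```python
import pandas as pd

def trips_per_hour(trips: pd.DataFrame) -> pd.Series:
    """Number of trips starting in each hour of the day, 0 through 23."""
    hours = pd.to_datetime(trips["Start Time"]).dt.hour
    # reindex so that hours with no trips appear as 0 instead of being dropped
    return hours.value_counts().reindex(range(24), fill_value=0)

# Made-up timestamps so the sketch runs without the CSV
sample = pd.DataFrame({"Start Time": [
    "2016-07-07T04:17:00",
    "2016-07-07T17:05:00",
    "2016-07-07T17:40:00",
    "2016-07-08T08:15:00",
]})
hourly = trips_per_hour(sample)
print(hourly.idxmax())  # busiest start hour in the sample
```

Plotting hourly as a bar or line chart would give a picture comparable to the figure above.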


3) Relationship between Duration and Passholder Type

bike_trip_info = bike_info[['Duration','Trip Route Category','Plan Duration','Passholder Type']]
bike_trip_info.head()
   Duration  Trip Route Category  Plan Duration  Passholder Type
0       180           Round Trip           30.0     Monthly Pass
1      1980           Round Trip           30.0     Monthly Pass
2       300           Round Trip          365.0        Flex Pass
3     10860           Round Trip          365.0        Flex Pass
4       420           Round Trip            0.0          Walk-up

Extract Duration on its own and group it into classes (cluster by Duration)

duration = bike_trip_info[['Duration']]
duration.head()
# Duration is measured in seconds
   Duration
0       180
1      1980
2       300
3     10860
4       420
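
The step that assigns each trip a Duration Class is not shown. The text describes it as clustering; the sketch below swaps in plain fixed-threshold binning with pd.cut instead. The cut points (2, 6, and 12 hours) are assumptions chosen only to be consistent with the five sample rows, not the author's actual boundaries, and add_duration_class is a hypothetical helper.

```python
import numpy as np
import pandas as pd

# Hypothetical cut points in seconds: 2 h, 6 h, 12 h (not taken from the repo)
DURATION_BINS = [0, 2 * 3600, 6 * 3600, 12 * 3600, np.inf]
DURATION_LABELS = ["short-time", "medium-time", "long-time", "very-long-time"]

def add_duration_class(trips: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of trips with a 'Duration Class' column added."""
    out = trips.copy()
    # Each bin includes its right edge, e.g. (0, 7200] is short-time
    out["Duration Class"] = pd.cut(out["Duration"], bins=DURATION_BINS, labels=DURATION_LABELS)
    return out

sample = pd.DataFrame({"Duration": [180, 1980, 300, 10860, 420]})
classified = add_duration_class(sample)
print(classified)
```

Working on a copy avoids modifying a slice of the original frame in place.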
bike_trip_info.head()
   Duration  Duration Class  Trip Route Category  Plan Duration  Passholder Type
0       180      short-time           Round Trip           30.0     Monthly Pass
1      1980      short-time           Round Trip           30.0     Monthly Pass
2       300      short-time           Round Trip          365.0        Flex Pass
3     10860     medium-time           Round Trip          365.0        Flex Pass
4       420      short-time           Round Trip            0.0          Walk-up

Count the trips in each Duration Class

bike_trip_info['Duration Class'].value_counts()
short-time        127398
medium-time         3180
long-time            582
very-long-time       501
Name: Duration Class, dtype: int64

Exploring Duration Class

Radar chart visualization

[Figure: Radarchart]

Conclusion

  • The vast majority of very-long-time trips are made by Walk-up users.

  • More than 60% of short-time trips are made by Monthly Pass holders.

  • The share of Flex Pass holders is also higher among short-time trips than in any other duration class.
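
The shares quoted in these conclusions can be tabulated directly, in addition to reading them off the radar chart. A minimal sketch using pd.crosstab with row normalization; the helper name passholder_share and the four sample rows are hypothetical.

```python
import pandas as pd

def passholder_share(trips: pd.DataFrame, by: str = "Duration Class") -> pd.DataFrame:
    """Share of each Passholder Type within every value of the `by` column (rows sum to 1)."""
    return pd.crosstab(trips[by], trips["Passholder Type"], normalize="index")

# Made-up rows so the sketch runs without the CSV
sample = pd.DataFrame({
    "Duration Class": ["short-time", "short-time", "short-time", "very-long-time"],
    "Passholder Type": ["Monthly Pass", "Monthly Pass", "Flex Pass", "Walk-up"],
})
shares = passholder_share(sample)
print(shares.round(2))
```

Passing by="Trip Route Category" would give the corresponding breakdown for One Way and Round Trip.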


4) Round Trip vs. One Way Comparison

Split the data into One Way and Round Trip frames

one_way_trip.head()
    Duration           Start Time             End Time  Start Hour  Trip Route Category  Plan Duration  Passholder Type
5        780  2016-07-07T12:51:00  2016-07-07T13:04:00          12              One Way           30.0     Monthly Pass
6        600  2016-07-07T12:54:00  2016-07-07T13:04:00          12              One Way           30.0     Monthly Pass
7        600  2016-07-07T12:59:00  2016-07-07T13:09:00          12              One Way          365.0        Flex Pass
9        960  2016-07-07T13:01:00  2016-07-07T13:17:00          13              One Way           30.0     Monthly Pass
10       960  2016-07-07T13:02:00  2016-07-07T13:18:00          13              One Way          365.0        Flex Pass
round_trip.head()
   Duration           Start Time             End Time  Start Hour  Trip Route Category  Plan Duration  Passholder Type
0       180  2016-07-07T04:17:00  2016-07-07T04:20:00          04           Round Trip           30.0     Monthly Pass
1      1980  2016-07-07T06:00:00  2016-07-07T06:33:00          06           Round Trip           30.0     Monthly Pass
2       300  2016-07-07T10:32:00  2016-07-07T10:37:00          10           Round Trip          365.0        Flex Pass
3     10860  2016-07-07T10:37:00  2016-07-07T13:38:00          10           Round Trip          365.0        Flex Pass
4       420  2016-07-07T12:51:00  2016-07-07T12:58:00          12           Round Trip            0.0          Walk-up
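
The code that produces one_way_trip and round_trip is not shown. A minimal sketch of how the split might be done, assuming 'Start Hour' is the zero-padded hour taken from 'Start Time' (as the values 04 and 12 above suggest); split_by_route is a hypothetical helper.

```python
import pandas as pd

def split_by_route(trips: pd.DataFrame):
    """Add a zero-padded 'Start Hour' column, then split on 'Trip Route Category'."""
    out = trips.copy()
    out["Start Hour"] = pd.to_datetime(out["Start Time"]).dt.strftime("%H")
    one_way = out[out["Trip Route Category"] == "One Way"]
    round_trip = out[out["Trip Route Category"] == "Round Trip"]
    return one_way, round_trip

# Two rows copied from the outputs above
sample = pd.DataFrame({
    "Duration": [180, 780],
    "Start Time": ["2016-07-07T04:17:00", "2016-07-07T12:51:00"],
    "Trip Route Category": ["Round Trip", "One Way"],
})
one_way_trip, round_trip = split_by_route(sample)
print(one_way_trip)
print(round_trip)
```

Boolean filtering keeps the original row labels, which matches the non-consecutive index (5, 6, 7, 9, 10) seen in the One Way output.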

Start hours of One Way vs. Round Trip

[Figure: start-hour analysis]

Passholder type shares for One Way vs. Round Trip

[Figures: one-way, round]

Conclusion

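  • Most One Way trips are made by pass holders: they account for more than 70% of riders, and the great majority of them hold a Monthly Pass.
  • Most Round Trip riders are Walk-up users. Only about 30% hold a pass, and most of those hold a Monthly Pass.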

Trip duration for One Way vs. Round Trip

duration_cate_info = bike_info[['Duration','Trip Route Category']]
duration_cate_info.head()
   Duration  Trip Route Category
0       180           Round Trip
1      1980           Round Trip
2       300           Round Trip
3     10860           Round Trip
4       420           Round Trip

Output of describe() (Duration in seconds)

             One Way         Round
count  119026.000000  12635.000000
mean     1358.870499   3299.287693
std      5490.783118   7738.171315
min        60.000000     60.000000
25%       360.000000    900.000000
50%       600.000000   1680.000000
75%       960.000000   3180.000000
max     86400.000000  86400.000000

Conclusion

  • The mean, median, and lower and upper quartiles in the table above all show that Round Trip durations tend to be longer than One Way durations, as expected.
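
The call that produced the describe table is not shown. A minimal sketch of one way to get a comparable side-by-side summary, grouping on 'Trip Route Category'; duration_summary is a hypothetical helper and the sample durations are made up.

```python
import pandas as pd

def duration_summary(trips: pd.DataFrame) -> pd.DataFrame:
    """describe() of Duration per route category, one column per category."""
    return trips.groupby("Trip Route Category")["Duration"].describe().T

# Made-up rows so the sketch runs without the CSV
sample = pd.DataFrame({
    "Duration": [600, 780, 960, 180, 1980],
    "Trip Route Category": ["One Way", "One Way", "One Way", "Round Trip", "Round Trip"],
})
summary = duration_summary(sample)
print(summary)
```

The transpose puts the statistics (count, mean, quartiles) in rows, matching the layout of the table above.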
