import pandas as pd
import matplotlib.pyplot as plt
import warnings
warnings.filterwarnings('ignore')  # suppress all warnings to keep the notebook output clean
data = pd.read_csv('./dataset/metro-bike-share-trip-data.csv')
data.head(1)
| | Trip ID | Duration | Start Time | End Time | Starting Station ID | Starting Station Latitude | Starting Station Longitude | Ending Station ID | Ending Station Latitude | Ending Station Longitude | Bike ID | Plan Duration | Trip Route Category | Passholder Type | Starting Lat-Long | Ending Lat-Long |
|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|
| 0 | 1912818 | 180 | 2016-07-07T04:17:00 | 2016-07-07T04:20:00 | 3014.0 | 34.05661 | -118.23721 | 3014.0 | 34.05661 | -118.23721 | 6281.0 | 30.0 | Round Trip | Monthly Pass | {'longitude': '-118.23721', 'latitude': '34.05... | {'longitude': '-118.23721', 'latitude': '34.05... |
- 1. Distribution of the Busiest Stations
- 2. Rush Hour Analysis
- 3. Round Trip / One Way Comparison
- 4. Relationship between Duration and Passholder Type
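The plotting cell below uses a frame `st_info` whose construction is not shown. A minimal sketch, assuming `st_info` holds one row per starting station with its coordinates and its number of trips in a `Counts` column; the helper name `station_trip_counts` is hypothetical.

```python
import pandas as pd

def station_trip_counts(trips: pd.DataFrame) -> pd.DataFrame:
    """Count trips per starting station, keeping each station's coordinates.

    Hypothetical reconstruction: the column names come from the CSV loaded
    above, and 'Counts' matches the column used by the scatter plot.
    """
    cols = ["Starting Station ID",
            "Starting Station Latitude",
            "Starting Station Longitude"]
    return (trips.dropna(subset=cols)           # skip rows without station data
                 .groupby(cols)
                 .size()                        # number of trips per station
                 .reset_index(name="Counts")
                 .sort_values("Counts", ascending=False)
                 .reset_index(drop=True))

# Usage, assuming `data` is the DataFrame loaded above:
# st_info = station_trip_counts(data)
```

Grouping on the coordinates as well as the ID keeps the latitude and longitude columns the plot needs without a separate merge.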
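# Scatter plot of starting stations; marker size is proportional to each station's trip count
ax = st_info.plot(kind="scatter", x="Starting Station Longitude", y="Starting Station Latitude",
                  s=st_info['Counts'] / 10, alpha=1)
# Restrict the axes to the area around downtown Los Angeles (longitude, latitude bounds)
plt.axis([-118.28, -118.22, 34.02, 34.07])
plt.show()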
- Overlaying these points on a map of Los Angeles would make the station distribution easier to analyze.
- Shared-bike riders mostly start their trips between **7 a.m. and 8 p.m.**
- Usage rises noticeably in three periods: a morning peak from **6 a.m. to 8 a.m.**, a midday peak from **10 a.m. to 12 p.m.**, and a late-afternoon peak from **3 p.m. to 5 p.m.**
- The busiest hour is 5 p.m., presumably the after-work commute.
- The quietest hour is 4 a.m.; very few bikes are used in the early morning.
- Between the 5 p.m. peak and the 4 a.m. low, usage declines steadily.
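The code behind these observations is not shown. A minimal sketch of how trips per start hour could be counted, assuming the `Start Time` strings (e.g. `2016-07-07T04:17:00`) parse with `pd.to_datetime`; the helper name `trips_per_hour` is hypothetical.

```python
import pandas as pd

def trips_per_hour(trips: pd.DataFrame) -> pd.Series:
    """Number of trips that start in each hour of the day (0-23)."""
    start = pd.to_datetime(trips["Start Time"])    # ISO 8601 strings -> datetimes
    counts = start.dt.hour.value_counts()
    # Reindex so every hour appears, in order 0..23, with 0 for hours without trips
    return counts.reindex(range(24), fill_value=0)

# Usage, assuming `data` is the DataFrame loaded at the top:
# hourly = trips_per_hour(data)
# hourly.plot(kind="bar")
# hourly.idxmax(), hourly.idxmin()    # busiest and quietest start hour
```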
bike_trip_info = bike_info[['Duration','Trip Route Category','Plan Duration','Passholder Type']]
bike_trip_info.head()
| | Duration | Trip Route Category | Plan Duration | Passholder Type |
|---|---|---|---|---|
| 0 | 180 | Round Trip | 30.0 | Monthly Pass |
| 1 | 1980 | Round Trip | 30.0 | Monthly Pass |
| 2 | 300 | Round Trip | 365.0 | Flex Pass |
| 3 | 10860 | Round Trip | 365.0 | Flex Pass |
| 4 | 420 | Round Trip | 0.0 | Walk-up |
# Duration is measured in seconds
duration = bike_trip_info[['Duration']]
duration.head()
| | Duration |
|---|---|
| 0 | 180 |
| 1 | 1980 |
| 2 | 300 |
| 3 | 10860 |
| 4 | 420 |
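The cell that adds the `Duration Class` column is not shown, and its cut-off values are unknown. The sketch below uses hypothetical thresholds (under 1 hour, 1 to 4 hours, 4 to 8 hours, 8 hours or more), picked only so that the five sample rows get the same labels as in the output that follows; it will not necessarily reproduce the class counts reported later. The helper name `add_duration_class` is also hypothetical.

```python
import numpy as np
import pandas as pd

# Hypothetical cut-offs in seconds; the notebook's real thresholds are not shown
DURATION_BINS = [0, 3600, 4 * 3600, 8 * 3600, np.inf]
DURATION_LABELS = ["short-time", "medium-time", "long-time", "very-long-time"]

def add_duration_class(trips: pd.DataFrame) -> pd.DataFrame:
    """Return a copy of `trips` with a categorical 'Duration Class' column
    inserted right after 'Duration' (Duration is in seconds)."""
    out = trips.copy()    # work on a copy, since trips may be a slice of a larger frame
    classes = pd.cut(out["Duration"], bins=DURATION_BINS,
                     labels=DURATION_LABELS, right=False)    # left-closed bins: [0, 3600), ...
    out.insert(out.columns.get_loc("Duration") + 1, "Duration Class", classes)
    return out

# Usage, assuming `bike_trip_info` as defined above:
# bike_trip_info = add_duration_class(bike_trip_info)
```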
bike_trip_info.head()
| | Duration | Duration Class | Trip Route Category | Plan Duration | Passholder Type |
|---|---|---|---|---|---|
| 0 | 180 | short-time | Round Trip | 30.0 | Monthly Pass |
| 1 | 1980 | short-time | Round Trip | 30.0 | Monthly Pass |
| 2 | 300 | short-time | Round Trip | 365.0 | Flex Pass |
| 3 | 10860 | medium-time | Round Trip | 365.0 | Flex Pass |
| 4 | 420 | short-time | Round Trip | 0.0 | Walk-up |
Count the number of trips in each `Duration Class`:
bike_trip_info['Duration Class'].value_counts()
short-time 127398
medium-time 3180
long-time 582
very-long-time 501
Name: Duration Class, dtype: int64
- The vast majority of riders who keep a bike for a very long time are walk-up users (no pass).
- More than 60% of short-trip riders are Monthly Pass holders.
- The share of Flex Pass (annual pass) holders among short-trip riders is also the highest of any duration class.
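Percentages like these can be read off a row-normalised cross-tabulation. A minimal sketch using `pd.crosstab` with `normalize="index"`, so that each row sums to 1; the helper name `passholder_share` is hypothetical.

```python
import pandas as pd

def passholder_share(trips: pd.DataFrame, by: str) -> pd.DataFrame:
    """Share of each Passholder Type within each level of the column `by`.

    Each row sums to 1, so the 'short-time' row gives the fraction of short
    trips made by each type of rider.
    """
    return pd.crosstab(trips[by], trips["Passholder Type"], normalize="index")

# Usage, assuming `bike_trip_info` carries the 'Duration Class' column:
# passholder_share(bike_trip_info, "Duration Class").round(3)
```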
one_way_trip.head()
| | Duration | Start Time | End Time | Start Hour | Trip Route Category | Plan Duration | Passholder Type |
|---|---|---|---|---|---|---|---|
| 5 | 780 | 2016-07-07T12:51:00 | 2016-07-07T13:04:00 | 12 | One Way | 30.0 | Monthly Pass |
| 6 | 600 | 2016-07-07T12:54:00 | 2016-07-07T13:04:00 | 12 | One Way | 30.0 | Monthly Pass |
| 7 | 600 | 2016-07-07T12:59:00 | 2016-07-07T13:09:00 | 12 | One Way | 365.0 | Flex Pass |
| 9 | 960 | 2016-07-07T13:01:00 | 2016-07-07T13:17:00 | 13 | One Way | 30.0 | Monthly Pass |
| 10 | 960 | 2016-07-07T13:02:00 | 2016-07-07T13:18:00 | 13 | One Way | 365.0 | Flex Pass |
round_trip.head()
| | Duration | Start Time | End Time | Start Hour | Trip Route Category | Plan Duration | Passholder Type |
|---|---|---|---|---|---|---|---|
| 0 | 180 | 2016-07-07T04:17:00 | 2016-07-07T04:20:00 | 04 | Round Trip | 30.0 | Monthly Pass |
| 1 | 1980 | 2016-07-07T06:00:00 | 2016-07-07T06:33:00 | 06 | Round Trip | 30.0 | Monthly Pass |
| 2 | 300 | 2016-07-07T10:32:00 | 2016-07-07T10:37:00 | 10 | Round Trip | 365.0 | Flex Pass |
| 3 | 10860 | 2016-07-07T10:37:00 | 2016-07-07T13:38:00 | 10 | Round Trip | 365.0 | Flex Pass |
| 4 | 420 | 2016-07-07T12:51:00 | 2016-07-07T12:58:00 | 12 | Round Trip | 0.0 | Walk-up |
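- A large share of One Way riders, more than 70%, hold a pass, and most of them are Monthly Pass holders.
- Most Round Trip riders are walk-up users: only about 30% hold a pass, and most of those are Monthly Pass holders.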
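The cells that build `one_way_trip` and `round_trip` are not shown. A minimal sketch, assuming they are plain boolean filters on `Trip Route Category` (the notebook's frames also carry a `Start Hour` column, not reproduced here), plus a helper for the pass-holder share. Counting every `Passholder Type` other than `Walk-up` as a pass holder is an assumption, and both helper names are hypothetical.

```python
import pandas as pd

def split_by_route(trips: pd.DataFrame):
    """Split trips into One Way and Round Trip subsets, keeping the original index."""
    one_way = trips[trips["Trip Route Category"] == "One Way"]
    round_trip = trips[trips["Trip Route Category"] == "Round Trip"]
    return one_way, round_trip

def pass_holder_ratio(trips: pd.DataFrame) -> float:
    """Fraction of trips made by riders holding a pass (any type other than Walk-up)."""
    return float((trips["Passholder Type"] != "Walk-up").mean())

# Usage, assuming `bike_info` is the cleaned trip table used above:
# one_way_trip, round_trip = split_by_route(bike_info)
# pass_holder_ratio(one_way_trip), pass_holder_ratio(round_trip)
# one_way_trip["Passholder Type"].value_counts(normalize=True)
```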
duration_cate_info = bike_info[['Duration','Trip Route Category']]
duration_cate_info.head()
| | Duration | Trip Route Category |
|---|---|---|
| 0 | 180 | Round Trip |
| 1 | 1980 | Round Trip |
| 2 | 300 | Round Trip |
| 3 | 10860 | Round Trip |
| 4 | 420 | Round Trip |
Output the `describe()` statistics of `Duration` (in seconds) for each trip route category:
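A minimal sketch of how a summary like the table below could be produced with `groupby(...).describe()`, transposed so that each route category becomes a column. This labels the second column `Round Trip`, whereas the notebook's output shows `Round`, so the original cell evidently built the table differently; the helper name `duration_summary` is hypothetical.

```python
import pandas as pd

def duration_summary(trips: pd.DataFrame) -> pd.DataFrame:
    """describe() of Duration (seconds) per Trip Route Category, one column per category."""
    return trips.groupby("Trip Route Category")["Duration"].describe().T

# Usage, assuming `duration_cate_info` as defined above:
# duration_summary(duration_cate_info)
```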
| | One Way | Round |
|---|---|---|
| count | 119026.000000 | 12635.000000 |
| mean | 1358.870499 | 3299.287693 |
| std | 5490.783118 | 7738.171315 |
| min | 60.000000 | 60.000000 |
| 25% | 360.000000 | 900.000000 |
| 50% | 600.000000 | 1680.000000 |
| 75% | 960.000000 | 3180.000000 |
| max | 86400.000000 | 86400.000000 |
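- The mean, median, and both quartiles in the table above all show that Round Trip rides tend to last longer than One Way rides (median 1680 s, about 28 min, versus 600 s, 10 min), which is as expected.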