AllenDowney / empiricaldist

Python library that represents empirical distribution functions.

Need Help with CDF

arilwan opened this issue

Hi @AllenDowney,
I have been trying to plot CDFs for a multiclass dataframe, but nothing I have tried has worked. Here's what my dataframe looks like:

data.shape
(4672, 6)

data.head()
+---+---------+----------+----------+-------+--------------+------------+
|   | trip_id | duration | distance | speed | acceleration | travelmode |
+---+---------+----------+----------+-------+--------------+------------+
| 0 |  303633 |     2657 | 5624.06  | 2.12  | 0.00080      | bus        |
| 1 |  303637 |     1185 | 1274.04  | 1.08  | 0.00091      | foot       |
| 2 |  303638 |     1185 | 4464.56  | 3.77  | 0.00318      | car        |
| 3 |  303642 |     3350 | 7715.78  | 2.30  | 0.00069      | bus        |
| 4 |  303657 |      704 | 1155.26  | 1.64  | 0.00233      | car        |
+---+---------+----------+----------+-------+--------------+------------+

Now I want 4 CDF plots, one each for duration, distance, speed, and acceleration, covering the 5 travelmode classes in the dataset [bike, bus, car, foot, metro]. I have been trying this for two days, but I keep messing it up.

My objective is to plot, for each of duration, speed, distance, and acceleration, one distribution per mode, something like the following:
Screenshot 2019-11-04 at 22 43 34

Any help please?
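One way to read the request: split the rows by travelmode, then build one empirical CDF per group for each numeric column. Below is a minimal plain-Python sketch of that computation. The rows are made up for illustration (not the real data) and `empirical_cdf` is a hypothetical helper; in empiricaldist, `Cdf.from_seq` plays that role for a pandas column.

```python
from collections import defaultdict


def empirical_cdf(values):
    """Return (sorted unique values, fraction of samples <= each value)."""
    ordered = sorted(values)
    n = len(ordered)
    qs, ps = [], []
    for i, v in enumerate(ordered, start=1):
        if qs and qs[-1] == v:
            # Repeated value: keep the highest cumulative fraction for it.
            ps[-1] = i / n
        else:
            qs.append(v)
            ps.append(i / n)
    return qs, ps


# Made-up (travelmode, speed) rows, for illustration only.
rows = [
    ("bus", 2.12), ("foot", 1.08), ("car", 3.77),
    ("bus", 2.30), ("car", 1.64), ("foot", 1.08),
]

# Group the speeds by mode, similar in spirit to data.groupby('travelmode').
by_mode = defaultdict(list)
for mode, speed in rows:
    by_mode[mode].append(speed)

# One CDF per mode; repeat with another column for duration, distance, etc.
cdfs = {mode: empirical_cdf(speeds) for mode, speeds in by_mode.items()}
for mode, (qs, ps) in sorted(cdfs.items()):
    print(mode, qs, ps)
```

Each entry in `cdfs` holds the x and y values of one curve, so plotting all entries on the same axes gives the per-mode comparison described above.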

Yes, many thanks.

I'm very sorry to reopen this issue, but I need help with the plot. I couldn't manage to add a legend showing which curve belongs to which mode.

for name, group in data1.groupby('travelmode'):
    Cdf.from_seq(group.speed).plot()

title, x, y = 'Speed by mode', 'speed (km/h)', 'CDF'
decorate_cdf(title, x, y)

Screenshot from 2019-12-02 10-47-16

Something similar to the first plot above.

Oh yes, adding
plt.legend(groups.groups.keys())
to my decorate_cdf function did the trick (here groups is the data1.groupby('travelmode') object).
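Passing a list of names to plt.legend relies on the names being listed in the same order the lines were drawn. An alternative is to label each line as it is plotted and then call legend() with no arguments. The sketch below shows this with matplotlib alone, using made-up speeds and a hand-computed CDF so that it runs without the real data. With empiricaldist, passing label=name to .plot() should have the same effect, assuming keyword arguments are forwarded to matplotlib in your installed version.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt

# Made-up speeds per mode, standing in for data1.groupby('travelmode').
speeds_by_mode = {
    "bus": [2.12, 2.30, 2.50],
    "car": [1.64, 3.77, 4.10],
    "foot": [1.08, 1.20, 1.35],
}

fig, ax = plt.subplots()
for mode, speeds in speeds_by_mode.items():
    xs = sorted(speeds)
    ps = [(i + 1) / len(xs) for i in range(len(xs))]
    # label= ties the name to this specific line, so the legend entries
    # cannot drift out of order with the plotted groups.
    ax.step(xs, ps, where="post", label=mode)

ax.set_title("Speed by mode")
ax.set_xlabel("speed (km/h)")
ax.set_ylabel("CDF")
ax.legend()  # picks up the labels set in the loop above
# Use fig.savefig(...) or an interactive backend to view the result.
```

Labelling inside the loop keeps each name next to the data it describes, which helps if groups are later filtered or reordered.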