GTFSTK is a Python 3.5+ tool kit for analyzing General Transit Feed Specification (GTFS) data in memory without a database. It uses Pandas and Shapely to do the heavy lifting.
Using Pipenv, run pipenv install gtfstk.
You can play with ipynb/examples.ipynb in a Jupyter notebook.
Documentation is in docs/ and also on RawGit.
- Development status is Alpha
- This project uses semantic versioning
- Thanks to MRCagney for donating to this project
- Constructive feedback and code contributions welcome
- Alex Raichev (2014-05)
- Bugfixed geometrize_stops, which was putting some NaNs in the geometry column
- Added trip direction arrows to maps produced by map_trips
- Fixed bug HTML-escaping apostrophes in make_html
- Added map_trips, which works like map_routes
- Changed route_to_geojson to return LineStrings instead of a MultiLineString and added a date keyword argument
- Changed shapes_to_geojson to accept an optional list of shape IDs to restrict to
- Added map_routes function to draw routes and their stops on a Folium map, if Folium is installed
- Inserted stars in function signatures to separate boolean keyword arguments. Is this a breaking change? I say no, but it's debatable.
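The star in a signature can be sketched as follows. This is a minimal illustration of Python's keyword-only arguments, not code from GTFSTK; the function name compute_stats and its parameters are hypothetical.

```python
def compute_stats(trips, *, split_directions=False, compute_dist=True):
    """Hypothetical function: the bare star makes the two booleans keyword-only."""
    return {
        "num_trips": len(trips),
        "split_directions": split_directions,
        "compute_dist": compute_dist,
    }


# Passing the boolean by name works as before
result = compute_stats(["t1", "t2"], split_directions=True)

# Passing it positionally now raises a TypeError
try:
    compute_stats(["t1", "t2"], True)
    raised = False
except TypeError:
    raised = True
```

Callers who already passed the booleans by name are unaffected, while callers who passed them positionally would get a TypeError, which is presumably why the change is debatable.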
- Changed compute_trip_stats to accept an optional list of route IDs to restrict to
- Clarified the docstrings of compute_route_stats and compute_route_time_series to note that those functions can accept slices of trip stats
- Changed compute_stop_stats and compute_stop_time_series to accept an optional list of stop IDs
- Stopped drop_zombies from dropping stops with location type 1 or 2
- Changed CRS_WGS84 to WGS84 and removed the no_defs key to agree with GeoPandas's WGS84 CRS
- Replaced some None outputs with empty dictionary outputs where appropriate, e.g. in build_shape_by_geometry
- Bugfixed the get_dates() function. It was throwing an error when the calendar or calendar_dates table was empty.
- Bugfixed the stats and time series functions. They were throwing errors in the edge case where all the given dates had no active trips.
- Bugfixed combine_time_series(). Its direction ID column names were '0' and '1' but should be 0 and 1.
- Added informative printing for Feeds
- Removed the time_it decorator in favor of IPython's %time magic
- Inspired by the Transitland Dispatcher, added the summarize function and the list_gtfs function
- Extended several functions to accept date lists, a breaking change for the outputs of those functions. For example, now you can compute feed stats for the entire feed period more easily and quickly (by memoizing active trip IDs) than computing the stats separately for each date.
- By popular demand, redefined the num_trips indicator in route and feed time series to be the number of unique trips active in a time bin instead of the time-weighted average thereof
- Removed columns from empty DataFrames returned by compute_route_stats etc.
- Elaborated docstrings
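The difference between the two num_trips definitions can be sketched with Pandas. This is an illustration under assumed data, not GTFSTK's implementation; the column names and both helper functions are hypothetical, and times are in minutes after midnight.

```python
import pandas as pd

# Hypothetical trips with start and end times in minutes after midnight
trips = pd.DataFrame({
    "trip_id": ["a", "b", "c"],
    "start": [0, 10, 50],
    "end": [30, 70, 55],
})


def unique_count(trips, bin_start, bin_end):
    """Number of distinct trips that overlap the bin [bin_start, bin_end)."""
    active = trips[(trips["start"] < bin_end) & (trips["end"] > bin_start)]
    return active["trip_id"].nunique()


def time_weighted_average(trips, bin_start, bin_end):
    """Total trip minutes inside the bin divided by the bin length."""
    overlap = (
        trips["end"].clip(upper=bin_end) - trips["start"].clip(lower=bin_start)
    ).clip(lower=0)
    return overlap.sum() / (bin_end - bin_start)


count_first_hour = unique_count(trips, 0, 60)
average_first_hour = time_weighted_average(trips, 0, 60)
```

In the first hour all three trips are active at some point, so the unique count is 3, while the time-weighted average is 85/60, because the trips contribute 30, 50, and 5 minutes to the bin.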
- Updated the installation requirements in setup.py
- Fixed the bug where setup.py could not find the license file
- Finally knuckled down and wrote a GTFS validator: validators.py. It's basic, easy to read, and, thanks to Pandas, fast. It checks this 31 MB Southeast Queensland feed in 22 seconds on my 2.8-GHz-processor-16-GB-memory computer. With the same computer and feed and in fast mode (--memory_db), Google's GTFS validator takes 420 seconds. That's about 19 times slower. Part of the latter validator's slowness is its many checks beyond the GTFS, such as checks for too fast travel between every pair of stop times.
- Moved all but the most basic Feed methods into other modules grouped by theme, routes.py, stops.py, etc. This eases reading and additionally exposes the methods as functions on feeds, like in the GTFSTK versions before 7.0.0.
- Speeded up miscellany.py::assess_quality
- Refactored constants.py
- Renamed some functions
- Rewrote most feed functions as Feed methods
- Rewrote tests for pytest
- Removed some miscellaneous functions, such as plotting functions
- Changed feed.read_gtfs to unzip to a temporary directory
- Enabled feed.write_gtfs to write to a directory
- Improved function names, e.g. compute_trips_stats -> compute_trip_stats
- Added functions to cleaner.py and changed cleaning function outputs to feed instances
- Made feed.copy a method
- Simplified Feed objects and added auto-updates to secondary attributes
- Changed the signatures of a few functions, e.g. calculator.append_dist_to_shapes now returns a feed instead of a shapes data frame
- Fixed formatting of properties field in calculator.trip_to_geojson and calculator.route_to_geojson
- Bugfix: Added 'from_stop_id' and 'to_stop_id' to list of string data types in constants.py. Previously, they were sometimes getting interpreted as floats, which stripped leading zeros from the IDs, which then did not match the IDs in the stops data frame
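The leading-zero problem can be reproduced with a small sketch. This uses plain pandas.read_csv on made-up transfer data and is not GTFSTK's reading code; the sample IDs are hypothetical.

```python
import io

import pandas as pd

# Hypothetical transfers data with zero-padded stop IDs and one missing value
csv_text = (
    "from_stop_id,to_stop_id,transfer_type\n"
    "0012,0034,0\n"
    "0056,,2\n"
)

# With type inference, the IDs become numbers and lose their leading zeros;
# the column with a missing value becomes float
inferred = pd.read_csv(io.StringIO(csv_text))

# Declaring the ID columns as strings keeps them intact
typed = pd.read_csv(
    io.StringIO(csv_text),
    dtype={"from_stop_id": str, "to_stop_id": str},
)
```

Here inferred["from_stop_id"] holds the integers 12 and 56 and inferred["to_stop_id"] is a float column, whereas typed["from_stop_id"] still holds "0012" and "0056" and so can be matched against string stop IDs.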
- Added trip ID parameter to calculator.get_stops
- Created calculator.trip_to_geojson
- Added whitespace stripping to cleaner.clean_route_short_names
- Renamed the function calculator.get_feed_intersecting_polygon to calculator.restrict_by_polygon
- Added the function calculator.restrict_by_routes
- Added the function calculator.get_start_and_end_times
- Added the functions calculator.compute_center, calculator.compute_bounds, calculator.route_to_geojson
- Extended the function calculator.get_stops to accept an optional route ID
- Extended the function calculator.build_geometry_by_shape to accept an optional set of shape IDs
- Extended the function calculator.build_geometry_by_stop to accept an optional set of stop IDs
- Improved distance sanity checks in calculator.compute_trip_stats and calculator.append_dist_to_stop_times
- Bugfixed feed.copy so that the dist_units_in of the copy equals dist_units_out of the original
- Added some more distance sanity checks to calculator.compute_trip_stats and calculator.append_dist_to_stop_times
- Improved cleaner.clean_route_short_names
- Removed utilities.clean_series
- Improved cleaner.aggregate_routes
- Removed some unnecessary print statements
- Deleted an extraneous print statement in calculator.create_shapes
- Added utilities.is_not_null
- Changed calculator.shapes_to_geojson to return a dictionary instead of a string
- Upgraded to Pandas 0.18.1 and fixed calculator.downsample accordingly
- Added cleaner.aggregate_routes
- Bugfix: formatted parent_station as a string in constants.DTYPE
- Changed signature and behavior of create_shapes
- Added duplicate route short name count to assess
- Changed the behavior of clean_route_short_names
- Changed INT_COLS to INT_COLUMNS
- Moved some functions
- Added some functions, such as a function to copy feeds
- Added more functions to calculator.py, some of which are optional and depend on GeoPandas
- Documented more
- Made read_gtfs raise a more helpful error when an input path does not exist
- Made Matplotlib import optional
- Updated plotter function chart colors
- Moved the Feed class into a separate file
- Fixed a fatal bug in plot_routes_time_series and renamed it plot_feed_time_series
- Added route_type to trips stats and routes stats
- Added more functions to the cleaner module
- Modularized more
- Refactored the Feed class, exporting most methods to functions
- Changed function names, favoring a compute_ prefix over a get_ prefix for complex functions
- Bug fix: in INT_COLUMNS changed 'dropoff_type' to 'drop_off_type'.
- Changed to return empty data frames instead of None where appropriate
- Added Feed.clean_route_short_names
- Changed the inputs and outputs of get_stops_stats and get_stops_time_series
- Replaced assert statements with exceptions
- Changed name to gtfstk
- Added route_short_name and min_headway to trips stats and routes stats
- Changed the default handling of distance units in Feed
- Assembled feed.py and utils.py into a unified top-level package by tweaking __init__.py
- Renamed get_linestring_by_shape and get_point_by_stop to get_geometry_by_shape and get_geometry_by_stop, respectively
- Added min_transfer_time to INT_COLUMNS
- Fixed get_route_timetable sort order
- Added data frame empty checks to Feed.__init__, because I was getting errors on feeds with empty calendar.txt files
- Removed parent_station from INT_COLUMNS, which should have never been there in the first place
- Now you can specify the output distance units
- Changed most functions to return an empty data frame instead of None
- Fixed export so that integer columns, such as 'bikes_allowed', that have at least one NaN value no longer get formatted as floats in the output CSVs
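The float-formatting issue can be sketched as follows. This is an illustration with made-up trips data, and the fix shown uses Pandas's nullable Int64 dtype, which is an assumption of this sketch and not necessarily how the export function was fixed.

```python
import numpy as np
import pandas as pd

# Hypothetical trips table: one missing value forces the column to float64
trips = pd.DataFrame({
    "trip_id": ["t1", "t2"],
    "bikes_allowed": [1, np.nan],
})

# Written as is, the integer 1 appears as 1.0 in the CSV
naive_csv = trips.to_csv(index=False)

# Casting to the nullable integer dtype keeps integers as integers
# and writes missing values as empty fields
fixed_csv = trips.astype({"bikes_allowed": "Int64"}).to_csv(index=False)
```

The second data line of naive_csv reads t1,1.0, while the same line of fixed_csv reads t1,1.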
- Reduced columns in get_trips_activity
- Added clean_series
- Fixed a bug/typo in the computation of the service_distance and service_duration columns of feed stats
- Fixed a bug in the computation of the peak_start_time and peak_end_time columns of routes stats and feed stats
- Added more columns to get_routes_stats
- Added get_feed_stats and get_feed_time_series and removed the similar agg_routes_stats and agg_routes_time_series
- Removed dump_all_stats, because it wasn't very useful
- Replaced get_busiest_date_of_first_week with get_busiest_date
- Cleaned code slightly
- Added 'speed' column in trips stats
- Added 'is_loop' column in trips stats and routes stats
- Added more tests
- Added route and stop timetable methods
- Improved tests slightly
- Tidied code slightly
- Changed occurrences of 'vehicle' to 'trips', because that's clearer
- Updated some packages
- Changed name to gtfs-tk
- Added get_shapes_geojson
- Renamed get_active_trips and get_active_stops to get_trips and get_stops
- Upgraded to Pandas 0.15.2
- Scooped out main logic from Feed.get_stops_stats and Feed.get_stops_time_series and put it into top level functions for the sake of greater flexibility. Similar to what I did for Feed.get_routes_stats and Feed.get_routes_time_series
- Fixed a bug in computing the last stop of each trip in get_trips_stats
- Improved the accuracy of trip distances in get_trips_stats
- Upgraded to Pandas 0.15.1
- Added fill_nan_route_short_names
- Switched back to version numbering in the style of major.minor.micro, because that seems more useful
- Fixed a bug in Feed.get_routes_stats that modified the input data frame and therefore affected the same data frame outside of the function (dumb Pandas gotcha). Changed it to operate on a copy of the data frame instead.
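The gotcha can be sketched like this. Both functions and the column names are hypothetical and only illustrate the in-place versus copy behaviour, not the actual code of Feed.get_routes_stats.

```python
import pandas as pd


def add_speed_in_place(trip_stats):
    """Hypothetical buggy version: adds a column to the caller's data frame."""
    trip_stats["speed"] = trip_stats["distance"] / trip_stats["duration"]
    return trip_stats


def add_speed_on_copy(trip_stats):
    """Hypothetical fixed version: works on a copy and leaves the input alone."""
    trip_stats = trip_stats.copy()
    trip_stats["speed"] = trip_stats["distance"] / trip_stats["duration"]
    return trip_stats


original = pd.DataFrame({"distance": [10.0, 20.0], "duration": [0.5, 1.0]})

result = add_speed_on_copy(original)
columns_after_copy = list(original.columns)  # still distance and duration

add_speed_in_place(original)
columns_after_in_place = list(original.columns)  # now includes speed
```

After the copy-based call the caller's data frame still has only its two original columns, while the in-place call silently adds a speed column to it.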
- Speeded up time series computations by at least a factor of 10
- Switched from representing dates as datetime.date objects to '%Y%m%d' strings (the GTFS way of representing dates), because that's simpler and faster. Added an export method to feed objects
- Minor tweaks to append_dist_to_stop_times.
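Working with '%Y%m%d' date strings can be sketched with the standard library. The two helper names below are hypothetical and chosen only for this illustration.

```python
import datetime as dt

DATE_FORMAT = "%Y%m%d"  # the GTFS date format, e.g. 20140501


def datestr_to_date(datestr):
    """Hypothetical helper: parse a GTFS date string into a datetime.date."""
    return dt.datetime.strptime(datestr, DATE_FORMAT).date()


def date_to_datestr(date):
    """Hypothetical helper: format a datetime.date as a GTFS date string."""
    return date.strftime(DATE_FORMAT)


# Zero-padded, fixed-width date strings sort chronologically as plain strings
dates = ["20140502", "20140430", "20140501"]
sorted_dates = sorted(dates)

parsed = datestr_to_date("20140501")
round_trip = date_to_datestr(parsed)
```

Because the strings sort and compare chronologically, many date operations need no parsing at all, which is one plausible reason the string representation is simpler and faster.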
- Scooped out main logic from Feed.get_routes_stats and Feed.get_routes_time_series and put it into top level functions for the sake of greater flexibility. I at least need that flexibility to plug into another project.
- Simplified methods to accept a single date instead of a list of dates.
- Whoops, lost track of the changes for this version.
- Changed seconds_to_time to timestr_to_seconds. Added get_busiest_date_of_first_week.
- Converted headways to minutes
- Added option to change headway start and end time cutoffs in get_stops_stats and get_stations_stats
- Fixed a bug in get_trips_stats that caused a failure when a trip was missing a shape ID
- Switched from major.minor.micro version numbering to major.minor numbering
- Added get_vehicle_locations.
- Added append_dist_to_stop_times and append_dist_to_shapes
- Changed get_xy_by_stop name and output type
- Changed from period indices to timestamp indices for time series, because the latter are better supported in Pandas.
- Upgraded to Pandas 0.14.1.
- Restructured modules
- Created stats and time series aggregating functions
- Added get_dist_from_shapes keyword to get_trips_stats
- Fixed some typos and cleaned up the directory
- Changed get_routes_stats headway calculation
- Fixed inconsistent outputs in time series functions.
- Minor tweak to downsample
- Improved get_trips_stats and cleaned up code
- Changed time series format
- Added documentation
- Upgraded to Python 3.4
- Created utils.py and updated Pandas to 0.14.0
- Minor refactoring and tweaks to packaging
- Minor tweaks to packaging
- Initial version