OneBusAway / onebusaway-gtfs-modules

A Java-based library for reading, writing, and transforming public transit data in the GTFS format, including database support.

Home Page: https://github.com/OneBusAway/onebusaway-gtfs-modules/wiki

GTFS file merging for the San Francisco area fails (entity reference not found for Route with id "01")

nikolauskrismer opened this issue · comments

I really hope that this is the right place for this report. I am actually using gtfs-transformer-cli to merge multiple GTFS files, but since I could not find a GitHub repo for gtfs-transformer-cli, this is the best I could find...

Summary:
I am trying to merge the GTFS files for the San Francisco area from BART, Tideline Water Taxi and SFMTA. I expect that to work, but instead I get the following error (with the latest SNAPSHOT version, 1.3.48):

[GtfsReader.java:176] : reading entities: org.onebusaway.gtfs.model.Agency
[GtfsReader.java:176] : reading entities: org.onebusaway.gtfs.model.Block
[GtfsReader.java:176] : reading entities: org.onebusaway.gtfs.model.ShapePoint
[GtfsReader.java:176] : reading entities: org.onebusaway.gtfs.model.Note
[GtfsReader.java:176] : reading entities: org.onebusaway.gtfs.model.Area
[GtfsReader.java:176] : reading entities: org.onebusaway.gtfs.model.Route
[GtfsReader.java:176] : reading entities: org.onebusaway.gtfs.model.Stop
[GtfsReader.java:176] : reading entities: org.onebusaway.gtfs.model.Trip
org.onebusaway.csv_entities.exceptions.CsvEntityIOException: io error: entityType=org.onebusaway.gtfs.model.Trip path=trips.txt lineNumber=2
	at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:161)
	at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:120)
	at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:115)
	at org.onebusaway.gtfs.serialization.GtfsReader.run(GtfsReader.java:178)
	at org.onebusaway.gtfs.serialization.GtfsReader.run(GtfsReader.java:166)
	at org.onebusaway.gtfs_transformer.GtfsTransformer.readGtfs(GtfsTransformer.java:194)
	at org.onebusaway.gtfs_transformer.GtfsTransformer.run(GtfsTransformer.java:157)
	at org.onebusaway.gtfs_transformer.GtfsTransformerMain.runApplication(GtfsTransformerMain.java:247)
	at org.onebusaway.gtfs_transformer.GtfsTransformerMain.run(GtfsTransformerMain.java:106)
	at org.onebusaway.gtfs_transformer.GtfsTransformerMain.main(GtfsTransformerMain.java:85)
Caused by: org.onebusaway.gtfs.serialization.EntityReferenceNotFoundException: entity reference not found: type=org.onebusaway.gtfs.model.Route id=01
	at org.onebusaway.gtfs.serialization.GtfsReader.getAgencyForEntity(GtfsReader.java:218)
	at org.onebusaway.gtfs.serialization.GtfsReader$GtfsReaderContextImpl.getAgencyForEntity(GtfsReader.java:305)
	at org.onebusaway.gtfs.serialization.mappings.EntityFieldMappingImpl$ConverterImpl.convert(EntityFieldMappingImpl.java:104)
	at org.onebusaway.gtfs.serialization.mappings.EntityFieldMappingImpl.translateFromCSVToObject(EntityFieldMappingImpl.java:61)
	at org.onebusaway.csv_entities.IndividualCsvEntityReader.readEntity(IndividualCsvEntityReader.java:131)
	at org.onebusaway.csv_entities.IndividualCsvEntityReader.handleLine(IndividualCsvEntityReader.java:98)
	at org.onebusaway.csv_entities.CsvEntityReader.readEntities(CsvEntityReader.java:157)

Steps to reproduce:
Use these files from transitfeeds and try to merge them using gtfs-transformer-cli (with the argument overwriteDuplicates).
https://transitfeeds.com/p/bart/58/latest/download
https://transitfeeds.com/p/tideline-water-taxi/756/latest/download
https://transitfeeds.com/p/sfmta/60/latest/download

Expected behavior:
I get a merged GTFS file that contains the information from all three files listed above.

Observed behavior:
No resulting GTFS file was created. The program exited with an error.

Platform:
Fedora 28 on an Intel i7 System (64bit).
JDK used is openjdk version "1.8.0_181"
Version of gtfs-transformer-cli is 1.3.48

Additional information:
I already did some investigation of this issue. The problem seems to be that the BART GTFS file references routes a bit awkwardly: trips.txt references the route with id "01", while routes.txt uses the id "1" (without the leading zero), so there is no "01". The error message is therefore quite accurate. My only problem is that transitfeeds considers the GTFS file correct (as you might know, they seem to do some validation, and the affected file validates without an error), and that if you treat the id as an integer (which, to my understanding, almost all agencies do) the reference should actually work. Would it make sense to have an auto-correcting mechanism here, or at least to provide an argument that forces ids to be treated as integer values?
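To make the mismatch concrete, here is a minimal standalone sketch (plain Java, not using the onebusaway API) that finds route ids referenced by trips but missing from routes, and reports whether each one would resolve if leading zeros were ignored. The class name `RouteRefCheck`, its methods, and the sample ids are all hypothetical and only for illustration.

```java
import java.util.Arrays;
import java.util.HashSet;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Set;

public class RouteRefCheck {

    // Strips leading zeros from purely numeric ids ("01" -> "1", "000" -> "0").
    // Non-numeric ids such as "A01" are returned unchanged.
    public static String stripLeadingZeros(String id) {
        if (id.isEmpty() || !id.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return id;
        }
        String stripped = id.replaceFirst("^0+", "");
        return stripped.isEmpty() ? "0" : stripped;
    }

    // Returns the route ids referenced by trips that have no exact match in routes.
    public static Set<String> danglingRouteIds(Set<String> routeIds, List<String> tripRouteIds) {
        Set<String> dangling = new LinkedHashSet<>();
        for (String ref : tripRouteIds) {
            if (!routeIds.contains(ref)) {
                dangling.add(ref);
            }
        }
        return dangling;
    }

    // True if the reference matches some route id once both sides are
    // compared without leading zeros.
    public static boolean resolvesWhenNormalized(Set<String> routeIds, String ref) {
        String wanted = stripLeadingZeros(ref);
        return routeIds.stream()
                .map(RouteRefCheck::stripLeadingZeros)
                .anyMatch(wanted::equals);
    }

    public static void main(String[] args) {
        // Illustrative values only: ids as they might appear in routes.txt
        // and in the route_id column of trips.txt.
        Set<String> routeIds = new HashSet<>(Arrays.asList("1", "2", "3"));
        List<String> tripRouteIds = Arrays.asList("01", "02", "01", "9");

        for (String ref : danglingRouteIds(routeIds, tripRouteIds)) {
            System.out.println(ref + " -> resolvable after normalization: "
                    + resolvesWhenNormalized(routeIds, ref));
        }
    }
}
```

A check like this only diagnoses the feed. It does not change how the reader resolves references.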

Regards,
Niko

This isn't really a bug, but a feature request to work around some "bad" data.

Do you have development experience? If so, this should be easily addressable via a Transformation Strategy. Something like: https://github.com/OneBusAway/onebusaway-gtfs-modules/blob/master/onebusaway-gtfs-transformer/src/main/java/org/onebusaway/gtfs_transformer/impl/FeedInfoFromAgencyStrategy.java

I can give you some pointers if this is something you want to take on.

I do have development experience, so I can send you a PR that addresses this.

However, my concern is the best way to solve this... I need to at least know these things:

  • should this only be implemented for routes or for all the entities that have an ID (or for multiple but not all)?
  • should a parameter be used to enable the TransformationStrategy (and if so, what should it be named?) or should it be always active (like always removing leading zeros if the value can be interpreted as a number and store both as valid IDs)?

Excellent! I welcome a PR on this.

My thoughts:

  1. a generic solution would be excellent but is likely unwarranted at this time since this is the only occurrence I'm aware of. As such, making it specific to routes is fine for me
  2. I picture a named Transformation here. Again, the additional effort for an argument isn't necessary here.

Ok... got you.
I will name the transformer option "remove_leading_zeros".

The solution seems to be a bit tricky, though. I tried both a GtfsTransformStrategy and a GtfsEntityTransformStrategy subclass. Neither seems to work here.

The first approach, using GtfsTransformStrategy, does not work because the whole GTFS file is read before such a strategy is applied (so the exception is raised before any code in such a subclass gets executed).
With a GtfsEntityTransformStrategy implementation, the code is indeed applied during the loading process, but sadly only after (!) each line of the CSV file has been read (and validated).
Therefore, I can modify routes but not the trips referencing them.
I hope my explanation is clear... what I mean exactly is: I can change the id of a route from "001" to "1". But if I try to change a trip referencing the (nonexistent) route "01" to reference "1", I get the error that route "01" is not found before I am able to intercept the loading.
I could also guess how many zeros are prepended and modify the routes accordingly, but that would be a very error-prone solution...

At the moment I assume the best solution would be to modify the CSV loading strategy (just as is done when trimValues is applied).
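As a rough illustration of normalizing ids before reference resolution happens, the sketch below rewrites one column of raw CSV lines so that numeric ids lose their leading zeros. This is a different technique from changing the reader: it is a pre-processing step that would run on the file contents before the feed is handed to the library. The class `RouteIdColumnNormalizer` and its methods are hypothetical, and the parsing assumes simple CSV with no quoted fields containing commas.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class RouteIdColumnNormalizer {

    // Strips leading zeros from purely numeric ids; other ids are unchanged.
    public static String stripLeadingZeros(String id) {
        if (id.isEmpty() || !id.chars().allMatch(c -> c >= '0' && c <= '9')) {
            return id;
        }
        String stripped = id.replaceFirst("^0+", "");
        return stripped.isEmpty() ? "0" : stripped;
    }

    // Rewrites the named column in every data row; the header row is passed
    // through untouched. Assumption: fields never contain quoted commas.
    public static List<String> normalizeColumn(List<String> lines, String column) {
        List<String> out = new ArrayList<>();
        if (lines.isEmpty()) {
            return out;
        }
        String[] header = lines.get(0).split(",", -1);
        int idx = -1;
        for (int i = 0; i < header.length; i++) {
            if (header[i].trim().equals(column)) {
                idx = i;
            }
        }
        out.add(lines.get(0));
        for (String line : lines.subList(1, lines.size())) {
            String[] fields = line.split(",", -1);
            if (idx >= 0 && idx < fields.length) {
                fields[idx] = stripLeadingZeros(fields[idx].trim());
            }
            out.add(String.join(",", fields));
        }
        return out;
    }

    public static void main(String[] args) {
        // Illustrative trips.txt content, not taken from the real BART feed.
        List<String> trips = Arrays.asList(
                "route_id,service_id,trip_id",
                "01,WKDY,1234");
        normalizeColumn(trips, "route_id").forEach(System.out::println);
    }
}
```

The same normalization would have to be applied to route_id in both routes.txt and trips.txt, and it is only safe if the feed does not use, say, "01" and "1" as two distinct routes.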

What do you think? Can you give me further insights?

It looks like maybe BART fixed their GTFS? I wanted to see if I could offer some solutions to the issue above, and when I downloaded the GTFS I saw that the BART GTFS references route_id 01 in both routes.txt and trips.txt. I was able to merge the 3 files successfully.

@Heidebritta: Thank you.... you are right! That's actually very good news!

You can find both files here: http://transitfeeds.com/p/bart/58
The version of 10/19/2018 uses "1" in routes.txt, but "01" in trips.txt, while the version of 10/23/2018 uses "01" in both files.

So for me this issue lost its importance (at least for now... until I come across the problem in GTFS files from other agencies).