motis-project / motis

Intermodal Mobility Information System

Home Page: https://motis-project.de


Nigiri stuck at 0% RUNNING

laem opened this issue · comments

commented

When trying to load the new version of the Bretagne GTFS aggregate, motis start gets stuck.

https://www.korrigo.bzh/ftp/OPENDATA/KORRIGOBRET.gtfs.zip

I'm using the latest Motis release.

I cannot find any useful log. It's not stuck loading this particular GTFS file itself; it gets stuck at the global nigiri step, but only when this GTFS is included, whether it's one of multiple GTFS schedules in the config or the only one.


The nigiri logs:


My guess is that there is an error in the GTFS files, but I don't know how to probe nigiri's output to find it.

commented

This page provides some validation information about the GTFS file, but no error is visible: https://transport.data.gouv.fr/resources/81559#validation-report

commented

gtfstidy solved my problem.

FTR, this is the backtrace from that state:

#0  0x000055a5ea44131d in nigiri::floyd_warshall<unsigned short> (mat=...) at /k/transport/src/motis/deps/nigiri/include/nigiri/loader/floyd_warshall.h:21
#1  0x000055a5ea43d614 in nigiri::loader::process_component (tt=..., lb=..., ub=ub@entry=..., fgraph=..., matrix_memory=..., adjust_footpaths=true) at /k/transport/src/motis/deps/nigiri/src/loader/build_footpaths.cc:452
#2  0x000055a5ea43e4c5 in nigiri::loader::transitivize_footpaths(nigiri::timetable&, bool)::$_1::operator()<std::__1::__wrap_iter<std::__1::pair<unsigned int, unsigned int>*>, std::__1::__wrap_iter<std::__1::pair<unsigned int, unsigned int>*> >(std::__1::__wrap_iter<std::__1::pair<unsigned int, unsigned int>*>, std::__1::__wrap_iter<std::__1::pair<unsigned int, unsigned int>*>) const (lb=..., lb@entry=..., ub=..., this=<optimized out>)     at /k/transport/src/motis/deps/nigiri/src/loader/build_footpaths.cc:522
#3  utl::equal_ranges_linear<std::__1::__wrap_iter<std::__1::pair<unsigned int, unsigned int>*>, nigiri::loader::transitivize_footpaths(nigiri::timetable&, bool)::$_0, nigiri::loader::transitivize_footpaths(nigiri::timetable&, bool)::$_1>(std::__1::__wrap_iter<std::__1::pair<unsigned int, unsigned int>*>, std::__1::__wrap_iter<std::__1::pair<unsigned int, unsigned int>*>, nigiri::loader::transitivize_footpaths(nigiri::timetable&, bool)::$_0&&, nigiri::loader::transitivize_footpaths(nigiri::timetable&, bool)::$_1&&) (begin=..., end=..., eq=..., func=...) at /k/transport/src/motis/deps/utl/include/utl/equal_ranges_linear.h:34
#4  utl::equal_ranges_linear<std::__1::vector<std::__1::pair<unsigned int, unsigned int>, std::__1::allocator<std::__1::pair<unsigned int, unsigned int> > >, nigiri::loader::transitivize_footpaths(nigiri::timetable&, bool)::$_0, nigiri::loader::transitivize_footpaths(nigiri::timetable&, bool)::$_1>(std::__1::vector<std::__1::pair<unsigned int, unsigned int>, std::__1::allocator<std::__1::pair<unsigned int, unsigned int> > >&, nigiri::loader::transitivize_footpaths(nigiri::timetable&, bool)::$_0&&, nigiri::loader::transitivize_footpaths(nigiri::timetable&, bool)::$_1&&) (c=..., eq=..., func=...) at /k/transport/src/motis/deps/utl/include/utl/equal_ranges_linear.h:41
#5  nigiri::loader::transitivize_footpaths (tt=..., adjust_footpaths=<optimized out>) at /k/transport/src/motis/deps/nigiri/src/loader/build_footpaths.cc:518
#6  0x000055a5ea440024 in nigiri::loader::build_footpaths (tt=..., adjust_footpaths=true, merge_duplicates=false) at /k/transport/src/motis/deps/nigiri/src/loader/build_footpaths.cc:616
#7  0x000055a5ea3ea250 in nigiri::loader::finalize (tt=..., adjust_footpaths=true, merge_duplicates=false) at /k/transport/src/motis/deps/nigiri/src/loader/init_finish.cc:49

I also checked. The problem is that the feed has almost 13k stops with coordinates stop_lat=0, stop_lon=0:

$ csvcut -c stop_lat stops.txt | grep ^0$ | wc -l
12841

I guess the GTFS validator should consider stops with stop_lat == 0 && stop_lon == 0 invalid.
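Such a check could be sketched as follows. This is a minimal, hypothetical Python example (not part of any existing validator); the field names stop_id, stop_lat, stop_lon are standard GTFS, but the sample data is made up for illustration:

```python
import csv
import io

# Illustrative stops.txt excerpt (made-up data); in practice you would
# open the real stops.txt from inside the GTFS zip.
stops_txt = """stop_id,stop_name,stop_lat,stop_lon
A,Gare de Rennes,48.1034,-1.6723
B,Broken Stop 1,0,0
C,Broken Stop 2,0.0,0.0
D,Quimper,47.9936,-4.0916
"""

def invalid_stops(f):
    """Return the stop_ids whose coordinates are exactly (0, 0)."""
    return [
        row["stop_id"]
        for row in csv.DictReader(f)
        if float(row["stop_lat"]) == 0.0 and float(row["stop_lon"]) == 0.0
    ]

print(invalid_stops(io.StringIO(stops_txt)))  # → ['B', 'C']
```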

MOTIS computes the transitive hull of all footpaths, and since those 13k stops are close to each other (same coordinate), they are automatically connected to each other, so we end up running Floyd-Warshall on a component with 13k stops.
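For reference, the classic Floyd-Warshall recurrence that this step boils down to looks roughly like this (a generic textbook sketch, not the actual nigiri implementation):

```python
import math

def floyd_warshall(dist):
    """All-pairs shortest durations, updated in place.

    dist is an n×n matrix with math.inf for unconnected stop pairs.
    Three nested loops over n stops give O(n^3) time, which is why a
    single degenerate component with ~13k stops takes so long.
    """
    n = len(dist)
    for k in range(n):
        for i in range(n):
            dik = dist[i][k]
            for j in range(n):
                if dik + dist[k][j] < dist[i][j]:
                    dist[i][j] = dik + dist[k][j]
    return dist

# Tiny usage example: three stops in a line, 3 min and 4 min apart.
INF = math.inf
m = [[0, 3, INF],
     [3, 0, 4],
     [INF, 4, 0]]
floyd_warshall(m)
print(m[0][2])  # → 7 (transitive footpath 0 → 1 → 2)
```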

The only thing we can do to "fix" it would be to exclude stops at (0, 0) from the process that creates additional footpaths.

Removing the whole "we create additional footpaths" step would require perfect datasets, which is unreasonable to assume if you have no control over the data creation process.


Thank you for checking! Yes, Floyd-Warshall has cubic complexity. Usually only relatively small numbers of stops are connected in one component.

Another option would be to figure out which component size causes problems and skip this step completely for components that are larger.
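Such a size cap could be as simple as the following sketch (hypothetical names and threshold, chosen only for illustration; the real cutoff would need tuning against real feeds):

```python
MAX_COMPONENT_SIZE = 1000  # hypothetical threshold, would need tuning

def maybe_transitivize(component, transitivize):
    """Run the expensive transitive-closure step only for components
    below the size cap; oversized components keep their direct
    footpaths and are skipped.

    component is a list of stop indices; transitivize is the closure
    routine (e.g. the Floyd-Warshall-based one) to apply to it.
    """
    if len(component) > MAX_COMPONENT_SIZE:
        # Skipping avoids the O(n^3) blow-up for degenerate components,
        # e.g. thousands of stops sharing the (0, 0) coordinate.
        return False
    transitivize(component)
    return True
```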

Something similar happens with the feed for Paris. While it does eventually finish, it takes really long to do so (~30 min).
I couldn't find any stops with (0, 0) coordinates, but it still spends most of the time in floyd_warshall.
I had already preprocessed the feed with gtfstidy.

In case someone wants to have a look, the feed is here:
https://data.iledefrance-mobilites.fr/api/v2/catalog/datasets/offre-horaires-tc-gtfs-idfm/files/a925e164271e4bca93433756d6a340d1

Thank you for the feed reference! I am experimenting with a different approach to build transitive footpaths.

motis-project/nigiri#76

However, I am not sure whether it will really solve the issue, and even if it works and is faster, what the memory usage will be.

@felixguendling btw since you asked about the transitous data set a few days ago, I have finished setting up a public rsync server.
rsync -rav --progress routing.spline.de::transitous /path/to/dest

rsync -rav --progress routing.spline.de::transitous ./transitous
rsync: [Receiver] failed to connect to routing.spline.de (130.133.110.91): Connection refused (111)
rsync: [Receiver] failed to connect to routing.spline.de (2001:470:51c5:babe::91:1): Cannot assign requested address (99)
rsync error: error in socket IO (code 10) at clientserver.c(139) [Receiver=3.2.7]

Seems like the port is closed.

Sorry about that, it seems rsyncd crashed. I'll need some time to debug that.

No worries. Let me know when I can try again.

Rsync should work now, please let me know if it stops working again :)

Perfect! Thank you. Then we'll include this in our benchmark datasets.

rsync -rav --progress routing.spline.de::transitous ./transitous
rsync: [Receiver] failed to connect to routing.spline.de (2001:470:51c5:babe::91:1): Connection refused (111)
rsync: [Receiver] failed to connect to routing.spline.de (130.133.110.91): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(139) [Receiver=3.2.7]

Rsync to the transitous dataset seems down again.

I have restarted it now. Unfortunately, saving the coredump didn't work properly, so I still don't know what caused it.

Thank you, now it works :)

But only for so long.

...
sk_zssk.gtfs.zip
      1.042.907 100%    1,11MB/s    0:00:00 (xfr#190, to-chk=16/207)
uk_great-britain.gtfs.zip
    244.252.672  62%    2,51MB/s    0:00:58  
rsync: connection unexpectedly closed (2402516896 bytes received so far) [receiver]
rsync error: error in rsync protocol data stream (code 12) at io.c(231) [receiver=3.2.7]
rsync: connection unexpectedly closed (10209 bytes received so far) [generator]
rsync error: error in rsync protocol data stream (code 12) at io.c(231) [generator=3.2.7]

I tried again but it results in the same error as before.

rsync -rav --progress routing.spline.de::transitous ./transitous
rsync: [Receiver] failed to connect to routing.spline.de (2001:470:51c5:babe::91:1): Connection refused (111)
rsync: [Receiver] failed to connect to routing.spline.de (130.133.110.91): Connection refused (111)
rsync error: error in socket IO (code 10) at clientserver.c(139) [Receiver=3.2.7]

Sorry :(
I hope it's better now, but I'm not super sure.

Don't worry about it. It finished the download of all the datasets now. Thanks again.