ropensci / opentripplanner

An R package to set up and use OpenTripPlanner (OTP) as a local or remote multimodal trip planner.

Home Page: https://docs.ropensci.org/opentripplanner

non-deterministic batch routing?

AlexandraKapp opened this issue

Hi,
thanks for the work on the package. It makes batch routing with OTP especially easy.
I just ran into one problem:
I'm using otp_plan for batch routing (~4000 routings). Even though all parameters are identical for two runs, the batch routing does not always return the same results.
E.g. one route returns 2 options in one run and 3 options in another (which also leads to different "shortest travel time" results). It makes no difference whether I set the parameter numItineraries = 3 or not.
When I run the routing only for that one specific route (so no batch routing), the same routes are always returned.
I'm not sure whether I am doing something wrong, or whether this is possible behaviour of the OTP router when too many requests are fired at once.

Hi @AlexandraKapp I've not come across this problem before. But it sounds more like a bug in OTP itself rather than the R package.

I suppose the obvious difference is the time of day you did the routing, or are you fixing the date and time?

If you can give me some more detail, e.g. R version, package version, and the options you are setting, I might be able to give you a more complete answer.

Hi, thanks for the response.

This is the query I run (with "to_list" being an sf data frame of ~4000 points), so all parameters (such as the date) should be fixed.

opentripplanner::otp_plan(
    otp_con,
    fromPlace = from_place,
    toPlace = to_list,
    maxWalkDistance = 10000,
    mode = c("WALK", "TRANSIT"),
    date = as.POSIXct("2020-07-07 08:00"),
    numItineraries = 3,
    get_geometry = FALSE,
    ncores = 6,
    routeOptions = list(walkReluctance = 20, optimize = "QUICK")
  )

R Version: 4.0.2
opentripplanner version: 0.2.3.0

Have you specified the timezone in otp_con and can you reproduce the problem on a smaller batch?

Could you post the results you get? Do all 4000 give the wrong result or just some?
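
One way to rule out a timezone mismatch is to build the departure time with an explicit timezone instead of relying on the R session default. A minimal base R sketch, assuming the router's local time is "Europe/Berlin" (an assumption for illustration, not something confirmed above):

```r
# Build the departure time with an explicit timezone so it does not depend
# on the timezone of the R session. "Europe/Berlin" is an assumed value.
dep <- as.POSIXct("2020-07-07 08:00", tz = "Europe/Berlin")

# Show the same instant in UTC to confirm what is actually being requested.
dep_utc <- format(dep, "%Y-%m-%d %H:%M", tz = "UTC")
print(dep_utc)
```

The resulting object could then be passed as the date/time argument of otp_plan, with the matching timezone given to otp_connect.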

So I tried with only 50 examples (50 random stops from the VBB GTFS feed, see attached csv), and the same thing happens: running the query with the exact same parameters returns different results.

library(dplyr)
library(sf)
library(readr)
otp_con <- opentripplanner::otp_connect() # otp with Berlin (VBB) GTFS Feed

fromPlace <- c(13.36892,52.52585) # Berlin Hauptbahnhof
toList <- read_csv("stops_example.csv") %>% st_as_sf(coords = c("stop_lon", "stop_lat"))

run1 <- opentripplanner::otp_plan(
  otp_con,
  fromPlace = fromPlace,
  toPlace = toList,
  maxWalkDistance = 10000,
  mode = c("WALK", "TRANSIT"),
  date = as.POSIXct("2020-07-07 08:00"),
  numItineraries = 3,
  get_geometry = FALSE,
  ncores = 6,
  routeOptions = list(walkReluctance = 20, optimize = "QUICK")
)

run2 <- opentripplanner::otp_plan(
  otp_con,
  fromPlace = fromPlace,
  toPlace = toList,
  maxWalkDistance = 10000,
  mode = c("WALK", "TRANSIT"),
  date = as.POSIXct("2020-07-07 08:00"),
  numItineraries = 3,
  get_geometry = FALSE,
  ncores = 6,
  routeOptions = list(walkReluctance = 20, optimize = "QUICK")
)

run3 <- opentripplanner::otp_plan(
  otp_con,
  fromPlace = fromPlace,
  toPlace = toList,
  maxWalkDistance = 10000,
  mode = c("WALK", "TRANSIT"),
  date = as.POSIXct("2020-07-07 08:00"),
  numItineraries = 3,
  get_geometry = FALSE,
  ncores = 6,
  routeOptions = list(walkReluctance = 20, optimize = "QUICK")
)

nrow(run1)
nrow(run2)
nrow(run3)

All three runs return a different number of rows.

stops_example.zip

A few more things to check:

  1. Try running it with only a single core (ncores = 1)
  2. Try running without the routeOptions
  3. Try getting the geometry and comparing the routes, is there a clear difference e.g. a bus route that is only sometimes considered?
  4. Can you check if the routes are matched across each run, i.e. the run with the most rows includes all of the routes in the run with the least rows, or are they different routes between runs?

If it is happening with 50 routes, try reducing again, it will be easier to debug with the smallest number of examples.
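
Point 4 could be checked along these lines. This is a minimal sketch on toy data frames: the column names toPlace and duration are assumed to match what otp_plan returns, the values are made up, and route_key() and missing_from() are hypothetical helpers, not part of the package.

```r
# Toy stand-ins for two otp_plan() results (assumed columns, made-up values).
run_a <- data.frame(
  toPlace  = c("A", "A", "B", "B", "B"),
  duration = c(600, 720, 900, 960, 1100)
)
run_b <- data.frame(
  toPlace  = c("A", "A", "B", "B"),
  duration = c(600, 720, 900, 960)
)

# Hypothetical helper: one string key per itinerary built from the chosen columns.
route_key <- function(df, cols) {
  do.call(paste, c(df[cols], sep = "|"))
}

# Hypothetical helper: rows of `x` whose key does not occur in `y`.
missing_from <- function(x, y, cols) {
  x[!route_key(x, cols) %in% route_key(y, cols), , drop = FALSE]
}

cols <- c("toPlace", "duration")
only_in_a <- missing_from(run_a, run_b, cols)  # itineraries that run_b lacks
only_in_b <- missing_from(run_b, run_a, cols)  # itineraries that run_a lacks

print(only_in_a)
print(nrow(only_in_b) == 0)  # TRUE means every row of run_b also occurs in run_a
```

If one of the two data frames comes back empty, the shorter run is a subset of the longer one, which would point to itineraries being dropped rather than replaced.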

I tried running this on the test data and I had no trouble

install.packages("opentripplanner")

library(opentripplanner)
# Path to a folder containing the OTP.jar file, change to where you saved the file.
path_data <- file.path(tempdir(), "OTP")
dir.create(path_data)
path_otp <- otp_dl_jar(path_data)
otp_dl_demo(path_data)
# Build Graph and start OTP
log1 <- otp_build_graph(otp = path_otp, dir = path_data)
log2 <- otp_setup(otp = path_otp, dir = path_data)
otpcon <- otp_connect(timezone = "Europe/London")

# Simple Test

fromPlace = c(-1.16489, 50.64990)
toPlace = c(-1.15803, 50.72515)

run1 <- otp_plan(otpcon,
                  fromPlace = fromPlace,
                  toPlace = toPlace,
                  date_time = as.POSIXct(strptime("2018-06-03 13:30", "%Y-%m-%d %H:%M")),
                  mode = c("WALK", "TRANSIT")
)

run2 <- otp_plan(otpcon,
                 fromPlace = fromPlace,
                 toPlace = toPlace,
                 date_time = as.POSIXct(strptime("2018-06-03 13:30", "%Y-%m-%d %H:%M")),
                 mode = c("WALK", "TRANSIT")
)

identical(run1, run2)

# Do batch
fromPlace <- matrix(rep(fromPlace, 10), ncol = 2, byrow = TRUE)
toPlace <- matrix(rep(toPlace, 10), ncol = 2, byrow = TRUE)

run3 <- otp_plan(otpcon,
                 fromPlace = fromPlace,
                 toPlace = toPlace,
                 date_time = as.POSIXct(strptime("2018-06-03 13:30", "%Y-%m-%d %H:%M")),
                 mode = c("WALK", "TRANSIT")
)

run4 <- otp_plan(otpcon,
                 fromPlace = fromPlace,
                 toPlace = toPlace,
                 date_time = as.POSIXct(strptime("2018-06-03 13:30", "%Y-%m-%d %H:%M")),
                 mode = c("WALK", "TRANSIT")
)

identical(run3, run4)

# multicore

run5 <- otp_plan(otpcon,
                 fromPlace = fromPlace,
                 toPlace = toPlace,
                 date_time = as.POSIXct(strptime("2018-06-03 13:30", "%Y-%m-%d %H:%M")),
                 mode = c("WALK", "TRANSIT"),
                 ncores = 3
)

run6 <- otp_plan(otpcon,
                 fromPlace = fromPlace,
                 toPlace = toPlace,
                 date_time = as.POSIXct(strptime("2018-06-03 13:30", "%Y-%m-%d %H:%M")),
                 mode = c("WALK", "TRANSIT"),
                 ncores = 3
)

identical(run5, run6)

# options
opts <- list(walkReluctance = 20, optimize = "QUICK")
otp_validate_routing_options(opts)
# TRUE

run7 <- otp_plan(otpcon,
                 fromPlace = fromPlace,
                 toPlace = toPlace,
                 date_time = as.POSIXct(strptime("2018-06-03 13:30", "%Y-%m-%d %H:%M")),
                 mode = c("WALK", "TRANSIT"),
                 ncores = 3,
                 routeOptions = opts
)

run8 <- otp_plan(otpcon,
                 fromPlace = fromPlace,
                 toPlace = toPlace,
                 date_time = as.POSIXct(strptime("2018-06-03 13:30", "%Y-%m-%d %H:%M")),
                 mode = c("WALK", "TRANSIT"),
                 ncores = 3,
                 routeOptions = opts
)

identical(run7, run8)

# no geometry

run9 <- otp_plan(otpcon,
                fromPlace = fromPlace,
                toPlace = toPlace,
                date_time = as.POSIXct(strptime("2018-06-03 13:30", "%Y-%m-%d %H:%M")),
                mode = c("WALK", "TRANSIT"),
                ncores = 3,
                routeOptions = opts,
                get_geometry = FALSE
)

run10 <- otp_plan(otpcon,
                 fromPlace = fromPlace,
                 toPlace = toPlace,
                 date_time = as.POSIXct(strptime("2018-06-03 13:30", "%Y-%m-%d %H:%M")),
                 mode = c("WALK", "TRANSIT"),
                 ncores = 3,
                 routeOptions = opts,
                 get_geometry = FALSE
)

identical(run9, run10)

Could you confirm that these all work for you on the test data?

I tried your code on the test data and with my own data, and both worked fine.

I played around with the parameters a little bit on my data and found:

  • different responses start to appear from about 30 toPlaces
  • the more toPlaces, the more responses differ (though the majority of routings still return the same response)
  • the parameter settings make no difference (timezone, ncores, options). I can reproduce the different responses with this query:
otp_con <- opentripplanner::otp_connect(timezone = "Europe/London")

run1 <- opentripplanner::otp_plan(
  otp_con,
  fromPlace = fromPlace,
  toPlace = toList,
  mode = c("WALK", "TRANSIT"),
  date = as.POSIXct("2020-07-07 09:00"),
  ncores = 1
)
  • the more toPlaces are queried, the higher the likelihood that the responses include different minimum durations for the routes

E.g. with 400 toPlaces I got a result with one different minimum duration and 10 routes with different route options:

[image]

  • I ran the test OTP server with 100+ toPlaces and did not get this error. Maybe it has something to do with the graph size?

I constructed the graph with the Brandenburg Geofabrik .pbf file: https://download.geofabrik.de/europe/germany/brandenburg.html
And the VBB GTFS file: https://www.vbb.de/unsere-themen/vbbdigital/api-entwicklerinfos/datensaetze
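
A comparison of minimum durations between two runs can be sketched in base R as follows. The data frames are toy stand-ins with made-up values, and the column names toPlace and duration are assumed to match the otp_plan output.

```r
# Toy stand-ins for two otp_plan() results (assumed columns, made-up values).
run1 <- data.frame(
  toPlace  = c("A", "A", "B", "B", "C"),
  duration = c(600, 720, 900, 960, 1500)
)
run2 <- data.frame(
  toPlace  = c("A", "A", "B", "C"),
  duration = c(600, 720, 960, 1500)
)

# Fastest itinerary per destination in each run.
min1 <- aggregate(duration ~ toPlace, data = run1, FUN = min)
min2 <- aggregate(duration ~ toPlace, data = run2, FUN = min)

# Join on destination and keep the destinations whose minimum differs.
both <- merge(min1, min2, by = "toPlace", suffixes = c("_run1", "_run2"))
changed <- both[both$duration_run1 != both$duration_run2, ]
print(changed)
```

Here destination "B" is flagged because the fastest option from the first run is missing in the second.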

Hi @AlexandraKapp I downloaded the data you linked to and I can't reproduce your problem

library(opentripplanner)
path_data = file.path("E:/OneDrive - University of Leeds/Data/opentripplanner/")
path_otp = otp_dl_jar(path_data)

log1 <- otp_build_graph(path_otp, path_data, router = "brandenburg", memory = 10000)
log2 <- otp_setup(path_otp, path_data, router = "brandenburg", memory = 10000)
otpcon = otp_connect(router = "brandenburg", timezone = "Europe/Berlin")

places <- c(13.14789, 52.52708,
            13.30719, 52.62389,
            13.30719, 52.62389,
            13.42186, 52.52875)
places <- matrix(places, byrow = TRUE, ncol = 2)


fromPlace <- places[rep(seq(1, nrow(places)), times = 400),]
toPlace <- places[rep(seq(1, nrow(places)), each = 400),]


r1 <- otp_plan(otpcon, fromPlace, toPlace, mode = c("WALK","TRANSIT"),
               date_time = as.POSIXct(strptime("2020-07-28 13:30", "%Y-%m-%d %H:%M")),
               ncores = 4)

r2 <- otp_plan(otpcon, fromPlace, toPlace, mode = c("WALK","TRANSIT"),
               date_time = as.POSIXct(strptime("2020-07-28 13:30", "%Y-%m-%d %H:%M")),
               ncores = 4)

identical(r1, r2) # TRUE

I think this is a specific problem with your data, not with the package. Does this example work for you? If so, can you provide some example places where it does not work?

Hi, thanks for all your effort!

I cannot really name single places where it doesn't work, as those always change.
Maybe it makes a difference if they are all different places, instead of repeated places?
Maybe you could try the toPlaces I listed in the csv?
It's just the first 50 stops from the GTFS stops.txt file.

#66 (comment)

toList <- read_csv("stops_example.csv") %>% st_as_sf(coords = c("stop_lon", "stop_lat"))

I will also try to rebuild the graph

Hi @AlexandraKapp I can reproduce the error, but as far as I can tell it is a bug in OTP, not the package.

I reproduced it by repeatedly requesting the same route:

fromPlace <- matrix(c(13.869697, 51.455449), ncol = 2)
toPlace <- matrix(c(11.407520,53.635261), ncol = 2)

fromPlace <- fromPlace[rep(1, times = 500),]
toPlace <- toPlace[rep(1, times = 500),]

r3 <- otp_plan(otpcon, fromPlace, toPlace, mode = c("WALK","TRANSIT"),
               date_time = as.POSIXct(strptime("2020-07-28 13:30", "%Y-%m-%d %H:%M")),
               ncores = 4)

r4 <- otp_plan(otpcon, fromPlace, toPlace, mode = c("WALK","TRANSIT"),
               date_time = as.POSIXct(strptime("2020-07-28 13:30", "%Y-%m-%d %H:%M")),
               ncores = 4)
identical(r3, r4)

It might be due to the timeout options: sometimes OTP times out before finding the extra route and sometimes it doesn't. You can adjust the timeouts in the config files.
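
As a rough sketch of what that could look like: OTP 1.x reads a router-config.json from the router's graph folder, and as far as I recall it accepts a "timeouts" array of seconds, one entry per successive itinerary search. The folder layout and the values below are assumptions for illustration only, so check them against the OTP documentation for your version.

```r
# Sketch: write a router-config.json that sets OTP's per-itinerary timeouts.
# The directory layout and the timeout values are assumptions for illustration.
path_data  <- file.path(tempdir(), "OTP")                 # assumed data folder
router_dir <- file.path(path_data, "graphs", "default")   # assumed router folder
dir.create(router_dir, recursive = TRUE, showWarnings = FALSE)

# Seconds allowed for the 1st, 2nd, 3rd, ... itinerary search (illustrative values).
timeouts <- c(20, 15, 10, 5)

# Assemble a small JSON document by hand to avoid extra dependencies.
config_json <- sprintf('{\n  "timeouts": [%s]\n}', paste(timeouts, collapse = ", "))
config_file <- file.path(router_dir, "router-config.json")
writeLines(config_json, config_file)

cat(readLines(config_file), sep = "\n")
```

OTP would most likely need to be restarted to pick up the change. The package's otp_make_config() and otp_write_config() helpers are probably the more convenient route than writing the file by hand.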

thanks!

I'm going to close this issue as I don't think it is a bug in the R package, but please reopen it if you find out otherwise.