unige-geohealth / accessmod

accessmod 5: anisotropic accessibility analysis.

Geographic coverage and scaling up analysis: Incomplete geographic extent of catchment area and wrong amTravelTimeCatchment value

SteeveEbener opened this issue

Current Behavior

When running the geographic coverage or scaling up analysis with a maximum travel time of 60 min and a maximum population coverage capacity above the total population in the study area, all the catchment areas should present an amTravelTimeCatchment value of 60 and their geographic extent should correspond to such a travel time.

This is not the case. In the resulting MS Excel file (table_result_geographic_coverage_analysis.... or table_result_scaling_up_analysis_...), the value of the amTravelTimeCatchment field is:

  1. 59 minutes instead of 60 minutes if there is population in at least one cell at the border of the catchment area
  2. Below 59 minutes if there is no population within the cells at the border of the catchment area

The geographic extent of the catchment areas in the shape_catchment_... shape file reaches the equivalent of 59 minutes of travel time in case 1 (screenshot below) and the equivalent of the travel time reported in the resulting MS Excel file in case 2, instead of the equivalent of 60 minutes of travel time.

[screenshot]

This is particularly obvious when running an isotropic analysis with a single landcover type at a speed of, for example, 5 km per hour over a very sparse population distribution grid.

The following zip file contains an example of both cases based on an isotropic geographic coverage analysis (see screenshot below as well), but the same happens with the scaling up analysis:
shape_catchment_GEO_VAC_SHORT.zip

[screenshot]

Expected Behavior

The geographic extent of the catchment areas is based on the maximum travel time (t) and not (t-1), independently of whether there is population located up to their borders. In addition, the value of the amTravelTimeCatchment field in the corresponding table_result_geographic_coverage_analysis.... or table_result_scaling_up_analysis_.. also matches the (t) value, indicating that the benchmark has been fully implemented.

Steps to Reproduce

Run an isotropic analysis for a given travel time, with a very high maximum travel time value, on a sparse population distribution grid and look at the results.

Detailed Description

Version used: 5.7.21-alpha-1.1

Is this a regression, or was it implemented like this in previous versions and accepted as is?

ref. #314 (comment)

Latest conceptual change on this was in 3e0be18, added in 5.7.8-alpha, Jul 23 09:33:48 2021

The previous conceptual change occurred in 5.6.56, the latest stable VM, Apr 9 16:11:11 2021

I will have to look more in depth to see whether it's independent of those changes.

Good catch...and memory!

I ran the same geographic coverage analysis, this time enabling the "Run the analysis without considering capacities" option, and, interestingly, the result is much closer to what is expected (red line):
[screenshot]

The "table_result_geographic_coverage_analysis...." Excel file now reports 60 min in the amTravelTimeCatchment field (the zip file also includes the catchment area extent):
shape_catchment_GEO_SHORT_NO_CAP.zip

...and the geographic extent is now closer to the 60 min raster grid obtained when using the accessibility analysis for all the health facilities, but still not exactly the same (it still corresponds to 59 min of travel time):
[screenshot]

=> It looks as though the adjustment applied to the "Run the analysis without considering capacities" option has not been applied to the geographic coverage and scaling up analyses when this option is not enabled, while it should have been.

The difference between the extent of the 60 min travel time (accessibility analysis) and the 60 min catchment area (geographic coverage and scaling up analyses) is another important issue that also has to be fixed, as it leads to differences in population coverage. This is for example the case for 4 of the 8 health facilities I have been using here:

Comp_population_230822.xlsx

...and I just found out that the accessibility analysis also stops at 59 min instead of 60 min when run in anisotropic mode, while it reaches 60 min in isotropic mode.

Wait... this reminds me that we refactored AM 1-2 years ago and that this 59/60 min discussion was at the center of it. That was when we changed the travel time output raster to hold integer numbers for the minutes (and not floating point anymore). It is likely that cells with a travel time of 59 min actually represent travel times ranging from 59 to 59.99999999 minutes. In that case this would be correct, because we don't want to include the 60 min cells (representing 60-60.9999999 min of travel). Before we change anything @fxi, can you check this in the code?

@nicolasray: There are several issues here then:

  1. The way the integer value in the cell is obtained might not be correct, meaning that, from 59.5 minutes onwards, the value in the cell should be 60 and not 59
  2. The isotropic mode does not follow the same rule as the anisotropic one, as the former includes 60 minute cells but the latter does not
  3. Running the geographic coverage or scaling up analysis without enabling the "Run the analysis without considering capacities" option does not produce the expected geographic extent of the catchment areas (it stops at the last cell containing population within the set maximum travel time)
  • Isotropic and anisotropic travel time analyses use the exact same function to convert seconds to minutes and round the results, and obtaining an exactly round value by chance is quite unlikely, as we use seconds internally and a divider. If there is an issue, it's in the formula used:
cmd <- sprintf(
    " %1$s = %1$s >= %2$d && %1$s <= %3$d ? round((( %1$s / %6$f) - (( %1$s / %6$f ) %% 1))) : %1$s / %6$d > %4$d ? %5$s : null() ",
    map # 1
    , cutSecondsStart # 2
    , cutSecondsEnd # 3
    , timeoutMinutesLimit # 4
    , timeoutMinutesValue # 5
    , divider # 6
  )

Which translates in pseudo code, for the common case, to the following (see also the sketch after this list):

   travel time =
   IF
   travel time >= 0 AND travel time <= max time
   THEN
   convert to minutes and remove the decimals (floor), then return an integer
   ELSE
   null
  • The Catchment Analyst requires population to work, but can work regardless of capacity. If the catchment must represent the travel time itself, this could be coded as such: a new option could be added, or we could modify the semantics of the current one.
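For reference, here is a minimal R sketch of the floor-based conversion described in the pseudo code above. It is a simplification (plain R on a vector of values, not the actual map algebra expression built above), and the function name, argument names and default are illustrative only:

# Simplified sketch of the current behaviour: the travel cost in seconds is
# divided by 60 and truncated (floor) to an integer number of minutes; values
# above the maximum are set to NA, standing in for null() in the real expression.
seconds_to_minutes_floor <- function(cost_seconds, max_minutes = 60) {
  minutes <- cost_seconds / 60
  ifelse(minutes <= max_minutes, floor(minutes), NA)
}

seconds_to_minutes_floor(c(59 * 60, 59.6 * 60, 60 * 60))
# -> 59 59 60 : a cell costing 59.6 min is reported as 59 min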

Thanks Fred.

If the formula is the same, then the problem might come from somewhere else, as the difference in result between the isotropic and anisotropic modes is there.

I don't fully understand the formula reported in your message, but does it add 0.5 before taking the integer value, to ensure that from 59.5 onwards the integer value is 60 and not 59?
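For illustration only, a tiny plain-R comparison of the two behaviours being discussed here: the current floor versus the "add 0.5 then truncate" technique, which is half-up rounding:

x <- c(59.4, 59.5, 59.9)
floor(x)        # 59 59 59  -> current floor behaviour
floor(x + 0.5)  # 59 60 60  -> "add 0.5 then truncate", i.e. half-up rounding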

Ok.

I created a new version, fredmoser/accessmod:5.7.21-beta-1.3-ceil

Please test it and report back as soon as possible.

  • What should happen now:

59.1, 59.9, 60 -> 60
60.1, 60.9, 61 -> 61
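For illustration, the mapping above corresponds to taking the ceiling of the minute value (plain R sketch, not the actual internal expression):

minutes <- c(59.1, 59.9, 60, 60.1, 60.9, 61)
ceiling(minutes)
# -> 60 60 60 61 61 61, matching the mapping above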

This contradicts what was decided during the integer issue, where floor was preferred. I can't remember the rationale; probably to avoid over-optimistic models. This change will not be part of the next release unless it's also approved by @nicolasray.

Note that all modules will be affected, including zonal stats.

  • What it does not handle:
    • The "Run the analysis without considering capacities" option, which still outputs catchments based on population. A new option to use the full travel time as catchment should be discussed, probably in another issue.
    • Any issue other than the rounding technique, which should be reported in other issues.

Thanks !

We need to keep the current version with the floor for now. We had agreed that this floor option gives catchments with travel time cells that are <60 min (i.e. up to 59.999999 min). I think this is what we want. Changing that now can indeed affect many other modules, but if reconsidering it is needed, we should first make an exhaustive assessment of the resulting catchments in all modules and analyses, described in a dedicated side document, and then take an informed final decision.

The problem with the use of the floor function is that a cell with a travel time of 59.6 min is given the value of 59 while it should be 60 min

This is what we should obtain when using the values in Fred's email:
59.1 -> 59
59.9, 60 -> 60
60.1 -> 60
60.9, 61 -> 61

When the user specifies a maximum travel time of 60 min, this includes 60 min. It is very confusing to specify 60 min and end up with tables and rasters that stop at 59 min.

I can't remember how we got to this misunderstanding but this should be addressed ASAP.

Ok. This is a third way... It uses the classic rounding technique, based on the central position between two integers:

59.5001 -> 60
59.5000 -> 60
59.4999 -> 59

We can also let the user choose, by adding a simple option to select the rounding technique (sketched below):

  • No rounding ( heavy maps, floating point )
  • Classic rounding
  • Floor rounding ( recommended )
  • Ceil rounding

-> It's easy for the user to understand, and rounding is taught in primary school.
-> The description would mention why this is requested.
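As a rough sketch of how the four options above would classify the same cell values (plain R, illustrative only; note that R's round() uses banker's rounding, so classic half-up rounding is written here as floor(x + 0.5)):

m <- c(59.2, 59.5, 59.8)          # travel time in minutes
data.frame(
  no_rounding = m,                # keep the floating point values
  classic     = floor(m + 0.5),   # half-up rounding: 59 60 60
  floor       = floor(m),         # 59 59 59
  ceil        = ceiling(m)        # 60 60 60
)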

I think I added such an option back then, to switch between integer / floating point maps. I think it's important.

Thanks Fred. This could indeed solve part of the issue, but would it also address the difference in catchment area extent between the accessibility analysis and the geographic coverage/scaling up ones that I included in the original ticket?

[screenshot from the original ticket]

The version I just pushed did not produce the result you expected?

I have not tested it yet; I will try to do that over the weekend. If I understand correctly, this should then address the difference in catchment area extent between the different analyses. That would be great if this is the case. I will let you know.

I was implementing the option to let the user choose the rounding approach, as an experiment, and I drew a simple graph to help visualize the effect in the documentation, in case the option is included in the next release.

[graph: rounding_graph_travel_time]

Doing that, I realized that both round and floor were suboptimal choices... What we want is ceiling:

  • ceiling gives a good representation of entering the first minute of travel time as soon as the first second of cost is counted
  • round adds an arbitrary 30 [s] delay before counting 1 minute as a valid isochrone step
  • floor is off by up to 59 [s] (see the sketch below)

Depending on the map resolution, it could be problematic...

Moreover, other parts of the code use "less than or equal" for handling the maximum travel time, which seems to be the logical counterpart of ceiling... If we use anything other than ceiling, I will have to modify how the maximum travel time is handled wherever it occurs in the code.
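To make the magnitude of those errors concrete, here is a small R sketch (illustrative only, sampling one minute of true travel times at 1 s steps) of the worst-case deviation between the reported integer minute and the true travel time for the three techniques:

true_min <- seq(59, 60, by = 1 / 60)            # 59:00 to 60:00 in 1 s steps
dev_s <- function(reported) (reported - true_min) * 60

range(dev_s(floor(true_min)))        # -59 .. 0  : floor under-reports by up to 59 s
range(dev_s(floor(true_min + 0.5)))  # -29 .. 30 : rounding is off by up to ~30 s
range(dev_s(ceiling(true_min)))      #   0 .. 59 : ceiling never under-reports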

@SteeveEbener I tried to reproduce your example using the demo location, as I don't think you shared your project or config files. The new approach seems to solve the catchment issue. I'm waiting for your confirmation.

Thanks Fred, this definitely helps. Would you just mind giving an example of the values we would get for travel times between 59.0 and 60.0 using the 3 approaches?

I created the folder with the dataset last August but, for some unknown reason, the URL did not end up here... my apologies for that.

Here the link: https://www.dropbox.com/sh/harwjznp0ck0pus/AACHk-iSRpZ9xxsT-Er8x4tfa?dl=0

I performed the same analysis under 5.7.21-beta-1.3-ceil and here are the results:

  1. The geographic extent of the catchment areas now corresponds to what is expected, but it is not clear which of the 3 approaches reported in your comment dated September 23 was used
  2. The travel times appearing in the analysis tables for the geographic coverage and scaling up analyses now reach 60 min as expected. Tables are attached.

table_result_geographic_coverage_analysis_GEO_Access_60m.xlsx

table_result_scaling_up_analysis_SCALING_UP.xlsx

This being said, there seems to be a problem with the amCapacityResidual, amPopCatchmentDiff and amPopCoveredPercent fields, as the value is "0" for all the records, which should not be the case.

=> Could this be linked to the change that you have implemented?

Ref to #400 (comment)

@SteeveEbener To my understanding, ceiling is the only correct approach and it's the one used in version 5.7.21-beta-1.3-ceil (hence the "ceil" suffix). The missing values in the table you mentioned will be fixed during my next maintenance session, before releasing a public, non-beta version 5.7.21.

You can refer to the graph and my description for that choice. 

Now, for the question of a switch, as an advanced option: it would solve the question of reproducibility with previous versions, i.e. a backward compatibility option for all versions released from 2019 onwards, since the switch to integer rasters.

Original issue in which the cell types (integer vs float) were debated:

#207 (Oct 18, 2018)

The rounding option was chosen first: 

078b72f -> using round ( Oct 31, 2018 )

Then the floor function was preferred:

777ed79 -> converted to floor ( Nov 16, 2018 )

In #241 (comment), I mentioned again the rounding instead of the floor.

The ceiling solution matches the idea of the <= operator used for the catchment analysis, probably since the beginning.

Thanks Fred.

I looked again at the graph and I think I finally got it (sorry for that).

In view of this, I am not sure I understand why the ceiling option is the one we want instead of rounding, as the latter is the closest option to the "straight line" and therefore presents the smallest difference within each minute of travel.

Which approach was implemented until version 4 of AccessMod?

@fxi: if you answer the final question in the previous comment, I guess we can then close the issue.

Up to version 4 of AccessMod, it seems the tool was using minutes in these internal calculations, not seconds, so there was no rounding issue at the time. Closing this issue, but you can reopen it if this is not satisfactory. [edit @fxi: minutes/seconds]

Finally having the time to look into this.

@nicolas: thanks for your answer to my last question.

This nevertheless still does not explain why ceiling is the best option, as rounding provides the result that is closest to the "straight line" in Fred's graph dated Sep 23, 2022.

@SteeveEbener The rounding doesn't follow a straight line as much as you might think: it alternates between overreporting and underreporting time.

Using floor and round methods for converting travel time can lead to underreporting of the predicted duration. The ceiling method is more conservative in this respect. With flooring, up to 59 seconds could potentially be underreported, whereas rounding could result in a maximum underreporting of 30 seconds. Ceiling aligns with the principle that even a brief journey, such as stepping out of your house and walking for 28 seconds, should be reported as 1 minute of travel, not 0. This approach ensures a more accurate and cautious representation of travel times.
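Fred's 28-second example, written out (illustrative R only):

t_min <- 28 / 60                     # 28 seconds of travel, in minutes
c(floor = floor(t_min), round = floor(t_min + 0.5), ceiling = ceiling(t_min))
#   floor   round ceiling
#       0       0       1   -> only ceiling reports the walk as 1 minute of travel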

Thanks @fxi .

What you are mentioning for flooring applies directly to ceiling: with ceiling, up to 59 seconds could potentially be overreported.

The most conservative, accurate and cautious approach (to use your terms) is rounding because, in that case, the maximum over- or under-reporting is 29 sec. This indeed does not create a straight line, but it represents the closest "path" to the straight line, as per your schema and my previous message.