Refactor collecting, preprocessing, and curvature calculation.

Question

adamfranco opened this issue 8 years ago

As described in #10, refactoring the initial way collection, pre-filtering, segment creation, and curvature calculation into discrete steps will add flexibility and potentially allow optimizations to each independently. This issue is an umbrella issue for this whole refactoring of the curvature-calculate program. Each processing stage will have its own independent issue.

To successfully refactor this process, we'll need to retain more (maybe all) of the original way-data throughout the processing stages so that we know what points are associated with which original way. If the first joining step actually combines the ways into a single series of segments (line) we wouldn't easily be able to go back and associate tags with the constituent segments. Instead, the first PBF read/joining stage should create a new, richer data-structure where each item is an ordered list of the "joined" ways with all of their data.

Collect and "join" - sub-issue #21
This first stage would output pretty much the raw OSM data, but with the coordinates in-lined and the joined ways ordered in a collection that can be examined as a unit. The highway types to include would probably be the main key for avoiding buildings/boundaries/sidewalks/hiking-paths/etc. Here's an example of what this data might look like:

[ # Overall collection/stream of items, order doesn't matter.
  [ # One item, an ordered sequence of joined ways where the last
    # node/point of each way is the same as the first node/point of the next.
    {
      'id': 19730334,
      'tags': {
        'name': 'Northwood Drive',
        'surface': 'paved',
        'type': 'residential',
        # ... other tags on the way ...
        'county': 'Washington, VT',
      },
      'coords': [
        {
          'id': 1000,
          'lat': 43.78561799999995,
          'lon': -72.29485699999982,
        },
        # ...
        {
          'id': 1020,
          'lat': 43.78561800000000,
          'lon': -72.2948550000000,
        },
      ]
    },
    {
      'id': 19730336,
      'tags': {
        'name': 'Northwood Drive',
        'surface': 'unknown',
        'type': 'residential',
        # ... other tags on the way ...
        'county': 'Washington, VT',
      },
      'coords': [
        {
          'id': 1020,
          'lat': 43.78561800000000,
          'lon': -72.2948550000000,
        },
        {
          'id': 1021,
          'lat': 43.78561800000012,
          'lon': -72.2948550000011,
        },
        # ...
      ]
    },
    # ...
  ],
  # ...
]
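The key invariant of a joined collection is that each way ends on the node where the next one begins. A minimal sketch of a check for that invariant, which could double as a first-stage unit test, might look like the following. The function name `is_joined` is hypothetical and not part of the existing code; it only assumes the `coords` and `id` keys shown in the example data.

```python
def is_joined(collection):
    """Return True if every way ends on the node where the next way begins.

    `collection` is one item from the first stage: an ordered list of ways,
    each with a 'coords' list of {'id', 'lat', 'lon'} dicts.
    """
    return all(
        prev['coords'][-1]['id'] == nxt['coords'][0]['id']
        for prev, nxt in zip(collection, collection[1:])
    )


# Trimmed-down version of the example data (tags shortened for brevity).
collection = [
    {'id': 19730334, 'tags': {'name': 'Northwood Drive'},
     'coords': [{'id': 1000, 'lat': 43.785618, 'lon': -72.294857},
                {'id': 1020, 'lat': 43.785618, 'lon': -72.294855}]},
    {'id': 19730336, 'tags': {'name': 'Northwood Drive'},
     'coords': [{'id': 1020, 'lat': 43.785618, 'lon': -72.294855},
                {'id': 1021, 'lat': 43.785618, 'lon': -72.294855}]},
]

print(is_joined(collection))        # True: way 19730334 ends on node 1020
print(is_joined(collection[::-1]))  # False: reversed order breaks the chain
```

Comparing node IDs rather than coordinates avoids floating-point equality problems when two distinct nodes sit very close together.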

Pre-filter - sub-issue #22
An optional second "pre-filter" stage could take in one collection of joined ways and break these collections apart based on tags, such as surface, potentially dropping some altogether. The result would be the same data-structure as above, with all of the data for each way available. This wouldn't be needed for users who want to keep all ways in their data set (the filtering could be done later for particular outputs), but it could serve to lower the data-processing costs for users who are only interested in a sub-set of the data.
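As a rough illustration of the pre-filter idea, the sketch below splits one joined collection wherever a tag value changes and optionally drops some runs. The name `prefilter` and its `key`/`drop` parameters are assumptions made for this example, not an existing API.

```python
from itertools import groupby


def prefilter(collection, key='surface', drop=frozenset()):
    """Split one joined collection wherever the value of tag `key` changes.

    Runs of consecutive ways sharing a tag value stay together, so each
    output collection is still an ordered sequence of joined ways. Runs
    whose tag value is in `drop` are discarded.
    """
    result = []
    for value, run in groupby(collection, key=lambda way: way['tags'].get(key)):
        if value not in drop:
            result.append(list(run))
    return result


ways = [
    {'id': 1, 'tags': {'surface': 'paved'}, 'coords': []},
    {'id': 2, 'tags': {'surface': 'paved'}, 'coords': []},
    {'id': 3, 'tags': {'surface': 'unknown'}, 'coords': []},
    {'id': 4, 'tags': {'surface': 'paved'}, 'coords': []},
]

for piece in prefilter(ways, drop={'unknown'}):
    print([way['id'] for way in piece])
# [1, 2]
# [4]
```

Because `groupby` only groups consecutive items, ways 1-2 and way 4 end up in separate collections even though they share a surface, which matches the idea of breaking a collection apart rather than regrouping it.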

Segment creation - sub-issue #23
Convert the coords for each of the ways into an ordered list of "segments". Since the ways in a collection will share their first/last points, we don't need to worry about a "segment" between each of the ways in the collection. This new data-structure would drop the coords from items and add segments.
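One possible shape for this step is sketched below. The segment keys (`start`, `end`, `length`), the function names, and the use of the haversine formula for length are all assumptions for illustration; the coordinates are made-up values near the example location.

```python
import math

EARTH_RADIUS_M = 6371000.0  # mean Earth radius in metres


def haversine_m(a, b):
    """Great-circle distance in metres between two {'lat', 'lon'} points."""
    lat1, lon1, lat2, lon2 = map(
        math.radians, (a['lat'], a['lon'], b['lat'], b['lon']))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * EARTH_RADIUS_M * math.asin(math.sqrt(h))


def add_segments(way):
    """Return a copy of `way` with 'coords' replaced by a 'segments' list."""
    coords = way['coords']
    segments = [
        {'start': start, 'end': end, 'length': haversine_m(start, end)}
        for start, end in zip(coords, coords[1:])
    ]
    new_way = {k: v for k, v in way.items() if k != 'coords'}
    new_way['segments'] = segments
    return new_way


way = {
    'id': 19730336,
    'tags': {'name': 'Northwood Drive'},
    'coords': [
        {'id': 1020, 'lat': 43.7856, 'lon': -72.2949},
        {'id': 1021, 'lat': 43.7860, 'lon': -72.2949},
        {'id': 1022, 'lat': 43.7860, 'lon': -72.2940},
    ],
}

segmented = add_segments(way)
print(len(segmented['segments']))  # 2: three points make two segments
```

Each segment keeps references to its original node dicts, so node IDs survive, and the way's `id` and `tags` stay attached to the segments derived from it.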

Curvature decoration on segments - sub-issue #24
Optionally, decorate the segments with curvature. This isn't needed for my "surfaces" output because it's not used.
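One plausible way to decorate segments is with the radius of the circle through three consecutive points (the circumcircle). The sketch below assumes that measure, a hypothetical `radius` key, and points given as planar `(x, y)` tuples in metres; real lat/lon points would need a great-circle distance function passed in as `distance`.

```python
import math


def circumradius(a, b, c):
    """Radius of the circle through a triangle with side lengths a, b, c."""
    term = (a + b + c) * (b + c - a) * (c + a - b) * (a + b - c)
    if term <= 0:
        return math.inf  # collinear points: a straight line has no finite radius
    return (a * b * c) / math.sqrt(term)


def decorate(segments, distance=math.dist):
    """Set seg['radius'] to the tightest circumradius involving that segment."""
    for seg in segments:
        seg['radius'] = math.inf
    for s1, s2 in zip(segments, segments[1:]):
        r = circumradius(
            distance(s1['start'], s1['end']),
            distance(s2['start'], s2['end']),
            distance(s1['start'], s2['end']),
        )
        s1['radius'] = min(s1['radius'], r)
        s2['radius'] = min(s2['radius'], r)
    return segments


# A right-angle turn: sides 3, 4 and hypotenuse 5 give a radius of 2.5.
turn = decorate([
    {'start': (0, 0), 'end': (3, 0)},
    {'start': (3, 0), 'end': (3, 4)},
])
print([seg['radius'] for seg in turn])  # [2.5, 2.5]
```

Because this stage only adds a key to each segment, outputs that ignore curvature can skip it entirely.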

Splitting - sub-issue #25
Optionally, split ways and their containing collections on straight-segment thresholds or other criteria that depend on curvature.
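A straight-segment threshold could be applied roughly as follows. This is a sketch under assumptions: segments carry hypothetical `radius` and `length` keys, and the threshold names and default values are invented for the example.

```python
from itertools import groupby


def split_on_straights(segments, min_radius=1000.0, max_straight=500.0):
    """Split a segment list around straight runs longer than `max_straight`.

    A segment counts as straight when its radius is at least `min_radius`.
    Long straight runs become their own piece; short ones stay attached to
    the curvy stretch they sit in.
    """
    pieces, current = [], []
    for straight, run in groupby(
            segments, key=lambda seg: seg['radius'] >= min_radius):
        run = list(run)
        if straight and sum(seg['length'] for seg in run) > max_straight:
            if current:
                pieces.append(current)
            pieces.append(run)
            current = []
        else:
            current.extend(run)
    if current:
        pieces.append(current)
    return pieces


inf = float('inf')
segments = [
    {'n': 0, 'radius': 50, 'length': 10},
    {'n': 1, 'radius': 60, 'length': 10},
    {'n': 2, 'radius': inf, 'length': 300},
    {'n': 3, 'radius': inf, 'length': 300},
    {'n': 4, 'radius': 40, 'length': 10},
    {'n': 5, 'radius': inf, 'length': 100},
    {'n': 6, 'radius': 30, 'length': 10},
]

for piece in split_on_straights(segments):
    print([seg['n'] for seg in piece])
# [0, 1]
# [2, 3]
# [4, 5, 6]
```

Keeping the long straight run as its own piece, rather than discarding it, leaves the decision about whether to output it to the later stages.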

Additional Post processing - sub-issue #26
Pass off to post-processing & output. The data format passed off to the post-processing scripts will be different from what we are currently using, but richer since the segments will still be associated with their original way-data and will retain their original OSM IDs. This will make it much easier to add "Edit" links in the output data to help encourage data-cleanup in OSM. This should also make it easier to change styling based on the tags of the way associated with segments.
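With the way ID retained, building an "Edit" link in an output script becomes a one-liner. The sketch below is hypothetical: the function name is invented, and the URL pattern is an assumption about how openstreetmap.org addresses its editor, so it should be checked before use.

```python
def edit_link(way):
    """Build an OSM editor URL for a way from its retained 'id'.

    The URL pattern is an assumption for illustration; verify it against
    the OpenStreetMap site before relying on it.
    """
    return 'https://www.openstreetmap.org/edit?way={}'.format(way['id'])


way = {'id': 19730334, 'tags': {'name': 'Northwood Drive', 'surface': 'paved'}}
print(edit_link(way))  # https://www.openstreetmap.org/edit?way=19730334
```

The same `way['tags']` dict is what a styling step would inspect, for example to colour segments by `surface`.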

There's a lot to do here to implement this, but hopefully each stage is well defined enough that we should be able to tackle it reasonably. The first stage of collating ways and joining them into first-stage collections will probably be the trickiest, but at least it's simpler than the current collector. 😉

Erik Fonselius · Answer 1 · Thu May 12 2016 23:16:32 GMT+0800 (China Standard Time)

I think we should try to keep these changes as small as possible in order to release a stable version, and then do iterative improvements on specific parts.

Adam Franco · Answer 2 · Thu May 12 2016 23:42:45 GMT+0800 (China Standard Time)

I agree. I'm trying to keep the algorithmic changes to a bare minimum while breaking it apart. I hope that I'll have this first draft done by tomorrow. After that is done, hopefully we can both validate it against our individual use cases. I'll merge into refactor after that.

Once we have validated this (and hopefully have some first-stage unit tests) we can take on bigger changes like #19 Memory footprint optimization and other cleanup as follow-up work.

Adam Franco · Answer 3 · Sat May 14 2016 10:44:33 GMT+0800 (China Standard Time)

I've now reorganized the top-level contents into bin/ and lib/ as appropriate and finished with most of my testing. Updating the README is the only main remaining task before I merge this into refactor.

@Fonsan I'll be busy tomorrow (Saturday), but could you take a quick look and let me know if you'd like to do any more testing against your pipeline before I move forward with this merge?

Erik Fonselius · Answer 4 · Sat May 14 2016 13:33:04 GMT+0800 (China Standard Time)

I will rebase my pipe and let you know if I find any problems.

Adam Franco · Answer 5 · Wed May 18 2016 13:14:45 GMT+0800 (China Standard Time)

README updated. I'm considering this effort complete now.