Don't reverse geocode every museum on every commit
simonw opened this issue · comments
The reverse geocoding annotation script currently runs against every single museum on every commit, which is inefficient.
Instead, it should download the previous version of the database from https://www.niche-museums.com/browse.db and only run against the records that have not yet had their various osm_ columns populated.
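A minimal sketch of the "only run against unpopulated records" step, using the standard-library `sqlite3` module. The table name `museums`, the columns `id`, `latitude`, `longitude`, and the marker column `osm_country` are assumptions for illustration; the issue only says the columns share an `osm_` prefix.

```python
import sqlite3


def museums_needing_geocode(conn, marker_column="osm_country"):
    """Return (id, latitude, longitude) for rows whose marker column is NULL.

    marker_column is a hypothetical name; it is interpolated into the SQL,
    so only pass trusted column names here.
    """
    sql = (
        "select id, latitude, longitude from museums "
        f"where [{marker_column}] is null order by id"
    )
    return conn.execute(sql).fetchall()


# Tiny in-memory stand-in for the downloaded browse.db
conn = sqlite3.connect(":memory:")
conn.execute(
    "create table museums (id integer primary key, name text, "
    "latitude real, longitude real, osm_country text)"
)
conn.executemany(
    "insert into museums values (?, ?, ?, ?, ?)",
    [
        (1, "Already annotated", 51.5, -0.12, "United Kingdom"),
        (2, "Not yet annotated", 37.77, -122.42, None),
    ],
)
todo = museums_needing_geocode(conn)
```

Only the second row comes back, so only that museum would need a reverse geocoding request.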
This depends on simonw/sqlite-utils#66 so I can use a corrected version of upsert.
The Nominatim usage policy calls this out and recommends caching to avoid sending the same request multiple times: https://operations.osmfoundation.org/policies/nominatim/
Easiest thing here would be to add a caching table, rather than messing around with upsert.
I also need to send a custom user-agent string.
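One way to attach a custom user-agent with only the standard library is sketched below. It assumes the public Nominatim `/reverse` endpoint with `format=jsonv2`; the issue does not say which endpoint or HTTP client the script uses, and the user-agent value is a placeholder.

```python
import urllib.parse
import urllib.request


def build_reverse_request(lat, lon, user_agent):
    """Build (but do not send) a reverse geocoding request with a custom User-Agent."""
    query = urllib.parse.urlencode({"format": "jsonv2", "lat": lat, "lon": lon})
    return urllib.request.Request(
        "https://nominatim.openstreetmap.org/reverse?" + query,
        headers={"User-Agent": user_agent},
    )


# Placeholder identifier; a real one should name the application.
request = build_reverse_request(51.5, -0.12, "niche-museums-example/0.1")
```

Passing `request` to `urllib.request.urlopen` would send it with that header.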
I can cache based on a geohash.
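A rough sketch of a geohash-keyed caching table follows. The table name `geocode_cache`, the function names, and the 9-character precision are all assumptions, and `fetch` is a caller-supplied function that performs the real lookup, so the demo below uses a stub instead of hitting the network.

```python
import json
import sqlite3

BASE32 = "0123456789bcdefghjkmnpqrstuvwxyz"


def geohash(latitude, longitude, precision=9):
    """Encode a point as a standard geohash string of the given length."""
    lat_lo, lat_hi = -90.0, 90.0
    lon_lo, lon_hi = -180.0, 180.0
    chars, value, bit_count, use_lon = [], 0, 0, True
    while len(chars) < precision:
        # Bits alternate between longitude and latitude, longitude first.
        if use_lon:
            mid = (lon_lo + lon_hi) / 2
            bit = longitude >= mid
            lon_lo, lon_hi = (mid, lon_hi) if bit else (lon_lo, mid)
        else:
            mid = (lat_lo + lat_hi) / 2
            bit = latitude >= mid
            lat_lo, lat_hi = (mid, lat_hi) if bit else (lat_lo, mid)
        value = (value << 1) | int(bit)
        use_lon = not use_lon
        bit_count += 1
        if bit_count == 5:  # every 5 bits become one base-32 character
            chars.append(BASE32[value])
            value, bit_count = 0, 0
    return "".join(chars)


def cached_reverse_geocode(conn, lat, lon, fetch, precision=9):
    """Return the cached response for this geohash, calling fetch only on a miss."""
    conn.execute(
        "create table if not exists geocode_cache "
        "(geohash text primary key, response text)"
    )
    key = geohash(lat, lon, precision)
    row = conn.execute(
        "select response from geocode_cache where geohash = ?", (key,)
    ).fetchone()
    if row is not None:
        return json.loads(row[0])
    result = fetch(lat, lon)
    conn.execute(
        "insert into geocode_cache (geohash, response) values (?, ?)",
        (key, json.dumps(result)),
    )
    conn.commit()
    return result


# Demo with a stub fetch that records how often it is called.
calls = []


def fake_fetch(lat, lon):
    calls.append((lat, lon))
    return {"display_name": "stub result"}


conn = sqlite3.connect(":memory:")
first = cached_reverse_geocode(conn, 57.64911, 10.40744, fake_fetch)
second = cached_reverse_geocode(conn, 57.64911, 10.40744, fake_fetch)
```

The second call is served from the table, so the stub runs only once. A shorter precision would let nearby points share a cache entry, at the cost of returning a result for a slightly different location.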