Quartz / bad-data-guide

An exhaustive reference to problems seen in real-world data along with suggestions on how to resolve them.

geographic data?

smnorris opened this issue

Many geo data issues are already covered in other sections, especially the "entered by humans" part, but there are some common quirks that might be worth mentioning:

  • lon/lat vs lat/lon
  • inconsistent or incorrect CRS
  • inconsistent values used to indicate NULLs
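A rough sketch of how the first and third quirks could be checked, for anyone who wants something concrete. The sentinel list and both function names are made up for illustration, and the axis-order guess relies only on the valid ranges (latitude within ±90, longitude within ±180), so it can't decide pairs where both values fit in ±90.

```python
from typing import Optional

# Hypothetical NULL markers; replace with whatever the dataset actually uses.
NULL_SENTINELS = {"", "null", "none", "n/a", "na", "-9999", "-999"}


def parse_coord(value: str) -> Optional[float]:
    """Return the value as a float, or None if it is a known NULL marker."""
    text = value.strip()
    if text.lower() in NULL_SENTINELS:
        return None
    return float(text)


def guess_axis_order(first: float, second: float) -> str:
    """Guess the order of a coordinate pair from valid value ranges alone."""
    first_can_be_lat = abs(first) <= 90
    second_can_be_lat = abs(second) <= 90
    if first_can_be_lat and not second_can_be_lat:
        return "lat/lon"
    if second_can_be_lat and not first_can_be_lat:
        return "lon/lat"
    if not first_can_be_lat and not second_can_be_lat:
        return "invalid"  # neither value can be a latitude
    return "ambiguous"  # both values fit in the latitude range
```

Running `guess_axis_order` over a whole column and counting the answers is usually more telling than any single row, since one unambiguous pair settles the order for the rest (assuming the file is at least consistent with itself).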

A gotcha for me was different names used to refer to the same city (Beijing vs Peking), or different ways to name the same country (China vs People's Republic of China). I end up needing to create new columns and bind the varied names to standard country codes before I can join them.
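A minimal sketch of that kind of binding, assuming a hand-maintained alias table. The table contents and the function name `to_country_code` are illustrative only; a real table would need far more entries. The codes are ISO 3166-1 alpha-2.

```python
from typing import Optional

# Hypothetical alias table mapping name variants to ISO 3166-1 alpha-2 codes.
COUNTRY_ALIASES = {
    "china": "CN",
    "people's republic of china": "CN",
    "prc": "CN",
}


def to_country_code(name: str) -> Optional[str]:
    """Map a free-text country name to a code, or None if it is not in the table."""
    # Lowercase, straighten curly apostrophes, and collapse repeated whitespace.
    key = " ".join(name.lower().replace("\u2019", "'").split())
    return COUNTRY_ALIASES.get(key)
```

Returning None for unknown names rather than guessing makes the unmatched rows easy to list and fix by hand before the join.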

What does CRS stand for? @smnorris

sorry - CRS (coordinate reference system) / SRS (spatial reference system) / projection... many names for basically the same thing.

basic - http://mapschool.io/#projection
more - https://en.wikipedia.org/wiki/Geographic_coordinate_system

The most common problem tends to be the mystery CRS, where it isn't specified in the data/metadata and the user has to guess which one to use. Really fun is when a dataset has some records in one CRS, some in another... and the lucky user gets to figure out which is which.
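For the mixed-CRS case, a crude first pass is to split records by whether the numbers could be degrees at all. This is only a sketch: the function names are made up, it assumes points are stored as (x, y) with x as longitude/easting, and it will misclassify projected coordinates that happen to fall within ±180/±90. It also can't tell you which projected CRS is in use.

```python
from typing import List, Tuple

Point = Tuple[float, float]


def could_be_degrees(x: float, y: float) -> bool:
    """True if (x, y) fits within the lon/lat value ranges."""
    return abs(x) <= 180 and abs(y) <= 90


def split_by_crs_guess(points: List[Point]) -> Tuple[List[Point], List[Point]]:
    """Split points into (probably degrees, probably projected units)."""
    degrees: List[Point] = []
    projected: List[Point] = []
    for x, y in points:
        if could_be_degrees(x, y):
            degrees.append((x, y))
        else:
            projected.append((x, y))
    return degrees, projected
```

If both lists come back non-empty, that is a strong hint the dataset mixes a geographic CRS with a projected one.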

+1 to this.

I'd also add - geometry that has already been simplified. A fantastic demo is at https://www.jasondavies.com/simplify/
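To show what simplification throws away, here is a small sketch of Douglas-Peucker line simplification. This is one common approach and not necessarily the algorithm used in the demo linked above; the function names and the tolerance value are illustrative.

```python
import math
from typing import List, Tuple

Point = Tuple[float, float]


def perpendicular_distance(p: Point, a: Point, b: Point) -> float:
    """Distance from point p to the infinite line through a and b."""
    (px, py), (ax, ay), (bx, by) = p, a, b
    dx, dy = bx - ax, by - ay
    if dx == 0 and dy == 0:
        # a and b coincide, so fall back to point-to-point distance.
        return math.hypot(px - ax, py - ay)
    return abs(dy * (px - ax) - dx * (py - ay)) / math.hypot(dx, dy)


def simplify(points: List[Point], tolerance: float) -> List[Point]:
    """Drop vertices that lie within `tolerance` of the simplified line."""
    if len(points) < 3:
        return list(points)
    first, last = points[0], points[-1]
    index, max_dist = 0, 0.0
    # Find the interior vertex farthest from the first-to-last chord.
    for i in range(1, len(points) - 1):
        d = perpendicular_distance(points[i], first, last)
        if d > max_dist:
            index, max_dist = i, d
    if max_dist <= tolerance:
        return [first, last]
    # Keep the farthest vertex and recurse on both halves.
    left = simplify(points[: index + 1], tolerance)
    right = simplify(points[index:], tolerance)
    return left[:-1] + right
```

Once vertices have been dropped this way there is no getting them back, so lines that should share a border may no longer line up.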