martj42 / international_results

Home Page: https://www.kaggle.com/martj42/international-football-results-from-1872-to-2017

Purpose of the city and country columns

martj42 opened this issue

Need to think through what these columns are for. If they are meant to make plotting maps easier, country names should be changed to some standard format so they join easily with coordinate databases. FIFA names probably aren't it, e.g. London is in the UK, not England.

If plotting is the main purpose, then historic names should also be removed. Belgrade being in Yugoslavia, then Serbia and Montenegro, and now Serbia is true and a cool bit of trivia for us history buffs, but does it help or hinder anyone using the dataset?

Not sure if that question has been answered, but here are my thoughts:
London is in both England and the UK. We have two political designations for the same place, and I do not know whether that happens anywhere besides the UK. Given that this is a football dataset and there is no UK team in football (only England, Scotland, Northern Ireland and Wales), I believe the country should be kept as England and, more generally, as the geographical/political designation of where the match happened.

It is my understanding (I may be wrong) that geo databases dealing with historical data in a very specific context (football) should handle these variations themselves: aggregated countries (the UK and its components) as well as countries that no longer exist, e.g. Yugoslavia. For countries, territories or geographic areas that have simply been renamed, e.g. British Guiana to Guyana, I believe the original name should also be kept, i.e. British Guiana in the example above. In other words, country should record the name the place had when the match was played.

Geo databases can be tweaked to handle historical place names, or an intermediate translation table can be used as a proxy.
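
To make the translation-table idea concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the function name `modern_country`, the two lookup tables, and the choice of "United Kingdom", "Serbia" and "Guyana" as the names a coordinate database would expect. The dataset itself stays untouched; the mapping is applied only at join time.

```python
# Hypothetical translation layer between the names recorded in the dataset
# and the names a modern coordinate database is assumed to use.
# The entries are illustrative, not exhaustive.

# Dissolved countries cannot be mapped by name alone, so they need the city.
CITY_OVERRIDES = {
    ("Belgrade", "Yugoslavia"): "Serbia",
    ("Belgrade", "Serbia and Montenegro"): "Serbia",
}

# Renamed or aggregated countries can be mapped by name alone.
COUNTRY_RENAMES = {
    "England": "United Kingdom",
    "Scotland": "United Kingdom",
    "Wales": "United Kingdom",
    "Northern Ireland": "United Kingdom",
    "British Guiana": "Guyana",
}


def modern_country(city: str, country: str) -> str:
    """Return the assumed present-day name for a (city, country) pair.

    City-specific rules win over country-wide renames. If no rule applies,
    the recorded name is returned unchanged.
    """
    if (city, country) in CITY_OVERRIDES:
        return CITY_OVERRIDES[(city, country)]
    return COUNTRY_RENAMES.get(country, country)
```

A pair with no matching rule, such as a Yugoslav city other than Belgrade, comes back as "Yugoslavia" and would simply fail to join, which makes the gap in the table easy to spot.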