1117 Russian cities with geographic coordinates, identifiers and 2020 population estimate.
from pathlib import Path

import pandas as pd
import requests

url = ("https://raw.githubusercontent.com/"
       "epogrebnyak/ru-cities/main/assets/towns.csv")

# save file locally (skip the download if the file is already there)
p = Path("towns.csv")
if not p.exists():
    response = requests.get(url)
    response.raise_for_status()  # fail early on a bad download
    p.write_text(response.text, encoding="utf-8")

# read as dataframe
df = pd.read_csv(p)
print(df.sample(5))
- towns.csv - city information
- regions.csv - list of Russian Federation regions
- alt_city_names.json - alternative city names
Basic info:
- city - city name (several cities have alternative names marked in alt_city_names.json)
- population - city population, thousand people, Rosstat estimate as of 1.1.2020
- lat, lon - city geographic coordinates
Region:
- region_name - subnational region name: oblast, republic, krai or one AO (Chukotka)
- region_name_ao - autonomous okrug (AO) name, if the AO is a part of a larger region (applies to 3 AO)
- region_iso_code - ISO 3166 code, e.g. RU-VLD
- federal_district - federal district, e.g. Центральный
City codes:
- okato
- oktmo
- fias_id
- kladr_id
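Because population is stored in thousands of people, it is easy to misread the numbers. The sketch below shows one way to handle that using only the standard library. The sample rows are made-up placeholders in the documented column layout (not real rows from towns.csv), and the helper names load_rows, population_people and largest are hypothetical.

```python
import csv
import io

# Made-up sample in the documented column layout; values are placeholders,
# not real rows from towns.csv.
SAMPLE = """city,population,lat,lon,region_name
Town A,12.5,55.0,37.0,Region X
Town B,250.0,56.0,38.0,Region X
Town C,1100.3,57.0,39.0,Region Y
"""


def load_rows(text):
    """Parse CSV text and convert the numeric columns to floats."""
    rows = []
    for row in csv.DictReader(io.StringIO(text)):
        for column in ("population", "lat", "lon"):
            row[column] = float(row[column])
        rows.append(row)
    return rows


def population_people(row):
    """Population is stored in thousands; return the number of people."""
    return round(row["population"] * 1000)


def largest(rows, n):
    """Return the n rows with the largest population."""
    return sorted(rows, key=lambda r: r["population"], reverse=True)[:n]


rows = load_rows(SAMPLE)
for r in largest(rows, 2):
    print(r["city"], population_people(r))
```

To try this on the real file, pass the text of a downloaded towns.csv to load_rows instead of SAMPLE.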
- The city list and city population were collected from the Rosstat publication Регионы России. Основные социально-экономические показатели городов and parsed from the publication's Microsoft Word files.
- The city list corresponds to this Wikipedia article.
- An alternative dataset is the wiki-based Dadata city dataset (no population data).
There are four autonomous okrugs (AO) in Russia:
- Ненецкий автономный округ
- Ханты-Мансийский автономный округ - Югра
- Чукотский автономный округ
- Ямало-Ненецкий автономный округ
Ханты-Мансийский and Ямало-Ненецкий AO are inner parts of Тюменская область. Ненецкий AO is an inner part of Архангельская область. The names of these three AO are listed in region_name_ao.
Чукотский AO is a stand-alone region; it is not an inner part of any other region. Чукотский автономный округ is listed in region_name only.
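The split between region_name and region_name_ao means that the most specific region of a city is the AO name when one is set, and region_name otherwise. A minimal sketch of that rule follows. The rows are hand-written to follow the description above rather than copied from towns.csv, an empty string for a missing AO is an assumption, and most_specific_region is a hypothetical helper.

```python
def most_specific_region(row):
    """Return the AO name if one is set, otherwise the top-level region name."""
    return row.get("region_name_ao") or row["region_name"]


# Hand-written rows following the description above (not copied from towns.csv).
# Assumption: a missing AO is represented as an empty string.
rows = [
    {"city": "Салехард",
     "region_name": "Тюменская область",
     "region_name_ao": "Ямало-Ненецкий автономный округ"},
    {"city": "Тюмень",
     "region_name": "Тюменская область",
     "region_name_ao": ""},
    {"city": "Анадырь",
     "region_name": "Чукотский автономный округ",
     "region_name_ao": ""},
]

for row in rows:
    print(row["city"], "->", most_specific_region(row))
```

When the file is read with pandas, empty cells usually arrive as NaN rather than an empty string, so something like df["region_name_ao"].fillna(df["region_name"]) is the closer equivalent there.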
- Several notable towns are classified as administrative parts of larger cities (Сестрорецк is a municipality within Saint Petersburg, Щербинка is a part of larger Moscow). They are not reported in this dataset.
- Белоозерский was not found in the Rosstat publication, but should be considered a city as of January 1, 2020. We included it in the dataset.
- Дмитриев and Дмитриев-Льговский are the same city.
- We suppressed the letter "ё" in the city column of towns.csv: we have Орел, but not Орёл. This affected, for example:
  - Белоозёрский
  - Королёв
  - Ликино-Дулёво
  - Озёры
  - Щёлково
  - Орёл
assets/alt_city_names.json contains the alternative name pairing.
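Since the city column has "ё" replaced by "е", a lookup by the usual spelling (for example Орёл) will not match unless the query is normalized the same way. A minimal sketch of that normalization is below. The helper name suppress_yo is hypothetical, and the structure of alt_city_names.json is deliberately not assumed here.

```python
# Map both lower-case and upper-case "ё" to the corresponding "е".
YO_MAP = str.maketrans({"ё": "е", "Ё": "Е"})


def suppress_yo(name):
    """Replace the letter "ё" with "е", as done for the city column."""
    return name.translate(YO_MAP)


for name in ["Орёл", "Королёв", "Щёлково"]:
    print(name, "->", suppress_yo(name))
```

Applying suppress_yo to user input before comparing it with the city column keeps both spellings usable.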
poetry install
poetry run python -m pytest
Run:
- download data from Rosstat using rar/get.sh
- convert Саратовская область.doc to docx
- run make.py
Creates:
- _towns.csv
- assets/regions.csv
Note: do not attempt this stage unless you have to. These scripts take a while to run and use third-party API access. The resulting files are already in the repo, so you can probably skip running these scripts.
Run:
- cd geocoding
- run coord_dadata.py (needs token)
- run coord_osm.py
Creates:
- geocoding/coord_dadata.csv
- geocoding/coord_osm.csv
Run:
- run merge.py
Creates:
- assets/towns.csv
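For orientation, here is one way a merge step of this kind could combine a city list with two coordinate sources. This is an illustrative sketch only and does not reproduce merge.py: the join key (city, region_name), the preference for one source over the other, the placeholder data and the name merge_coords are all assumptions.

```python
def merge_coords(towns, primary, fallback):
    """Attach lat/lon to each town, preferring `primary` over `fallback`.

    `primary` and `fallback` map (city, region_name) to a (lat, lon) tuple.
    The city name alone is not used as the key because the same name can
    occur in more than one region.
    """
    merged = []
    for town in towns:
        key = (town["city"], town["region_name"])
        coords = primary.get(key) or fallback.get(key)
        lat, lon = coords if coords else (None, None)
        merged.append({**town, "lat": lat, "lon": lon})
    return merged


# Placeholder data, not real rows from the repository files.
towns = [
    {"city": "Town A", "region_name": "Region X"},
    {"city": "Town A", "region_name": "Region Y"},
    {"city": "Town B", "region_name": "Region X"},
]
primary = {("Town A", "Region X"): (55.0, 37.0)}
fallback = {
    ("Town A", "Region X"): (55.1, 37.1),
    ("Town A", "Region Y"): (60.0, 40.0),
}

result = merge_coords(towns, primary, fallback)
for row in result:
    print(row)
```

Towns missing from both sources keep None coordinates, which makes gaps easy to spot before the final file is written.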