recurze / IndianCities

Dataset with population, location and area information of main cities of India

IndianCities

This repository contains a dataset with population, location and area information for the main cities of India. Every city included has a population of 1 lakh (100,000) or more. The dataset also includes the same information for all districts and states/UTs of India.

Directory structure

All the initial data tables and intermediate CSVs are stored in the data/ directory. Scripts used to scrape the web for information, and to clean and process the data, are stored in the scripts/ directory. The root directory holds the final_*.csv files, which contain the population, location and area information of cities, districts and states. It also contains IndianCities.py, the main module for working with the collected data.
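To check which final tables are present in a checkout, the root directory can be scanned for the final_*.csv pattern. This is a minimal sketch using only the standard library; the helper name list_final_tables is hypothetical and not part of the repository.

```python
from pathlib import Path


def list_final_tables(root="."):
    """Return the names of final_*.csv files found directly under root, sorted."""
    return sorted(p.name for p in Path(root).glob("final_*.csv"))


if __name__ == "__main__":
    # Run from the repository root to see which final tables are present.
    for name in list_final_tables():
        print(name)
```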

Python module

The main file, IndianCities.py, provides the functions cities(), districts() and states(), each of which reads the corresponding CSV and returns its contents as a pandas DataFrame.
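The following is a rough sketch of how loader functions of this kind might be written, not the module's actual code. The file names final_cities.csv, final_districts.csv and final_states.csv are assumptions (the README only states that the files match final_*.csv), and the root parameter is added here for illustration.

```python
from pathlib import Path

import pandas as pd

# Assumption: the final tables are named final_cities.csv, final_districts.csv
# and final_states.csv. The README only says they match final_*.csv.
REPO_ROOT = Path(".")


def _read_final(kind, root=REPO_ROOT):
    """Read final_<kind>.csv from root into a pandas DataFrame."""
    return pd.read_csv(Path(root) / f"final_{kind}.csv")


def cities(root=REPO_ROOT):
    """Return the city table as a DataFrame."""
    return _read_final("cities", root)


def districts(root=REPO_ROOT):
    """Return the district table as a DataFrame."""
    return _read_final("districts", root)


def states(root=REPO_ROOT):
    """Return the state/UT table as a DataFrame."""
    return _read_final("states", root)
```

With the repository itself, the equivalent would presumably be to import IndianCities and call IndianCities.cities(); the returned DataFrame can then be inspected with the usual pandas tools such as head() or describe().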

Data Sources

  • The population data of the cities was retrieved from the Census of 2011 conducted by the Indian Government. The census website also has its own consolidated list of cities with a population of 1 lakh and above.

  • Latitude and Longitude of the cities were taken from https://latlong.net; for the cities missing from their database, information was filled in from Wikipedia.

  • Villageinfo was used to collect information on the area of cities and towns, both small and large. Areas for cities missing from it were again filled in from Wikipedia.

  • There are still cities for which this information has been hard to find. One could go through the state's or the district's website to see if it can be found there.
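The "primary source first, fallback second" pattern described above can be sketched as follows. This is an illustration only, not the code in scripts/; the function names and the dictionary layout are hypothetical, and the values are placeholders rather than real coordinates.

```python
def fill_from_fallback(primary, fallback):
    """Return a copy of primary where None values are replaced from fallback.

    Both arguments map a city name to a value, for example a (lat, lon) tuple.
    Cities missing from both sources keep the value None.
    """
    merged = {}
    for city, value in primary.items():
        merged[city] = value if value is not None else fallback.get(city)
    return merged


def still_missing(table):
    """List the cities that have no value after merging, sorted by name."""
    return sorted(city for city, value in table.items() if value is None)


# Placeholder data: "City B" is filled from the fallback, "City C" stays missing.
primary = {"City A": (10.0, 20.0), "City B": None, "City C": None}
fallback = {"City B": (11.0, 21.0)}
merged = fill_from_fallback(primary, fallback)
print(still_missing(merged))
```

The list returned by still_missing is the set of cities that would need a manual lookup, for instance on a state or district website.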
