dereksgithub / cities_study_QM_Term_project

Clustering analysis of US mid-sized cities

Health Outcomes and Places of Residence

This project aims to identify the relationships among selected health outcomes, risky behaviours, and preventive measures.

Further development of this project:

  1. This will continue as my side project from now on;
  2. The end goal is to build a public health predictor for cities, based on the collected open-source data, that can predict health outcomes of urban and rural populations and provide policy suggestions for interventions;
  3. An LLM may be integrated further in the future; for now the focus is on gathering causal relations and data;
  4. Actual engineering and deployment are not a focus for now;
  5. Use high-resolution remote sensing data to monitor air quality, urban green space, an outdoor activities index, outdoor exercise rates, etc., and feed the data from this pipeline back into the general forecasting model.

The author built regression models to study these relationships and clustering models to classify US counties.
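As a rough illustration of the two model families mentioned above, here is a minimal standard-library sketch. It is not the notebook code from this repo: the `records` values are made-up placeholders, the choice of variables (smoking rate and heart disease prevalence) is an assumption, and `fit_line` and `kmeans` are hypothetical helper names. In practice the features would usually be standardised first, and a library such as scikit-learn would likely be used instead.

```python
import random

# Made-up placeholder values: (smoking rate %, heart disease prevalence %).
records = [
    (12.0, 4.8), (13.5, 5.1), (14.0, 5.0), (15.0, 5.6),
    (20.0, 6.9), (21.5, 7.2), (22.0, 7.6), (23.0, 7.5),
]


def fit_line(points):
    """Ordinary least squares for y = intercept + slope * x."""
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    sxx = sum((x - mean_x) ** 2 for x, _ in points)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in points)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


def kmeans(points, k, iterations=20, seed=0):
    """Plain Lloyd's k-means on 2-D points; returns one label per point."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: nearest centroid by squared Euclidean distance.
        labels = [
            min(
                range(k),
                key=lambda j: (p[0] - centroids[j][0]) ** 2
                + (p[1] - centroids[j][1]) ** 2,
            )
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for j in range(k):
            members = [p for p, lab in zip(points, labels) if lab == j]
            if members:  # keep the old centroid if a cluster is empty
                centroids[j] = (
                    sum(p[0] for p in members) / len(members),
                    sum(p[1] for p in members) / len(members),
                )
    return labels


intercept, slope = fit_line(records)
labels = kmeans(records, k=2)
print(f"slope={slope:.3f}, intercept={intercept:.3f}")
print("cluster labels:", labels)
```

The regression gives a single coefficient relating one behaviour to one outcome, while the clustering groups places with similar profiles; the two answer different questions and are kept separate here for that reason.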

The original dataset is pulled from the Urban Institute's mid-sized cities data. While it offers plenty of interesting insights and sufficient features, it is not fine-grained enough for practising the complex models required by the assessment, so the author incorporated CDC PLACES data to enrich it.
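The enrichment step described above amounts to joining two tables on a shared place identifier. The sketch below shows the idea with the standard library only. The column names (`city_id`, `median_income`, `smoking_pct`, `chd_pct`), the inline CSV values, and the `left_join` helper are all assumptions made for illustration; the real files and their join key are not shown here.

```python
import csv
import io

# Toy stand-ins for the two sources; every value and column name is made up.
CITY_CSV = """city_id,city,median_income
A01,Alpha,52000
B02,Beta,47000
C03,Gamma,61000
"""

HEALTH_CSV = """city_id,smoking_pct,chd_pct
A01,17.5,6.1
B02,21.0,7.4
"""


def read_rows(text):
    """Parse CSV text into a list of dicts, one per row."""
    return list(csv.DictReader(io.StringIO(text)))


def left_join(left, right, key):
    """Attach matching right-hand columns to each left-hand row.

    Rows without a match are kept, with the right-hand columns set to None.
    """
    right_by_key = {row[key]: row for row in right}
    extra_cols = [c for c in (right[0].keys() if right else []) if c != key]
    joined = []
    for row in left:
        match = right_by_key.get(row[key])
        merged = dict(row)
        for col in extra_cols:
            merged[col] = match[col] if match else None
        joined.append(merged)
    return joined


cities = read_rows(CITY_CSV)
health = read_rows(HEALTH_CSV)
enriched = left_join(cities, health, key="city_id")

for row in enriched:
    print(row)
```

A left join keeps every city from the base dataset even when no health record matches, which makes the coverage gaps visible instead of silently dropping rows.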

The first phase studies the data solely with quantitative measures; to stay aligned with the course scope, no spatial analysis is done (only visualizations).

The second, post-module phase will introduce analysis with spatial factors in mind, e.g. performing spatial clustering and merging other datasets into the process (including demographic data and key industrial/retail data).

The current objective of this project is to identify more data sources to merge into the study and, eventually, to build a viable model of changes in the urban built environment or in population health conditions, answering questions such as: "How does building a new park for exercising and jogging, while promoting policies to reduce smoking and sugar consumption among the population, affect the prevalence of heart disease in the population?"

Future work:

Considering chronological data. Adding more spatial context. TBI: relating brain injury data to the scope. Searching for sugar consumption data and smoking ad spending data combined with geographical information.

This repo is developed for research and academic purposes only.

The code is mainly borrowed from and based on the CASA0007 Practicals.

For any copyright concerns, please contact me via GitHub.

Languages

Jupyter Notebook 100.0%, Python 0.0%