amf60 / Module_5_ABADSII_The_road_not_taken

Module 5. First, I critically review the Kaggle dataset proposed for this ABADSII module. Next, I identify a more suitable repository drawn from the same population pool and derive a new dataset covering the same time window as the Kaggle one. Finally, I benchmark the two datasets and suggest the latter as potentially the more appropriate for developing a predictive machine learning model to identify recently hospitalised COVID-19 patients at risk of deterioration requiring ICU admission.

ALURA_M5projeto-_The_road_not_taken

This is Aldir MEDEIROS FILHO's project deliverable for Module 5 of the Alura Bootcamp APPLIED Data Sciences (Winter 2021).

It consists of three Colab notebooks:

  • M5_Part_I_Kaggle_sl_k_dataset_AMF_cleaning.ipynb: my critical review of the dataset made available on Kaggle.

  • M5_Part_II_C19_FAPESP_DataSharing_BR_HSL_short_Object_names.ipynb: deriving a new dataset from a different repository to match the Kaggle one.

  • M5_Part_III_Benchmarking_Kaggle_vs_FAPESP_Data_Sharing_BR.ipynb: benchmarking the two datasets from different sources, with conclusions.
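
The derive-then-benchmark workflow described above could be sketched roughly as follows. This is only an illustration using the Python standard library, not the code in the notebooks: the helper names (`filter_time_window`, `missing_rate`, `benchmark`), the column names, the toy records, and the date window are all hypothetical assumptions made for the example.

```python
from datetime import date


def filter_time_window(rows, date_key, start, end):
    """Keep rows whose date falls inside [start, end], bounds included."""
    return [r for r in rows if start <= r[date_key] <= end]


def missing_rate(rows, column):
    """Fraction of rows where `column` is absent or None."""
    if not rows:
        return 0.0
    missing = sum(1 for r in rows if r.get(column) is None)
    return missing / len(rows)


def benchmark(rows_a, rows_b, columns):
    """Compare row counts and per-column missing rates of two datasets."""
    return {
        "n_rows": (len(rows_a), len(rows_b)),
        "missing": {
            c: (missing_rate(rows_a, c), missing_rate(rows_b, c))
            for c in columns
        },
    }


# Hypothetical toy records; they are NOT taken from either repository.
kaggle_like = [
    {"admitted": date(2020, 3, 1), "age": 61, "icu": 0},
    {"admitted": date(2020, 4, 2), "age": None, "icu": 1},
]
fapesp_like = [
    {"admitted": date(2020, 3, 5), "age": 54, "icu": 0},
    {"admitted": date(2020, 4, 9), "age": 70, "icu": 1},
    {"admitted": date(2021, 1, 9), "age": 33, "icu": 0},
]

# Assumed window, chosen only so the example has something to filter out.
start, end = date(2020, 2, 1), date(2020, 6, 30)

# Restrict the second dataset to the same window before comparing.
fapesp_window = filter_time_window(fapesp_like, "admitted", start, end)
report = benchmark(kaggle_like, fapesp_window, ["age", "icu"])
print(report)
```

Restricting the second dataset to the first one's time window before comparing keeps the two samples comparable; which quality measures actually matter (missing values, row counts, or others) is a choice the notebooks themselves document.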

Languages

Language:Jupyter Notebook 100.0%