soyhyoj / wids-datathon-2021

Women in Data Science (WiDS) Datathon 2021

Overview

The WiDS Datathon 2021 focuses on patient health, with an emphasis on the chronic condition of diabetes. Intensive Care Units (ICUs) often lack verified medical histories for incoming patients. A patient in distress or a patient who is brought in confused or unresponsive may not be able to provide information about chronic conditions such as heart disease, injuries, or diabetes. Medical records may take days to transfer, especially for a patient from another medical provider or system.

Objective

The WiDS Datathon 2021 will focus on models to determine whether a patient admitted to an ICU has been diagnosed with a particular type of diabetes, Diabetes Mellitus. Using data from the first 24 hours of intensive care, participants will explore labeled training data for model development.

Data description

  • TrainingWiDS2021.csv - the training data. You should see 130,157 encounters represented here. Please view the Data Dictionary file for more information about the columns.
  • UnlabeledWiDS2021.csv - the unlabeled data (data without diabetes_mellitus provided). You are being asked to predict the diabetes_mellitus variable for these encounters.
  • SampleSubmissionWiDS2021.csv - a sample submission file in the correct format.
  • SolutionTemplateWiDS2021.csv - a list of all the rows (and encounters) that should be in your submissions.
  • DataDictionaryWiDS2021.csv - supplemental information about the data.
  • Files can be downloaded from the Kaggle website.
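
Since SolutionTemplateWiDS2021.csv lists every row a submission should contain, a submission can be checked against it before uploading. The sketch below is one possible way to do that with the standard library only. The `encounter_id` column name, the helper names `read_ids` and `check_submission`, and the assumption that predictions are probabilities in [0, 1] are illustrative and not taken from the competition files; adjust them to match the Data Dictionary.

```python
import csv
import io


def read_ids(csv_text, id_column="encounter_id"):
    """Return the IDs found in one column of CSV text (column name is an assumption)."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return [row[id_column] for row in reader]


def check_submission(template_text, submission_text,
                     id_column="encounter_id",
                     target_column="diabetes_mellitus"):
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    expected = read_ids(template_text, id_column)
    rows = list(csv.DictReader(io.StringIO(submission_text)))
    got = [row[id_column] for row in rows]

    # Every encounter in the template must appear exactly once.
    missing = set(expected) - set(got)
    extra = set(got) - set(expected)
    if missing:
        problems.append(f"missing ids: {sorted(missing)}")
    if extra:
        problems.append(f"unexpected ids: {sorted(extra)}")
    if len(got) != len(set(got)):
        problems.append("duplicate ids")

    # Assumption: each prediction is a probability between 0 and 1.
    for row in rows:
        try:
            p = float(row[target_column])
        except (KeyError, TypeError, ValueError):
            problems.append(f"bad prediction for id {row[id_column]}")
            continue
        if not 0.0 <= p <= 1.0:
            problems.append(f"prediction out of [0, 1] for id {row[id_column]}")
    return problems


# Tiny in-memory stand-ins for the real CSV files.
template = "encounter_id\n1\n2\n3\n"
good = "encounter_id,diabetes_mellitus\n1,0.1\n2,0.8\n3,0.5\n"
bad = "encounter_id,diabetes_mellitus\n1,0.1\n2,1.7\n"

print(check_submission(template, good))
print(check_submission(template, bad))
```

With the real files, the same function can be called on the text of each file, for example `check_submission(open("SolutionTemplateWiDS2021.csv").read(), open("my_submission.csv").read())`, where `my_submission.csv` is a hypothetical file name.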

Contents of this notebook

  • EDA + Binary classification (model used: LightGBM)
  • Some of the feature engineering was inspired by other participants' notebooks published here.

Final leaderboard score: 0.86744 (top 22%)

Languages

Jupyter Notebook 100.0%