The project, in collaboration with UCSD Health and Samsung, aimed to realize proactive, continuous, and personalized virtual healthcare using health and fitness data collected from the Samsung Galaxy Watch. We proposed machine-learning-based healthcare analytics and worked to maximize the automated, continuous collection of user data such as calories, exercise, sleep, heart rate, step count, and contextual data.
Photoplethysmogram (PPG) signals collected by the Galaxy Watch were also used to estimate continuous blood pressure. Using the proposed feature engineering and selection techniques, we worked to address the limited, noisy, unaligned, and irregularly sampled data collected from various sources.
For each chronic condition addressed, such as blood pressure and diabetes, we worked to develop personalized prediction and feature importance models based on the collected data and the proposed ML techniques. The models are intended not only to give each user accurate, tailored predictions of chronic condition indicators based on past health behaviors and context, but also to report, for reference, the features that most strongly influence blood pressure trends.
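As a rough illustration of what a feature importance model can look like, the sketch below fits a random forest with scikit-learn and ranks features by impurity-based importance. The project description does not say which model was used, so the random forest is an assumption made for this example. The feature names (`sleep_hours`, `step_count`, `resting_hr`) are hypothetical and the data is synthetic, generated only so the example runs.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Hypothetical feature names; not taken from the project's data schema.
feature_names = ["sleep_hours", "step_count", "resting_hr"]

# Synthetic, standardized features for illustration only.
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))

# Synthetic systolic pressure, constructed so resting_hr dominates.
y = 120 + 1.0 * X[:, 0] + 8.0 * X[:, 2] + rng.normal(scale=0.5, size=200)

# Assumed model choice: a random forest regressor.
model = RandomForestRegressor(n_estimators=100, random_state=0)
model.fit(X, y)

# Rank features from most to least important.
ranking = sorted(
    zip(feature_names, model.feature_importances_),
    key=lambda pair: pair[1],
    reverse=True,
)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

For a personalized model, the same fit would be repeated on each user's own records, so the ranking can differ from one user to the next.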
1. numpy
2. pandas
3. matplotlib
4. requests
5. json
6. authlib
7. oauthlib
8. base64
9. csv
10. seaborn
11. sklearn
12. pickle
DataRequestAndParsing: for requesting and parsing data
1. read_BP.ipynb: request and parse blood pressure data from Omron
2. read_samsung_data.py: request and parse health and fitness data from Samsung
3. new_data_merge.py: combine health and fitness features into one data frame

DataUpdate: for regular data maintenance
1. refresh_omron_token.py: refresh tokens for accessing blood pressure data from Omron
2. refresh_samsung_token.py: refresh tokens for accessing health and fitness data collected by Samsung Galaxy Watch
3. update_info.py: create a CSV file with updated health and fitness data

DataVisualization: for data visualization and exploration of correlations
1. visualize_merged_df.py: visualize different features and their underlying relationships
2. visualize_merged_df.ipynb: run the script with a user index specified
3. Results: folder of data visualization results

ModelTraining: for training models to fit blood pressure, health, and fitness data
1. aggregate_24h.py: aggregate data from the past 24 hours based on the timestamp of each blood pressure measurement
2. build_models.py: extract additional features, convert time formats, interpolate data, and prepare the training model
3. slp_duration.py: process sleep data and compute daily sleep duration
4. slp_processing.py: preprocess sleep data
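The 24-hour aggregation described for aggregate_24h.py can be sketched as follows. This is a minimal sketch, not the script itself: the function name `aggregate_past_24h`, the column names (`timestamp`, `systolic`, `heart_rate`), and the choice of summary statistics are all assumptions made for illustration. For each blood pressure reading, it summarizes the heart rate samples that fall in the preceding 24 hours.

```python
import pandas as pd


def aggregate_past_24h(bp: pd.DataFrame, hr: pd.DataFrame) -> pd.DataFrame:
    """Attach 24-hour heart rate summaries to each blood pressure reading.

    Column names are hypothetical: both frames need a datetime
    `timestamp` column, and `hr` needs a numeric `heart_rate` column.
    """
    rows = []
    for ts in bp["timestamp"]:
        # Window (ts - 24h, ts]: samples up to and including the reading time.
        in_window = (hr["timestamp"] > ts - pd.Timedelta(hours=24)) & (
            hr["timestamp"] <= ts
        )
        window = hr.loc[in_window, "heart_rate"]
        rows.append(
            {
                "hr_mean_24h": window.mean(),
                "hr_max_24h": window.max(),
                "hr_count_24h": len(window),
            }
        )
    features = pd.DataFrame(rows, index=bp.index)
    return pd.concat([bp, features], axis=1)


# Toy records for illustration only.
bp = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(["2021-03-02 08:00", "2021-03-03 08:00"]),
        "systolic": [128, 121],
    }
)
hr = pd.DataFrame(
    {
        "timestamp": pd.to_datetime(
            [
                "2021-03-01 09:00",
                "2021-03-01 21:00",
                "2021-03-02 07:00",
                "2021-03-02 20:00",
            ]
        ),
        "heart_rate": [70, 80, 60, 90],
    }
)
merged = aggregate_past_24h(bp, hr)
print(merged)
```

Anchoring each window on the blood pressure timestamp means every training row only sees data recorded before the measurement it is paired with.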
- Omron Blood Pressure Data
The Omron Connect mobile application transfers data from the BP monitors to the Omron Cloud Service. We can either download the data from the Omron Cloud Service to our server or request it directly from Omron Connect using OAuth authentication.
- Samsung Health Data
The Samsung Health mobile application is connected to the Samsung Cloud Service, from which we can download all related data.
The data collection procedure is summarized in the following chart.
Some examples are shown below.
- Heart Rate Daily and Weekly Pattern & Heart Rate Stacked Plot vs Blood Pressure
- Daily Sleep Stage & Daily Sleep Duration vs Blood Pressure
- Daily Step Count & Daily Step Count vs Blood Pressure
- Exercise Event Visualization
- Daily Proportion of Sedentary Time vs Blood Pressure