oliviermirat / MyAIGuide

Creating AI-based health coaches through crowdsourced health research

Home Page: https://myaiguide.org

Normalization of same variable recorded from several different sources/trackers

oliviermirat opened this issue

Please look over this script, as well as the scripts it uses: https://github.com/oliviermirat/MyAIGuide/blob/master/scripts/4_plotDifferentSteps.py
I wrote those scripts a long time ago without thoroughly testing them, so they may contain errors. Both the method and the code could probably be improved as well.
Please report and fix any errors you find. Also, feel free to propose, and if necessary implement, a better way of normalizing the same variable across different data sources. We will most likely use this script for at least Participants 1 and 2, for number of steps, elevation gain (denivelation), and maybe more.

Hi @oliviermirat, the script reads the file "../data/preprocessed/preprocessedDataParticipant1.txt", which is not available in the repo. Is this file produced by preprocessing the data with our functions?

Hi @evadatinez: sorry, I forgot to mention it, but you first need to run the script 1_createDataFrame.py to generate this preprocessed data.

Perfect, thanks!

I have a question regarding rows 24-26:

# Removing "steps" caused by scooter riding
data["steps"] = data["steps"] - 37 * data["scooterRiding"]
data.loc[data["steps"] < 0, "steps"] = 0

How did you come up with the factor 37?

As you might have guessed, riding a scooter recorded false-positive steps on my Fitbit, so I started estimating the number of minutes I spent riding a scooter each day.
Then, on a few occasions, I counted the steps falsely recorded over a known amount of riding time, and from that I estimated the factor of 37 false steps per minute of riding. This is only an approximation.
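
The estimation described here can be sketched in a few lines of plain Python. This is only an illustration under stated assumptions, not the code from the repository: the calibration rides and daily records are made-up numbers, the helper names `estimate_false_steps_per_minute` and `remove_scooter_steps` are hypothetical, and scooter riding is assumed to be recorded in minutes per day.

```python
def estimate_false_steps_per_minute(calibration_rides):
    """Pool all calibration rides: total false steps / total minutes ridden."""
    total_minutes = sum(minutes for minutes, _ in calibration_rides)
    total_false_steps = sum(steps for _, steps in calibration_rides)
    if total_minutes <= 0:
        raise ValueError("need a positive total riding time")
    return total_false_steps / total_minutes


def remove_scooter_steps(steps, scooter_minutes, factor):
    """Subtract the estimated false steps for one day, never going below zero."""
    return max(0.0, steps - factor * scooter_minutes)


# Made-up calibration rides: (minutes ridden, steps falsely recorded)
calibration_rides = [(10, 350), (20, 760), (30, 1110)]
factor = estimate_false_steps_per_minute(calibration_rides)

# Made-up daily records: (recorded steps, minutes of scooter riding)
days = [(5000, 0), (300, 15), (8000, 20)]
corrected = [remove_scooter_steps(s, m, factor) for s, m in days]
```

Pooling the totals weights longer rides more heavily than averaging the per-ride rates would. Either choice is an approximation, and with only a few calibration rides the resulting factor should be treated as a rough estimate.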