Normalization of same variable recorded from several different sources/trackers

Question

oliviermirat opened this issue 4 years ago

Please review this script: https://github.com/oliviermirat/MyAIGuide/blob/master/scripts/4_plotDifferentSteps.py as well as the scripts it depends on.
I wrote these scripts a long time ago without testing them thoroughly, so they may contain errors. Both the method and the code could probably be improved.
Please report and fix any errors you find. Also, feel free to propose and, if necessary, implement a better way of normalizing this data across different sources. We will most likely use this script for at least Participants 1 and 2, for number of steps, denivelation, and maybe more.

Eva Martínez · Answer 1 · Sun Jun 28 2020 23:06:21 GMT+0800 (China Standard Time)

Hi @oliviermirat, the script reads "../data/preprocessed/preprocessedDataParticipant1.txt", which is not available in the repo. Is this the data preprocessed with our functions?

Olivier Mirat · Answer 2 · Sun Jun 28 2020 23:12:38 GMT+0800 (China Standard Time)

Hi @evadatinez: sorry, I forgot to mention it, but you first need to run the script 1_createDataFrame.py to generate this preprocessed data.

Eva Martínez · Answer 3 · Sun Jun 28 2020 23:13:27 GMT+0800 (China Standard Time)

Perfect, thanks!

Eva Martínez · Answer 4 · Mon Jun 29 2020 03:36:47 GMT+0800 (China Standard Time)

I have a question regarding lines 24-26:

# Removing "steps" caused by scooter riding    
data["steps"] = data["steps"] - 37 * data["scooterRiding"]    
data["steps"][data["steps"] < 0] = 0

How did you come up with the factor 37?

Olivier Mirat · Answer 5 · Mon Jun 29 2020 04:33:06 GMT+0800 (China Standard Time)

As you might have guessed, riding a scooter caused my Fitbit to record false-positive steps, so I started estimating the number of minutes I spent riding a scooter each day.
Then, on a few occasions, I kept track of the number of falsely recorded steps over a given amount of time, and that is how I estimated the factor of 37. It is only an approximation, though.
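
For readers following along, here is a minimal sketch of the correction and of one way such a factor could be estimated. It assumes that "scooterRiding" holds minutes ridden per day, so that 37 is a steps-per-minute rate. The helper names `remove_scooter_steps` and `estimate_steps_per_minute` are hypothetical, and the sample numbers are made up for illustration, not taken from the participant's data. It also uses `clip(lower=0)` in place of the chained assignment quoted in the question, since chained indexing can trigger pandas' SettingWithCopyWarning.

```python
import pandas as pd


def remove_scooter_steps(data: pd.DataFrame, steps_per_minute: float = 37.0) -> pd.DataFrame:
    """Subtract falsely recorded steps and floor the result at zero.

    Assumes 'scooterRiding' is the number of minutes ridden that day.
    Returns a corrected copy and leaves the input frame untouched.
    """
    corrected = data.copy()
    corrected["steps"] = (
        corrected["steps"] - steps_per_minute * corrected["scooterRiding"]
    ).clip(lower=0)
    return corrected


def estimate_steps_per_minute(false_steps, minutes):
    """Pooled rate: total falsely recorded steps divided by total minutes ridden."""
    return sum(false_steps) / sum(minutes)


# Made-up example values, for illustration only.
example = pd.DataFrame({"steps": [5000, 300, 8000], "scooterRiding": [10, 20, 0]})
result = remove_scooter_steps(example)
```

Pooling all calibration rides into a single rate is only one possible choice. A per-ride average or a median would weight short and long rides differently.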