yscacaca / DeepSense

Deepsense: a unified deep learning framework for time-series mobile sensing data processing.


input data

ceciljc opened this issue · comments

Hi, I want to learn about DeepSense using my own gait data collection, but for gait recognition rather than activity recognition. The data were recorded at 50 Hz in CSV files from 150 participants; the format is a timestamp followed by 3-axis accelerometer and 3-axis gyroscope readings. How should I preprocess the data so that it matches your eval or train dataset?

Because my sampling rate is fixed at 50 samples per second, should I still perform the time interpolation?

I would really appreciate some explanation of your dataset. Thank you.

Hi, you can refer to the pre-processing code.

Hi,

After trying to interpret the code in sep_HHAR_data.py (and reading the previous comment), I understand that the 120 values in the training data come from the FFT of the interpolated signal, but I don't understand the eval part. In the code you select user 'a' for one-user-out and can generate 200 eval files; can you explain this part?

Also, if my sampling rate is already fixed regardless of which device I use to collect the data, should I still perform interpolation in the preprocessing?

I really hope you will reply to this question; please just ignore the previous one. Thank you.

Hi,

The data were measured with different people holding different devices. In the "one_user_out" mode, the code simply picks the data from one particular user as the eval data and uses the rest as the training data.
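
For illustration, here is a minimal sketch of such a leave-one-user-out split (the file-naming pattern and variable names are assumptions for the example, not the repository's exact code):

```python
import glob

# Hypothetical file layout: HHAR-style files named like "a-gear_1-bike.csv",
# where the leading letter is the user ID. Hold out one user for eval.
held_out_user = "a"

eval_files, train_files = [], []
for path in glob.glob("*.csv"):
    user = path.split("-")[0]
    if user == held_out_user:
        eval_files.append(path)   # held-out user -> eval set
    else:
        train_files.append(path)  # all other users -> training set
```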

For your own data, I think interpolation is still needed. Even if you fix your sampling rate, the timestamp attached to each measurement may not follow the exact sampling rate (due to background workload on the device). Interpolation helps produce "uniformly sampled" data for the FFT.
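
As a rough illustration of that idea (a minimal sketch, not the actual sep_HHAR_data.py pipeline; the window length, jitter magnitude, and feature layout are assumptions):

```python
import numpy as np

def uniform_fft(timestamps, values, n_points=10):
    """Resample one sensor axis onto a uniform time grid, then FFT it.

    timestamps: measurement times in seconds (possibly jittered)
    values:     sensor readings at those times
    n_points:   number of uniformly spaced samples to produce
    """
    grid = np.linspace(timestamps[0], timestamps[-1], n_points)
    uniform = np.interp(grid, timestamps, values)   # linear interpolation
    spectrum = np.fft.fft(uniform)
    # One common feature layout: stack real and imaginary parts.
    return np.concatenate([spectrum.real, spectrum.imag])

# Example: a nominal 50 Hz window whose timestamps are slightly jittered.
t = np.arange(10) / 50.0 + np.random.uniform(-0.002, 0.002, 10)
t.sort()                                   # np.interp needs increasing times
x = np.sin(2 * np.pi * 5 * t)              # fake accelerometer axis
features = uniform_fft(t, x)               # 20 values (10 real + 10 imag)
```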

So, from the selected user 'a' it will generate 200 eval files; are the null files processed and included too? What exactly happens in this code:

```python
# Pad every training sample to length `wide` and build a mask that marks
# real measurements (1.0) versus padding (0.0).
for idx in range(len(X)):
    curLen = len(X[idx])
    maskX.append([[1.0]] * curLen)
    for addIdx in range(wide - curLen):
        X[idx].append([paddingVal] * inputFeature)
        maskX[idx].append([0.0])

# The same padding and masking, applied to the eval samples.
for idx in range(len(evalX)):
    curLen = len(evalX[idx])
    evalMaskX.append([[1.0]] * curLen)
    for addIdx in range(wide - curLen):
        evalX[idx].append([paddingVal] * inputFeature)
        evalMaskX[idx].append([0.0])
```

Input data need to have the same shape. Some data samples are short, so we pad them with zeros at the end, and the mask marks these padded positions during training.
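
For instance, running the loop above on a toy sample (hypothetical values, with wide = 4, inputFeature = 1, paddingVal = 0.0) gives:

```python
X = [[[0.3], [0.7]]]   # one sample with 2 time steps, 1 feature each
maskX = []
wide, inputFeature, paddingVal = 4, 1, 0.0

for idx in range(len(X)):
    curLen = len(X[idx])
    maskX.append([[1.0]] * curLen)
    for addIdx in range(wide - curLen):
        X[idx].append([paddingVal] * inputFeature)
        maskX[idx].append([0.0])

print(X)      # [[[0.3], [0.7], [0.0], [0.0]]]
print(maskX)  # [[[1.0], [1.0], [0.0], [0.0]]]
```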

Thank you for replying. One more thing about the data preprocessing:

From user 'a' you have 28 files, e.g. a-gear_1-bike, a-gear_1-null, and so on. I still can't understand how those 28 files can generate 200 eval files; what is the idea behind this preprocessing step? Also, do you include the null files, for example a-gear_1-null, a-gear_2-null, etc., in the eval set too?

I haven't touched the DeepSense framework yet because I want to understand how you preprocess the data first. I am currently studying CNNs on Coursera, and I hope you won't mind if I have questions about the DeepSense framework later on.

I have a similar dataset collected at 100 Hz. Can you tell me how you preprocess your data?

Hi,
Sorry to disturb you. I recently downloaded the Heterogeneity Activity Recognition Data Set from the UCI Repository (link: https://archive.ics.uci.edu/ml/datasets/Heterogeneity+Activity+Recognition). While analysing the data, a few things confused me:

1. For the phone dataset, why is there accelerometer data for the Samsung Galaxy S+ (Samsungold) in Phones_accelerometer.csv, but no gyroscope data for it in Phones_gyroscope.csv?
2. Why is there such a large gap between the timestamp values in Phones_accelerometer.csv and Phones_gyroscope.csv (the timestamps in Phones_gyroscope.csv are hundreds of thousands larger)?
3. Why are the timestamps of nexus4-1 for user a not continuous (e.g., they suddenly jump by tens of thousands of seconds around row 24675 of Phones_gyroscope.csv)?

Looking forward to your reply!

Best wishes!