xyanchen / WiFi-CSI-Sensing-Benchmark


Question about dataset UT-HAR

infinite0522 opened this issue · comments

commented

As far as I know, the CSI samples in the UT-HAR dataset span 2000 timestamps each and have a size of (3, 30, 2000), and the total number of samples is 557. How do you process the CSI samples so that each has a size of (3, 30, 250) and the total number is about 5000?

commented

Also, as mentioned in your paper, "Following existing works [53], the data is segmented using a sliding window, inevitably causing many repeated data among samples", but I cannot find the process of segmenting data with a sliding window either in the paper or in the code of [53]. Could you please explain this process in detail?

The UT-HAR dataset uses a sliding-window method with a window size of 1000 and a step size of 200 to generate data. Then they downsample the (1000, 90) data by a factor of 2 in the time domain to get a size of (500, 90).
The code for this process can be found at this link: https://github.com/ermongroup/Wifi_Activity_Recognition
After that, we apply average pooling over every two adjacent timestamps and reshape the 90 channels into 3 × 30 to get the size of (3, 30, 250).
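For reference, here is a minimal NumPy sketch of the pipeline described above. It is a reconstruction, not the authors' actual preprocessing script, and the 3 × 30 reshape order (antennas × subcarriers) is an assumption about the channel layout:

```python
import numpy as np

def segment_csi(stream, window=1000, step=200):
    """Yield overlapping windows of shape (window, 90) from a (T, 90) CSI stream."""
    for start in range(0, stream.shape[0] - window + 1, step):
        yield stream[start:start + window]

def to_sample(win):
    """Turn one (1000, 90) window into a (3, 30, 250) sample."""
    down = win[::2]                                 # downsample by 2 in time -> (500, 90)
    pooled = down.reshape(250, 2, 90).mean(axis=1)  # average-pool adjacent pairs -> (250, 90)
    return pooled.T.reshape(3, 30, 250)             # assumed layout: 3 antennas x 30 subcarriers

# Example on synthetic data: a 2000-timestamp recording yields 6 overlapping windows.
stream = np.random.randn(2000, 90)
samples = np.stack([to_sample(w) for w in segment_csi(stream)])
print(samples.shape)  # (6, 3, 30, 250)
```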

In the raw UT-HAR data there are only 557 samples. In order to generate more data, they used the sliding-window method to segment the original samples. However, this process causes repeated data among samples, because the step size (200) is smaller than the window size (1000). Thus, we recommend you use our NTU-Fi dataset, which has no overlapping data among samples.
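To make the overlap concrete (simple arithmetic, not code from the benchmark):

```python
# With window = 1000 and step = 200, two consecutive windows share
# 1000 - 200 = 800 timestamps, i.e. 80% of each window is repeated.
window, step = 1000, 200
overlap = window - step
print(f"consecutive windows share {overlap} timestamps ({overlap / window:.0%})")
```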