Navidfoumani / ConvTran

This is a PyTorch implementation of ConvTran

Problems in the Ford StayAlert dataset documentation and code

linjianfeng opened this issue

I downloaded the Ford StayAlert challenge data following https://github.com/Navidfoumani/ConvTran/blob/main/Dataset/Segmentation/Segmentation.Txt. The test CSV file looks like this:
[image: screenshot of the test CSV file]

There are some problems related to this dataset:

  1. The label column 'IsAlert' is filled with '?', so we cannot test with it because the true labels are missing.
  2. The function load_ford_data in data_loader.py fails because it tries to access the nonexistent columns 'series' and 'label' (should these be 'TrialID' and 'IsAlert'?).
  3. The following code in Dataset/load_segment_data.py tries to rearrange the data matrix from (sample, window_len, channel) to (sample, channel, window_len) with NumPy's reshape method. I think this is wrong, because reshape only re-segments the items in their existing order; it does not transpose the matrix. As a result, the vectors in Data['train_data'] are misaligned.

Data['train_data'] = X_train.reshape(X_train.shape[0], X_train.shape[2], X_train.shape[1])
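A small NumPy check illustrates the difference between the two operations. The array below is made-up toy data (not the Ford dataset), chosen so that the two channels are easy to tell apart; it is only a sketch of the suspected problem, not the repository's code.

```python
import numpy as np

# Toy data: 2 samples, window_len = 3, 2 channels.
# Channel 0 holds small values, channel 1 holds the same values plus 100.
X_train = np.array([
    [[0, 100], [1, 101], [2, 102]],
    [[3, 103], [4, 104], [5, 105]],
])  # shape (sample, window_len, channel) = (2, 3, 2)

# reshape keeps the flat memory order and only re-cuts it into new rows,
# so each "channel" row ends up mixing values from both channels.
reshaped = X_train.reshape(X_train.shape[0], X_train.shape[2], X_train.shape[1])

# transpose swaps the last two axes, so each row is one channel over time.
transposed = X_train.transpose(0, 2, 1)

print(reshaped[0])    # rows: [0, 100, 1] and [101, 2, 102]
print(transposed[0])  # rows: [0, 1, 2] and [100, 101, 102]
```

Both results have shape (2, 2, 3), which is why the reshape version runs without any error even though its rows no longer correspond to single channels.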

So do you have another version of the Ford dataset? And if the algorithm scored well on the misaligned data, might it perform even better with the corrected code?

I apologize for the problems you encountered with the Ford StayAlert dataset documentation and code, and I appreciate you bringing them to our attention. I have updated the code; please re-download it (or replace the existing util.py file in your project with the updated version).

Ford dataset:
Access the dataset from the following Kaggle competition link: https://www.kaggle.com/competitions/stayalert/data.
Download the "stayalert.zip", which contains the following files:
Solution.csv
fordTrain.csv
fordTest.csv

Labeling the test data:
The file fordTest.csv does not have labels. To assign labels to the test data, follow these steps:
Open the Solution.csv file.
Copy the contents of the prediction columns.
Paste the copied prediction values into the "IsAlert" column of the fordTest.csv file.
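
If you would rather not copy and paste by hand, the labeling steps could be scripted with pandas along these lines. This is only a sketch: the helper label_test_frame is hypothetical, the column name "Prediction" is a guess at the Solution.csv header, and it assumes both files list the same rows in the same order.

```python
import pandas as pd

def label_test_frame(test_df, solution_df, prediction_col="Prediction"):
    """Return a copy of test_df with 'IsAlert' filled from solution_df.

    Assumes both frames list the same rows in the same order, and that
    prediction_col (a guessed name) is the label column in Solution.csv.
    """
    if len(test_df) != len(solution_df):
        raise ValueError("fordTest.csv and Solution.csv differ in row count")
    labeled = test_df.copy()
    # .to_numpy() assigns by position, ignoring the index, like a manual paste.
    labeled["IsAlert"] = solution_df[prediction_col].to_numpy()
    return labeled

# Tiny in-memory stand-ins for the real files.
test_df = pd.DataFrame(
    {"TrialID": [0, 0, 1], "ObsNum": [0, 1, 0], "IsAlert": ["?", "?", "?"]}
)
solution_df = pd.DataFrame({"Prediction": [1, 0, 1]})

labeled = label_test_frame(test_df, solution_df)
print(labeled["IsAlert"].tolist())  # [1, 0, 1]
```

With the real files, the two frames would come from pd.read_csv("fordTest.csv") and pd.read_csv("Solution.csv"), and the result could be written back with labeled.to_csv(..., index=False).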

Renaming and copying files:
Rename the fordTrain.csv and fordTest.csv files to FordChallenge_Train.csv and FordChallenge_Test.csv, respectively.
Copy the FordChallenge_Train.csv and FordChallenge_Test.csv files to the following directory: Datasets/Segmentation/FordChallenge.

Column renaming:
Open the FordChallenge_Train.csv and FordChallenge_Test.csv files.
Rename the following columns:
"TrialID" to "series"
"obsNum" to "timestamp"
"IsAlert" to "label"
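
The column renaming can also be done in pandas instead of a spreadsheet editor. The sketch below is one possible way, not part of the repository: the helper rename_ford_columns is hypothetical, and it matches the header names case-insensitively because the exact capitalisation in the CSV files has not been verified here.

```python
import pandas as pd

# Mapping taken from the steps above, keyed by lower-cased source name.
RENAMES = {"trialid": "series", "obsnum": "timestamp", "isalert": "label"}

def rename_ford_columns(df):
    """Return a copy of df with the three Ford columns renamed."""
    mapping = {
        col: RENAMES[col.lower()] for col in df.columns if col.lower() in RENAMES
    }
    # Fail loudly if any expected column was not found in the header.
    missing = set(RENAMES.values()) - set(mapping.values())
    if missing:
        raise KeyError(f"no source column found for: {sorted(missing)}")
    return df.rename(columns=mapping)

# Made-up one-row frame standing in for FordChallenge_Train.csv.
df = pd.DataFrame({"TrialID": [0], "ObsNum": [0], "IsAlert": [1], "feature_1": [34.7]})
renamed = rename_ford_columns(df)
print(list(renamed.columns))  # ['series', 'timestamp', 'label', 'feature_1']
```

Raising an error when a column is missing is a deliberate choice here, since DataFrame.rename by default silently ignores keys that do not match any column.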

Finally, copy FordChallenge_TEST.csv and FordChallenge_Train.csv to Datasets/Segmentation/FordChallenge.