There are some problems in Ford StayAlert dataset related doc and code

Question

linjianfeng opened this issue a year ago

I downloaded Ford StayAlert challenge data according to https://github.com/Navidfoumani/ConvTran/blob/main/Dataset/Segmentation/Segmentation.Txt. The test csv file looks like

There are some problems related to this dataset:

The label column, 'IsAlert', is filled with '?', so we cannot evaluate on the test set because the true labels are missing.
The function load_ford_data in data_loader.py fails because it tries to access the nonexistent columns 'series' and 'label' (should these be 'TrialID' and 'IsAlert'?).
The following code in Dataset/load_segment_data.py tries to rearrange the data matrix from (sample, window_len, channel) to (sample, channel, window_len) with NumPy's reshape method. I think this is wrong: reshape only regroups the same flat sequence of items under a new shape, it does not transpose the axes. As a result, the vectors in Data['train_data'] are misaligned.

Data['train_data'] = X_train.reshape(X_train.shape[0], X_train.shape[2], X_train.shape[1])
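To illustrate the difference, here is a minimal sketch on a toy array. The shapes and values are made up for the example and are not taken from the Ford data; it only compares NumPy's reshape with transpose for the (sample, window_len, channel) to (sample, channel, window_len) rearrangement.

```python
import numpy as np

# Toy batch (illustrative only): 2 samples, window_len = 3, 2 channels.
X = np.arange(12).reshape(2, 3, 2)

# What the quoted line does: same flat order of items, new shape.
reshaped = X.reshape(X.shape[0], X.shape[2], X.shape[1])

# What an axis swap does: channel axis moved in front of the window axis.
transposed = X.transpose(0, 2, 1)  # same result as np.swapaxes(X, 1, 2)

# Channel 0 of sample 0 is the series X[0, :, 0].
print(X[0, :, 0])        # [0 2 4]
print(transposed[0, 0])  # [0 2 4]  (channel 0 preserved)
print(reshaped[0, 0])    # [0 1 2]  (values from both channels mixed)
```

If a true axis swap is what the loader intends, X_train.transpose(0, 2, 1) would be the equivalent call with the same output shape.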

So, do you have another version of the Ford dataset? And if the algorithm scored well on the misaligned data, might it perform even better with the corrected code?

Navid Mohammadi Foumani · Answer 1 · Wed Jul 05 2023 12:13:03 GMT+0800 (China Standard Time)

I apologize for the problems you encountered with the Ford StayAlert dataset documentation and code, and I appreciate you bringing this issue to our attention. I have made the necessary updates to the code. Please re-download it, or replace the existing util.py file in your project with the updated version.

Ford dataset:
Access the dataset from the following Kaggle competition link: https://www.kaggle.com/competitions/stayalert/data.
Download the "stayalert.zip", which contains the following files:
Solution.csv
fordTrain.csv
fordTest.csv

Labeling the test data:
The file fordTest.csv does not have labels. To assign labels to the test data, follow these steps:
Open the Solution.csv file.
Copy the contents of the prediction columns.
Paste the copied prediction values into the "IsAlert" column of the fordTest.csv file.

Renaming and copying files:
Rename the fordTrain.csv and fordTest.csv files to FordChallenge_Train.csv and FordChallenge_Test.csv, respectively.
Copy the FordChallenge_Train.csv and FordChallenge_Test.csv files to the following directory: Datasets/Segmentation/FordChallenge.

Column renaming:
Open the FordChallenge_Train.csv and FordChallenge_Test.csv files.
Rename the following columns:
"TrialID" to "series"
"obsNum" to "timestamp"
"IsAlert" to "label"

Finally, make sure both FordChallenge_Train.csv and FordChallenge_Test.csv end up in Datasets/Segmentation/FordChallenge.
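For anyone who prefers to script the manual steps above, here is a rough pandas sketch. It is not part of the repository: the function name prepare_ford_files is hypothetical, the name of the prediction column in Solution.csv ("Prediction") is an assumption you should check against the file, and the sketch assumes Solution.csv lists its rows in the same order as fordTest.csv, just as the copy-paste step does. Column names are matched case-insensitively in case the capitalization in the CSV headers differs from what is written above.

```python
from pathlib import Path

import pandas as pd

# Target column names from the renaming step above (keys lower-cased so the
# match does not depend on the exact capitalization in the CSV header).
RENAME = {"trialid": "series", "obsnum": "timestamp", "isalert": "label"}


def prepare_ford_files(src_dir, dst_dir, prediction_col="Prediction"):
    """Label the test set, rename columns, and write the renamed CSV files.

    Hypothetical helper; prediction_col is an assumed column name.
    """
    src, dst = Path(src_dir), Path(dst_dir)
    dst.mkdir(parents=True, exist_ok=True)

    train = pd.read_csv(src / "fordTrain.csv")
    test = pd.read_csv(src / "fordTest.csv")
    solution = pd.read_csv(src / "Solution.csv")

    # Positional copy, mirroring the manual copy-paste step. This assumes
    # both files list their rows in the same order.
    if len(solution) != len(test):
        raise ValueError("Solution.csv and fordTest.csv have different row counts")
    test["IsAlert"] = solution[prediction_col].to_numpy()

    outputs = (
        (train, "FordChallenge_Train.csv"),
        (test, "FordChallenge_Test.csv"),
    )
    for frame, name in outputs:
        renamed = frame.rename(columns=lambda c: RENAME.get(c.lower(), c))
        renamed.to_csv(dst / name, index=False)
```

A call such as prepare_ford_files("stayalert", "Datasets/Segmentation/FordChallenge") would then write both files to the directory named above, where "stayalert" stands for wherever you extracted the zip.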