hfawaz / dl-4-tsc

Deep Learning for Time Series Classification

data format of UCR2018

guoxishu opened this issue · comments

In utils.py, the data is read with "pd.read_csv(..._TRAIN.tsv)", but the official website now only provides data in .ts format. With .ts data, "y_train = df_train.values[:,0]" obviously fails. Could you add a comment about the expected data shape? I'm quite confused.

I am not quite sure which format is now available. I will get back to you once I re-check the UCR archive.

Hi Sir, I have the same question here. Do you have any updates about the data format? Thanks!

More specifically, these are the errors I see:

python3 main.py TSC Coffee fcn _itr_8
Method: TSC Coffee fcn _itr_8
Traceback (most recent call last):
  File "main.py", line 150, in <module>
    datasets_dict = read_dataset(root_dir, archive_name, dataset_name)
  File "/Users/taosun/Documents/GitHub/dl-4-tsc/utils/utils.py", line 105, in read_dataset
    x_train, y_train = readucr(file_name + '_TRAIN')
  File "/Users/taosun/Documents/GitHub/dl-4-tsc/utils/utils.py", line 33, in readucr
    data = np.loadtxt(filename, delimiter=',')
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1146, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1074, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1074, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 781, in floatconv
    return float(x)
ValueError: could not convert string to float: '@problemName Coffee'
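
The ValueError comes from np.loadtxt trying to parse the header line '@problemName Coffee' as numbers. For anyone who wants to read the .ts files directly, here is a minimal sketch. It assumes a univariate .ts layout that this thread does not confirm: header lines start with '@' (or '#' for comments), the data starts after an '@data' line, and each data row is comma-separated values followed by ':' and the class label. The function name read_ts and the sample content are made up for illustration.

```python
def read_ts(lines):
    """Parse univariate .ts content into (series, labels) lists.

    Assumed layout (not confirmed in this thread): '@...' header lines,
    then '@data', then rows like '1.0,2.0,3.0:label'.
    """
    series, labels = [], []
    in_data = False
    for raw in lines:
        line = raw.strip()
        # Skip blank lines and '#' comment lines.
        if not line or line.startswith("#"):
            continue
        if not in_data:
            # Header lines such as '@problemName Coffee' are ignored;
            # '@data' marks the start of the actual rows.
            if line.lower().startswith("@data"):
                in_data = True
            continue
        # Everything before the last ':' is the series, the rest is the label.
        values, _, label = line.rpartition(":")
        series.append([float(v) for v in values.split(",")])
        labels.append(label)
    return series, labels


# Made-up sample content, not real UCR data.
sample = """@problemName Example
@univariate true
@classLabel true 0 1
@data
1.0,2.0,3.0:0
4.0,5.0,6.0:1
"""
x, y = read_ts(sample.splitlines())
print(x, y)
```

With a real file, something like read_ts(open(path)) should work the same way, since the function only iterates over lines.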

I had the same problem and solved it by using Coffee_TRAIN.txt and Coffee_TEST.txt instead of the ones in .ts format. Then, at line 33 of utils.py, you will see [data = np.loadtxt(filename, delimiter=',')]. Here you just need to replace the "," with a double space, since the .txt files use a different separator for parsing. Now I get a different error, but at least this one can be solved XD
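
As a sketch of the .txt workaround: leaving out the delimiter argument makes np.loadtxt split on any run of whitespace, so the exact number of spaces does not matter. This assumes the .txt files keep the layout the repo's code expects (class label in the first column, series values in the remaining columns). The helper name read_ucr_txt and the sample values are made up for illustration.

```python
import os
import tempfile

import numpy as np


def read_ucr_txt(filename):
    """Read a whitespace-separated UCR file: label first, then the series."""
    # delimiter=None (the default) splits on any run of whitespace.
    data = np.atleast_2d(np.loadtxt(filename))
    y = data[:, 0]   # shape: (n_series,)
    x = data[:, 1:]  # shape: (n_series, series_length)
    return x, y


# Tiny made-up file standing in for Coffee_TRAIN.txt (not real data).
sample = "  0.0000000e+00  1.5  2.5  3.5\n  1.0000000e+00  4.5  5.5  6.5\n"
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "Example_TRAIN.txt")
    with open(path, "w") as f:
        f.write(sample)
    x_train, y_train = read_ucr_txt(path)

print(x_train.shape, y_train)
```

This also answers the data-shape question under the same assumption: x is a 2-D array with one row per series, and y is a 1-D array of labels.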

I think the problem is that the UCR dataset was updated in 2018, which changed the formatting (and added new datasets). I found the old dataset through this link, which appears to work. Hope this helps anyone running into this!

I managed to run the baselines on the UCR datasets in ARFF format with the following modifications:

  • Install liac-arff: pip install liac-arff ;
  • Add import arff in the header of utils.py ;
  • Add the implementation of this function on reading arff data in utils.py:
import os

import arff  # provided by the liac-arff package
import numpy as np
from sklearn.preprocessing import LabelEncoder


def load_data(datapath):
    """Load a univariate UCR dataset from <name>_TRAIN.arff and <name>_TEST.arff."""
    # The dataset name is the last directory component of datapath;
    # normpath makes this work with or without a trailing slash.
    name = os.path.basename(os.path.normpath(datapath))

    # Use context managers so the files are closed after parsing.
    with open(os.path.join(datapath, name + '_TRAIN.arff'), 'r') as f:
        train = arff.load(f)['data']
    with open(os.path.join(datapath, name + '_TEST.arff'), 'r') as f:
        test = arff.load(f)['data']

    # Each row holds the series values followed by the class label.
    x_train = np.vstack([row[:-1] for row in train])
    x_test = np.vstack([row[:-1] for row in test])

    # Fit the encoder on the training labels, then reuse it for the test labels.
    enc = LabelEncoder()
    y_train = enc.fit_transform([row[-1] for row in train])
    y_test = enc.transform([row[-1] for row in test])

    return x_train, y_train, x_test, y_test
  • Replace the code snippet at lines 76-90 of utils.py with x_train, y_train, x_test, y_test = load_data(root_dir_dataset)

Hope this could be helpful for someone else ;-)

Hello, I found the data in .tsv format on this website: https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
Hope this helps~
Besides, I also changed the code to run on the data in .arff format using "arff.loadarff", and it worked. Running on the .txt format failed, though.
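
For reference, a minimal sketch of the "arff.loadarff" route, assuming it refers to scipy.io.arff.loadarff and that the class label is the last attribute in the file. The helper name read_arff and the sample content are made up for illustration. This is not necessarily the exact change described above.

```python
import io

import numpy as np
from scipy.io import arff


def read_arff(source):
    """Read an ARFF file (path or file-like) into (x, y) arrays."""
    # loadarff returns a structured array plus metadata with attribute names.
    data, meta = arff.loadarff(source)
    names = meta.names()
    # Assumption: every attribute except the last is a numeric series value.
    x = np.column_stack([data[n] for n in names[:-1]]).astype(float)
    # Nominal values come back as bytes; convert them to str labels.
    y = data[names[-1]].astype(str)
    return x, y


# Made-up sample content, not real UCR data.
sample = """@relation Example
@attribute att1 numeric
@attribute att2 numeric
@attribute target {0,1}
@data
1.0,2.0,0
3.0,4.0,1
"""
x, y = read_arff(io.StringIO(sample))
print(x.shape, y)
```

The labels are strings here, so they would still need to be encoded (for example with LabelEncoder) before training.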
My GPU is an RTX 3060 with CUDA 11.5 and cuDNN 8.4, and in this setup I ran into the following problem:
"Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/conv1d/conv1d (defined at C:\Users\11642\Desktop\科研\第四周\dl-4-tsc-master_origin\classifiers\fcn.py:73) ]] [Op:__inference_train_function_1608]"
Don't worry! This is not a version mismatch, but insufficient GPU memory. What you should do is add the code below at the top of main.py.
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
for dev in physical_devices:  # also covers the case of multiple GPUs
    tf.config.experimental.set_memory_growth(dev, True)
This makes TensorFlow allocate GPU memory on demand instead of reserving all of it up front.