data format of UCR2018

Question

guoxishu opened this issue 4 years ago · comments

utils.py calls "pd.read_csv(..._TRAIN.tsv)", but the official website now provides the data only in .ts format. With .ts data, "y_train = df_train.values[:,0]" then fails with an obvious error. Could you add a comment about the expected data shape? I'm quite confused.
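
For reference, here is a minimal sketch of the layout that the quoted "df_train.values[:,0]" line appears to expect, assuming the UCR 2018 .tsv convention of one series per row, tab-separated, no header row, and the class label in the first column. The helper name read_ucr_tsv is hypothetical and is not part of utils.py.

```python
import pandas as pd


def read_ucr_tsv(path):
    """Hypothetical helper: split one UCR-style .tsv file into series and labels."""
    # Assumption: no header row and tab-separated columns.
    df = pd.read_csv(path, sep="\t", header=None)
    values = df.values
    y = values[:, 0]   # column 0 holds the class label, shape (n_series,)
    x = values[:, 1:]  # remaining columns hold the observations,
                       # shape (n_series, series_length)
    return x, y
```

A .ts file does not fit this layout because it starts with "@" metadata lines, so the same slicing cannot be applied to it directly.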

Hassan ISMAIL FAWAZ · Answer 1 · Mon May 11 2020 19:07:20 GMT+0800 (China Standard Time)

I am not quite sure which format is now available. I will get back to you once I re-check the UCR archive.

oceanfly · Answer 2 · Thu Jun 04 2020 04:30:28 GMT+0800 (China Standard Time)

Hi Sir, I have the same question here. Do you have any updates about the data format? Thanks!

oceanfly · Answer 3 · Thu Jun 04 2020 04:38:55 GMT+0800 (China Standard Time)

More specifically, these are the errors I see:

python3 main.py TSC Coffee fcn _itr_8
Method: TSC Coffee fcn _itr_8
Traceback (most recent call last):
  File "main.py", line 150, in <module>
    datasets_dict = read_dataset(root_dir, archive_name, dataset_name)
  File "/Users/taosun/Documents/GitHub/dl-4-tsc/utils/utils.py", line 105, in read_dataset
    x_train, y_train = readucr(file_name + '_TRAIN')
  File "/Users/taosun/Documents/GitHub/dl-4-tsc/utils/utils.py", line 33, in readucr
    data = np.loadtxt(filename, delimiter=',')
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1146, in loadtxt
    for x in read_data(_loadtxt_chunksize):
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1074, in read_data
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 1074, in <listcomp>
    items = [conv(val) for (conv, val) in zip(converters, vals)]
  File "/Users/taosun/anaconda3/lib/python3.6/site-packages/numpy/lib/npyio.py", line 781, in floatconv
    return float(x)
ValueError: could not convert string to float: '@problemName Coffee'
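
The ValueError comes from np.loadtxt trying to parse the "@problemName Coffee" header line as a number. A rough sketch of reading such a file by hand is below. It assumes the simple univariate, equal-length .ts layout: "#" comment lines, "@" metadata lines, an "@data" marker, then one series per line written as comma-separated values followed by ":" and the class label. The helper name read_univariate_ts is hypothetical, and other .ts variants (multivariate, unequal length, timestamps) are not handled.

```python
import numpy as np


def read_univariate_ts(path):
    """Hypothetical helper for univariate, equal-length .ts files only."""
    series, labels = [], []
    in_data = False
    with open(path, "r") as f:
        for line in f:
            line = line.strip()
            # Skip blank lines and '#' comment lines.
            if not line or line.startswith("#"):
                continue
            if not in_data:
                # Everything before '@data' is metadata such as '@problemName'.
                if line.lower().startswith("@data"):
                    in_data = True
                continue
            # Assumed data line layout: v1,v2,...,vn:label
            values, label = line.rsplit(":", 1)
            series.append([float(v) for v in values.split(",")])
            labels.append(label)
    return np.array(series), np.array(labels)
```

The labels are returned as strings, so they would still need to be mapped to integers before training.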

Melogica · Answer 4 · Tue Jun 23 2020 21:56:40 GMT+0800 (China Standard Time)

I had the same problem and solved it by using Coffee_TRAIN.txt and Coffee_TEST.txt instead of the ones in .ts format. Then, at line 33 of utils.py, you will see data = np.loadtxt(filename, delimiter=','). There you just need to replace the "," with two spaces, since the .txt file uses a different separator. Now I get a different error, but at least that one is solved XD
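
As a possible variation on that change: when np.loadtxt is called without a delimiter argument, it splits on any run of whitespace, so the exact number of spaces does not matter. The sketch below assumes the .txt files keep the class label in the first column, as the original readucr does. The helper name read_ucr_txt is hypothetical.

```python
import numpy as np


def read_ucr_txt(path):
    """Hypothetical helper: load a whitespace-separated UCR-style .txt file."""
    # delimiter defaults to None, which splits on any amount of whitespace.
    data = np.loadtxt(path)
    y = data[:, 0]   # assumption: class label in the first column
    x = data[:, 1:]  # remaining columns are the series values
    return x, y
```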

Andrew · Answer 5 · Wed Oct 28 2020 14:51:46 GMT+0800 (China Standard Time)

I think the problem is that the UCR dataset was updated in 2018, which changed the formatting (and added new datasets). I found the old dataset through this link, which appears to work. Hope this helps anyone running into this!

guoxishu · Answer 6 · Wed Oct 28 2020 14:55:58 GMT+0800 (China Standard Time)

I'm glad to hear from you. Thanks a lot! Best wishes, Dan

On 10/28/2020 14:52, Andrew wrote: I think the problem is that the UCR dataset was updated in 2018, which changed the formatting. I tried out the old dataset, which appears to work. Here is the link to download the old dataset (the password is "attempttoclassify"). Hope this helps anyone running into this issue!

Yi-Xuan XU · Answer 7 · Mon Dec 28 2020 17:12:18 GMT+0800 (China Standard Time)

I managed to run the baselines on the UCR dataset in the .arff data format with the following modifications:

Install liac-arff: pip install liac-arff;
Add import arff at the top of utils.py;
Add the following function for reading .arff data to utils.py:

import os

import arff  # provided by the liac-arff package
import numpy as np
from sklearn.preprocessing import LabelEncoder


def load_data(datapath):
    """Load a univariate time series classification dataset in .arff format."""
    # The dataset name is the last directory component; normpath drops any
    # trailing slash, so this works with or without one.
    name = os.path.basename(os.path.normpath(datapath))

    with open(os.path.join(datapath, name + '_TRAIN.arff'), 'r') as f:
        train = arff.load(f)['data']
    with open(os.path.join(datapath, name + '_TEST.arff'), 'r') as f:
        test = arff.load(f)['data']

    # Each row holds the series values followed by the class label
    x_train = np.vstack([row[:-1] for row in train])
    x_test = np.vstack([row[:-1] for row in test])

    # Fit the encoder on the training labels only, then reuse it for the test set
    enc = LabelEncoder()
    y_train = enc.fit_transform([row[-1] for row in train])
    y_test = enc.transform([row[-1] for row in test])

    return x_train, y_train, x_test, y_test

Replace the code snippet at lines 76-90 of utils.py with x_train, y_train, x_test, y_test = load_data(root_dir_dataset)

Hope this could be helpful for someone else ;-)

Hengyi Yang · Answer 8 · Mon May 09 2022 12:26:17 GMT+0800 (China Standard Time)

Hello, I have found the data in .tsv format on this website: https://www.cs.ucr.edu/~eamonn/time_series_data_2018/
Hope this helps!
I also changed the code to run on the data in .arff format using "arff.loadarff", and that worked. Running on the .txt format failed, though.
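
The exact change is not shown in this post, so the following is only a guess at what it might look like. It assumes "arff.loadarff" refers to SciPy's scipy.io.arff.loadarff (not the liac-arff package mentioned earlier in the thread), and that the last attribute in the file is the nominal class label while all others are numeric. The helper name read_ucr_arff is hypothetical.

```python
import numpy as np
from scipy.io import arff


def read_ucr_arff(path):
    """Hypothetical helper: load one UCR-style .arff file with SciPy."""
    # loadarff returns a structured array plus metadata with the attribute names.
    data, meta = arff.loadarff(path)
    names = meta.names()
    # Assumption: every attribute except the last is a numeric observation.
    x = np.column_stack([data[name] for name in names[:-1]]).astype(float)
    # Nominal attributes come back as bytes, so decode them to str.
    y = np.char.decode(data[names[-1]])
    return x, y
```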
My setup is an RTX 3060 with CUDA 11.5 and cuDNN 8.4, and I ran into this problem:
"Failed to get convolution algorithm. This is probably because cuDNN failed to initialize, so try looking to see if a warning log message was printed above.
[[node model/conv1d/conv1d (defined at C:\Users\11642\Desktop\科研\第四周\dl-4-tsc-master_origin\classifiers\fcn.py:73) ]] [Op:__inference_train_function_1608]"
Don't worry! In my case this was not a version mismatch but insufficient GPU memory. The fix is to add the code below at the top of main.py.
import tensorflow as tf
physical_devices = tf.config.list_physical_devices('GPU')
for dev in physical_devices:  # loop in case multiple GPUs are used
    tf.config.experimental.set_memory_growth(dev, True)
This makes TensorFlow allocate GPU memory on demand rather than reserving it all at startup.