JayYip / m3tl

BERT for Multitask Learning

Home Page: https://jayyip.github.io/m3tl/

Shape Mismatch error for new data set

rudra0713 opened this issue

Hey, I have been trying to use a sentiment analysis dataset together with the imdb_cls problem (mentioned in the notebook) in a multitask setup.

This is the sample format of the sentiment data:
train_data = [['I', 'am', 'going', 'to', 'school', '.'], ['I', 'am', 'not', 'feeling', 'good', '.']]
train_labels = [0, 1]
test_data = [['I', 'wass', 'so', 'sick', 'yesterday', '.']]
test_labels = [1]
Unfortunately, this runs into the following error:

ValueError: generator yielded an element of shape (48,) where an element of shape () was expected.

Can you kindly help me solve this issue?

It seems the data and labels are getting mixed up. Did you use exactly the same preprocessing function as in the notebook?
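The error reports a vector of length 48 where a scalar was expected, which would be consistent with a token sequence arriving where a class label should be. A quick way to rule out a mix-up before the data reaches the library is a plain-Python sanity check. This is only a minimal sketch: `check_inputs_and_labels` is a hypothetical helper, not part of m3tl, and it assumes each label of a `cls` problem is a single scalar.

```python
def check_inputs_and_labels(inputs, labels):
    """Return a list of problems found; an empty list means the pair looks consistent."""
    problems = []
    # Every example needs exactly one label.
    if len(inputs) != len(labels):
        problems.append(
            f"length mismatch: {len(inputs)} inputs vs {len(labels)} labels"
        )
    # Assumption: a classification label is a scalar, never a sequence.
    for i, label in enumerate(labels):
        if isinstance(label, (list, tuple)):
            problems.append(
                f"label {i} is a sequence of length {len(label)}, expected a scalar"
            )
    # Assumption: an input is either a raw string or a sequence of tokens.
    for i, example in enumerate(inputs):
        if not isinstance(example, (str, list, tuple)):
            problems.append(
                f"input {i} has unexpected type {type(example).__name__}"
            )
    return problems


train_data = [['I', 'am', 'going', 'to', 'school', '.'],
              ['I', 'am', 'not', 'feeling', 'good', '.']]
train_labels = [0, 1]

# Correct order: no problems reported.
print(check_inputs_and_labels(train_data, train_labels))
# Swapped order: every label and every input is flagged.
print(check_inputs_and_labels(train_labels, train_data))
```

If the first call reports nothing for the real dataset, the inputs and labels returned by the preprocessing function are at least paired consistently, and the mismatch is more likely introduced later in the pipeline.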

Thanks for your response. This is my preprocessing function:
`@preprocessing_fn
def sentiment_cls(params, mode):
    # train_data = pickle.load(open("data/sentiment_train_data.p", "rb"))
    # train_labels = pickle.load(open("data/sentiment_train_label.p", "rb"))
    # test_data = pickle.load(open("data/sentiment_test_data.p", "rb"))
    # test_labels = pickle.load(open("data/sentiment_test_label.p", "rb"))

    train_data = [['I', 'am', 'going', 'to', 'school', '.'], ['I', 'am', 'going', 'to', 'college', '.']]
    train_labels = [0, 1]
    test_data = [['I', 'am', 'going', 'to', 'university', '.']]
    test_labels = [0]

    label_encoder = get_or_make_label_encoder(params, 'sentiment_cls', mode, train_labels + test_labels)

    if mode == TRAIN:
        input_list = train_data
        target_list = train_labels
    else:
        input_list = test_data
        target_list = test_labels
    return input_list, target_list

`
The four commented-out lines at the top load the actual dataset. Since that was not working, I tried toy examples, which do not work either.

This is the new problem dictionary:
new_problem_type = {'imdb_cls': 'cls', 'sentiment_cls': 'cls'}
new_problem_process_fn_dict = {'imdb_cls': imdb_cls, 'sentiment_cls': sentiment_cls}

Please let me know if I am missing something very simple.

Could you please try changing the input data to ['I am going to school .', 'I am going to college .']?
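For a real dataset that is already tokenized, the same change can be made programmatically instead of by hand. A minimal sketch, assuming the tokens should simply be joined with single spaces; `join_tokens` is a hypothetical helper and not part of m3tl:

```python
def join_tokens(tokenized_sentences, sep=" "):
    """Turn a list of token lists into a list of plain strings."""
    return [sep.join(tokens) for tokens in tokenized_sentences]


train_data = [['I', 'am', 'going', 'to', 'school', '.'],
              ['I', 'am', 'going', 'to', 'college', '.']]

print(join_tokens(train_data))
# → ['I am going to school .', 'I am going to college .']
```

The result could then be assigned to `input_list` in place of the token lists, with the labels left unchanged.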

I tried that, but the error does not change.

Thanks. Please let me know if you find anything.