inspirehep / magpie

Deep neural network framework for multi-label text classification

Getting Issues in Batch Training ~ IndexError

rajeshkumargp opened this issue

I tried to batch train on a corpus.
During batch training I got the error below:

Fitted to 8691 vectors
Epoch 1/10
Traceback (most recent call last):
File "TrainWithCorpus.py", line 98, in <module>
magpie.batch_train(train_dir, vocabulary=labels,epochs=EPOCH_CUR,callbacks=[csv_logger,checkpoint,b])
File "/home/Abc/TrialBatch/Version3.0/Scripts/magpieCNN/magpie/main.py", line 176, in batch_train
verbose=verbose,
File "/home/Abc/TrialBatch/TrailVirtualEnvironment/lib/python3.5/site-packages/keras/legacy/interfaces.py", line 91, in wrapper
return func(*args, **kwargs)
File "/home/Abc/TrialBatch/TrailVirtualEnvironment/lib/python3.5/site-packages/keras/engine/training.py", line 1418, in fit_generator
initial_epoch=initial_epoch)
File "/home/Abc/TrialBatch/TrailVirtualEnvironment/lib/python3.5/site-packages/keras/engine/training_generator.py", line 181, in fit_generator
generator_output = next(output_generator)
File "/home/Abc/TrialBatch/TrailVirtualEnvironment/lib/python3.5/site-packages/keras/utils/data_utils.py", line 709, in get
six.reraise(*sys.exc_info())
File "/home/Abc/TrialBatch/TrailVirtualEnvironment/lib/python3.5/site-packages/six.py", line 693, in reraise
raise value
File "/home/Abc/TrialBatch/TrailVirtualEnvironment/lib/python3.5/site-packages/keras/utils/data_utils.py", line 685, in get
inputs = self.queue.get(block=True).get()
File "/usr/lib/python3.5/multiprocessing/pool.py", line 608, in get
raise self._value
File "/usr/lib/python3.5/multiprocessing/pool.py", line 119, in worker
result = (True, func(*args, **kwds))
File "/home/Abc/TrialBatch/TrailVirtualEnvironment/lib/python3.5/site-packages/keras/utils/data_utils.py", line 626, in next_sample
return six.next(_SHARED_SEQUENCES[uid])
File "/home/Abc/TrialBatch/Version3.0/Scripts/magpieCNN/magpie/nn/input_data.py", line 105, in iterate_over_batches
yield build_x_and_y(files, filename_it.dirname, **kwargs)
File "/home/Abc/TrialBatch/Version3.0/Scripts/magpieCNN/magpie/nn/input_data.py", line 88, in build_x_and_y
y_matrix[doc_id][index] = True
IndexError: index 1135 is out of bounds for axis 0 with size 1102

Can you please guide me on how to proceed?

I debugged a little and found the following.

My labels list has length 1938.

The length of label_indices is reduced to 1102 at this line:

File: magpie/nn/input_data.py
y_matrix = np.zeros((len(filenames), len(label_indices)), dtype=np.bool_)

Later, the lines below cause the error:

for lab in labels:
    index = label_indices[lab]
    y_matrix[doc_id][index] = True

This needs further analysis.
Any suggestions on how to fix it would be welcome.

In the end I found that the issue was caused by duplicate entries in the labels list.
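For anyone hitting the same error, here is a minimal sketch of how duplicate labels could produce it, and one way to clean the list before training. It assumes that label_indices is built by mapping each label to its position in the labels list; that is my reading of the symptoms, not something confirmed from the magpie source. The helper names (build_label_indices, find_duplicates, dedupe) and the toy labels are hypothetical and only for illustration.

```python
from collections import Counter

import numpy as np


def build_label_indices(labels):
    # ASSUMPTION: the label -> column mapping is built from list positions.
    # A repeated label overwrites its earlier entry, so the dict ends up
    # with fewer keys than the largest index it stores.
    return {lab: i for i, lab in enumerate(labels)}


def find_duplicates(labels):
    # Labels that occur more than once, sorted for a stable report.
    return sorted(lab for lab, n in Counter(labels).items() if n > 1)


def dedupe(labels):
    # Drop repeats while keeping first-seen order (works on Python 3.5 too).
    seen = set()
    unique = []
    for lab in labels:
        if lab not in seen:
            seen.add(lab)
            unique.append(lab)
    return unique


# Toy data with one duplicate label.
labels = ["physics", "math", "physics", "biology"]
label_indices = build_label_indices(labels)

# The matrix width comes from the number of unique keys (3),
# but "biology" still points at column 3.
y_matrix = np.zeros((1, len(label_indices)), dtype=np.bool_)
try:
    y_matrix[0][label_indices["biology"]] = True
    error_message = None
except IndexError as exc:
    error_message = str(exc)

# After removing duplicates, indices and matrix width agree again.
clean_labels = dedupe(labels)
clean_indices = build_label_indices(clean_labels)
```

Running find_duplicates on the real label list before calling batch_train should show which entries are repeated; passing the deduplicated list as vocabulary would then keep every index inside the matrix bounds, provided the assumption above holds.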