RuntimeError: "index_select_out_cuda_impl" not implemented for 'Float'
AnyaMit opened this issue
Describe the bug
Getting a RuntimeError: "index_select_out_cuda_impl" not implemented for 'Float'
To Reproduce
## dataset
import tensorflow as tf

# Download the zip file
path_to_zip = tf.keras.utils.get_file("smsspamcollection.zip", origin="https://archive.ics.uci.edu/ml/machine-learning-databases/00228/smsspamcollection.zip", extract=True)
# Unzip the file into a folder
!unzip $path_to_zip -d data
spam_dataset = []
for line in lines:
    label, text = line.split('\t')
    if label.strip() == 'spam':
        spam_dataset.append((1, text.strip()))
    else:
        spam_dataset.append((0, text.strip()))
print(spam_dataset)
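The snippet above does not show where `lines` comes from. The following is a minimal sketch of the parsing step, assuming each line has the form `label<TAB>text`. The file path in the comment and the sample rows are assumptions for illustration, not taken from the original notebook.

```python
def parse_sms_lines(lines):
    """Turn tab-separated 'label<TAB>text' lines into (is_spam, text) tuples."""
    dataset = []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        # Split on the first tab only, so tabs inside the message survive.
        label, text = line.split('\t', 1)
        dataset.append((1 if label.strip() == 'spam' else 0, text.strip()))
    return dataset

# In the notebook, `lines` would presumably be read from the unzipped file, e.g.
#   lines = open('data/SMSSpamCollection', encoding='utf-8').read().splitlines()
# (assumed path; check the contents of the data folder).

# Hypothetical sample rows in the same format.
sample_lines = [
    "ham\tOk lar... Joking wif u oni...",
    "spam\tWINNER!! Claim your prize now",
    "",
]
sample_dataset = parse_sms_lines(sample_lines)
print(sample_dataset)
```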
#process the df
import pandas as pd
df = pd.DataFrame(spam_dataset, columns=['Spam','Message'])
import re
def message_length(x):
    return len(x)
def num_capitals(x):
    _, count = re.subn(r'[A-Z]', '', x)  # only works in English
    return count
def num_punctuation(x):
    _, count = re.subn(r'\W', '', x)  # \W also matches whitespace
    return count
df['Capitals'] = df['Message'].apply(num_capitals)
df['Punctuation'] = df['Message'].apply(num_punctuation)
df['Length'] = df['Message'].apply(message_length)
df.describe()
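As a quick sanity check of the feature functions, here is a small sketch that runs them on one made-up message without pandas. The sample string and the `num_punctuation_no_space` variant are hypothetical and only there to show what each regular expression actually counts.

```python
import re

def num_capitals(x):
    # Count ASCII uppercase letters only.
    # A class such as [A-Zz] would also count lowercase "z".
    _, count = re.subn(r'[A-Z]', '', x)
    return count

def num_punctuation(x):
    # \W matches every non-word character, which includes spaces.
    _, count = re.subn(r'\W', '', x)
    return count

def num_punctuation_no_space(x):
    # Hypothetical variant: non-word characters excluding whitespace.
    _, count = re.subn(r'[^\w\s]', '', x)
    return count

sample = "FREE entry!! Call now, zzz."
print(len(sample), num_capitals(sample),
      num_punctuation(sample), num_punctuation_no_space(sample))
```

The difference between the last two numbers is the number of whitespace characters in the message, which may explain why the Punctuation column looks large relative to Length.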
Printout of df.describe():
              Spam     Capitals  Punctuation       Length
count  5574.000000  5574.000000  5574.000000  5574.000000
mean      0.134015     5.706315    18.942591    80.443488
std       0.340699    11.720229    14.825994    59.841746
min       0.000000     0.000000     0.000000     2.000000
25%       0.000000     1.000000     8.000000    36.000000
50%       0.000000     2.000000    15.000000    61.000000
75%       0.000000     4.000000    27.000000   122.000000
max       1.000000   129.000000   253.000000   910.000000
Now we want to add a new column with the number of tokenized words. We use StanfordNLP (imported as snlp) for this.
!pip install stanfordnlp
import stanfordnlp as snlp
en = snlp.download('en')
en = snlp.Pipeline(lang='en', processors='tokenize')
tokenized = en(sentence)
len(tokenized.sentences)
for snt in tokenized.sentences:
    for word in snt.tokens:
        print(word.text)
    print("")
en = snlp.Pipeline(lang='en')
print(en)
##Function which does not work with float
def word_counts(x, pipeline=en):
    doc = pipeline(x)
    count = sum([len(sentence.tokens) for sentence in doc.sentences])
    return count
train['Words'] = train['Message'].apply(word_counts)
test['Words'] = test['Message'].apply(word_counts)
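A possible workaround sketch, not a fix for the underlying error: `word_counts` only needs tokens, so it may be enough to pass it the tokenize-only pipeline created earlier rather than the full default pipeline. The traceback goes through the depparse and seq2seq models, which a tokenize-only pipeline should not need. The version below takes the pipeline as an explicit argument. The `whitespace_pipeline` stand-in is hypothetical and exists only so the sketch runs without downloading models, and `use_gpu=False` in the comment is an assumption to verify against the installed StanfordNLP version.

```python
from types import SimpleNamespace

def word_counts(x, pipeline):
    """Count tokens across all sentences of the document the pipeline returns."""
    doc = pipeline(x)
    return sum(len(sentence.tokens) for sentence in doc.sentences)

def whitespace_pipeline(text):
    # Stand-in for a real pipeline (hypothetical, for illustration only):
    # one "sentence" whose tokens are the whitespace-separated words.
    tokens = [SimpleNamespace(text=w) for w in text.split()]
    return SimpleNamespace(sentences=[SimpleNamespace(tokens=tokens)])

# With StanfordNLP installed, the real pipeline would be built roughly as:
#   import stanfordnlp as snlp
#   en_tok = snlp.Pipeline(lang='en', processors='tokenize', use_gpu=False)
#   train['Words'] = train['Message'].apply(word_counts, pipeline=en_tok)
# use_gpu=False is an assumption to verify against the installed version.

print(word_counts("Free entry in 2 a weekly comp", whitespace_pipeline))
```

Passing the pipeline explicitly also avoids binding `en` as a default argument at function-definition time, which makes it easier to swap pipelines when experimenting.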
ISSUE - Error printout
/usr/local/lib/python3.7/dist-packages/stanfordnlp/models/depparse/model.py:157: UserWarning: masked_fill_ received a mask with dtype torch.uint8, this behavior is now deprecated,please use a mask with dtype torch.bool instead. (Triggered internally at /pytorch/aten/src/ATen/native/cuda/LegacyDefinitions.cpp:28.)
unlabeled_scores.masked_fill_(diag, -float('inf'))
RuntimeError Traceback (most recent call last)
in ()
4 # unlabeled_scores.masked_fill_(diag, -float('inf'))
5
----> 6 train['Words'] = train['Message'].apply(word_counts)
7 test['Words'] = test['Message'].apply(word_counts)
8
7 frames
pandas/_libs/lib.pyx in pandas._libs.lib.map_infer()
/usr/local/lib/python3.7/dist-packages/stanfordnlp/models/common/seq2seq_model.py in update_state(states, idx, positions, beam_size)
191 br, d = e.size()
192 s = e.contiguous().view(beam_size, br // beam_size, d)[:,idx]
--> 193 s.data.copy_(s.data.index_select(0, positions))
194
195 # (3) main loop
RuntimeError: "index_select_out_cuda_impl" not implemented for 'Float'
Expected behavior
The line train['Words'] = train['Message'].apply(word_counts) should add a column named 'Words' that holds the result of applying word_counts to each message, giving these columns:
Spam Capitals Punctuation Length Words
Environment (please complete the following information):
- OS: [Windows]
- Python version: [Python 3.6.9 - using Google Colab]
- StanfordNLP version: [0.2.0]
Additional context
Using the examples from the book Advanced Natural Language Processing with TensorFlow 2 by Ashish Bansal