nikitakit / self-attentive-parser

High-accuracy NLP parser with models for 11 languages.

Home Page: https://parser.kitaev.io/

Fails with CUBLAS_STATUS_INTERNAL_ERROR on Linux, benepar 0.2.0, spaCy 3.0

yanvirin opened this issue

I am trying to run the code from the README, which creates a simple doc instance with the spaCy nlp object, and I get the following error:
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasCreate(handle)

Any idea why this might be happening with this library in particular?
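
For context, this is essentially the snippet from the README; the model name and example text below are my own fill-ins, so treat it as a rough reconstruction rather than an exact copy:

import benepar, spacy

# benepar.download('benepar_en3')  # model assumed; downloaded beforehand

nlp = spacy.load("en_core_web_md")
if spacy.__version__.startswith("2"):
    nlp.add_pipe(benepar.BeneparComponent("benepar_en3"))
else:
    nlp.add_pipe("benepar", config={"model": "benepar_en3"})

# The error is raised as soon as a doc is created through the pipeline.
doc = nlp("The time for action is now. It's never too late to do something.")
sent = list(doc.sents)[0]
print(sent._.parse_string)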


output of nvidia-smi while the model is loaded:
+-----------------------------------------------------------------------------+
| NVIDIA-SMI 460.56       Driver Version: 460.56       CUDA Version: 11.2     |
|-------------------------------+----------------------+----------------------+
| GPU  Name        Persistence-M| Bus-Id        Disp.A | Volatile Uncorr. ECC |
| Fan  Temp  Perf  Pwr:Usage/Cap|         Memory-Usage | GPU-Util  Compute M. |
|                               |                      |               MIG M. |
|===============================+======================+======================|
|   0  GeForce RTX 207...  Off  | 00000000:01:00.0 Off |                  N/A |
| N/A   34C    P5     9W /  N/A |   1309MiB /  7982MiB |      0%      Default |
|                               |                      |                  N/A |
+-------------------------------+----------------------+----------------------+

+-----------------------------------------------------------------------------+
| Processes:                                                                  |
|  GPU   GI   CI        PID   Type   Process name                  GPU Memory |
|        ID   ID                                                   Usage      |
|=============================================================================|
|    0   N/A  N/A      2786      G   /usr/lib/xorg/Xorg                  4MiB |
|    0   N/A  N/A     11405      C   python                           1301MiB |
+-----------------------------------------------------------------------------+
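
A minimal cuBLAS check outside of benepar (torch only) can help tell whether the failure is specific to this library or to the torch/CUDA setup; this is just a sketch of what such a check might look like:

import torch

print(torch.__version__, torch.version.cuda, torch.cuda.is_available())

# A single matmul on the GPU forces cuBLAS initialization; if this also
# raises CUBLAS_STATUS_INTERNAL_ERROR, the problem is in the torch/CUDA
# installation rather than in benepar itself.
x = torch.randn(8, 8, device="cuda")
y = torch.randn(8, 8, device="cuda")
print((x @ y).sum().item())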

Full stacktrace:
Traceback (most recent call last):
File "", line 1, in
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/spacy/language.py", line 995, in call
error_handler(name, proc, [doc], e)
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/spacy/util.py", line 1498, in raise_error
raise e
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/spacy/language.py", line 990, in call
doc = proc(doc, **component_cfg.get(name, {}))
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/benepar/integrations/spacy_plugin.py", line 151, in call
self._parser.parse(
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/benepar/parse_chart.py", line 416, in parse
res = subbatching.map(
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/benepar/subbatching.py", line 60, in map
for item_id, item_out in zip(item_ids, subbatch_out):
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/benepar/parse_chart.py", line 366, in _parse_encoded
span_scores, tag_scores = self.forward(batch)
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/benepar/parse_chart.py", line 284, in forward
pretrained_out = self.pretrained_model(
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 1304, in forward
encoder_outputs = self.encoder(
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 951, in forward
layer_outputs = layer_module(
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 633, in forward
self_attention_outputs = self.layer[0](
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 540, in forward
attention_output = self.SelfAttention(
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/transformers/models/t5/modeling_t5.py", line 468, in forward
query_states = shape(self.q(hidden_states)) # (batch_size, n_heads, seq_length, dim_per_head)
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/torch/nn/modules/module.py", line 889, in _call_impl
result = self.forward(*input, **kwargs)
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/torch/nn/modules/linear.py", line 94, in forward
return F.linear(input, self.weight, self.bias)
File "/home/yanvirin/code/benepar/env/lib/python3.8/site-packages/torch/nn/functional.py", line 1753, in linear
return torch._C._nn.linear(input, weight, bias)
RuntimeError: CUDA error: CUBLAS_STATUS_INTERNAL_ERROR when calling cublasCreate(handle)

It turns out the issue was that my torch build was too old for my CUDA driver. After upgrading to torch 1.8.1 the problem disappeared.
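
For anyone hitting the same thing: the driver above reports CUDA 11.2, so the installed torch wheel needs to be built against CUDA 11.x as well. A quick way to check, plus the reinstall that worked for me (the pip command is the one the PyTorch site suggested for 1.8.1 + CUDA 11.1 at the time, so it may differ for other setups):

import torch

print(torch.__version__)    # e.g. 1.8.1+cu111
print(torch.version.cuda)   # should be 11.x to match a CUDA 11.2 driver

# If torch reports CUDA 10.x while nvidia-smi reports 11.x, reinstall a
# matching wheel, e.g.:
#   pip install torch==1.8.1+cu111 -f https://download.pytorch.org/whl/torch_stable.html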