nanoporetech / bonito

A PyTorch Basecaller for Oxford Nanopore Reads

Home Page: https://nanoporetech.com/


RuntimeError when using `--modified_device`

drivenbyentropy opened this issue

Hi,

When running bonito with a custom-trained modified base model and specifying the `--modified_device` option, it fails at runtime with the following error:
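
For reference, the invocation is along these lines (paths and model directory abbreviated, flag spellings from memory and possibly inexact): `bonito basecaller dna_r10.4.1_e8.2_260bps_fast@v3.5.2 reads/ --modified-base-model /path/to/custom_model --modified_device cuda:0 --reference ref.mmi > calls.bam`.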

```
> reading pod5
> outputting aligned bam
> loading model dna_r10.4.1_e8.2_260bps_fast@v3.5.2
> loading modified base model
> loaded modified base model to call (alt to T): T=XXXX
> loading reference
> calling:   0%|                                      | 1/5420253 [00:14<22525:53:46, 14.96s/ reads]
/opt/bonito/lib/python3.9/site-packages/remora/data_chunks.py:515: UserWarning: FALLBACK path has been taken inside: runCudaFusionGroup. This is an indication that codegen Failed for some reason.
To debug try disable codegen fallback path via setting the env variable `export PYTORCH_NVFUSER_DISABLE=fallback`
 (Triggered internally at ../third_party/nvfuser/csrc/manager.cpp:335.)
  model.forward(
Exception in thread Thread-6:
Traceback (most recent call last):
  File "/usr/lib/python3.9/threading.py", line 954, in _bootstrap_inner
    self.run()
  File "/opt/bonito/lib/python3.9/site-packages/bonito/multiprocessing.py", line 261, in run
    for i, (k, v) in enumerate(self.iterator):
  File "/opt/bonito/lib/python3.9/site-packages/bonito/cli/basecaller.py", line 137, in <genexpr>
    results = ((k, call_mods(mods_model, k, v)) for k, v in results)
  File "/opt/bonito/lib/python3.9/site-packages/bonito/mod_util.py", line 91, in call_mods
    call_read_mods(
  File "/opt/bonito/lib/python3.9/site-packages/remora/inference.py", line 84, in call_read_mods
    nn_out, labels, pos = read.run_model(model)
  File "/opt/bonito/lib/python3.9/site-packages/remora/data_chunks.py", line 515, in run_model
    model.forward(
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: The following operation failed in the TorchScript interpreter.
Traceback of TorchScript (most recent call last):
RuntimeError: shape '[2, 0, 1]' is invalid for input of size 1474560
```
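
For what it's worth, the final message is the generic error a `view`/`reshape` raises when the target shape contains a zero-sized dimension; a minimal standalone snippet reproducing just the message (not the bug itself), with the size taken from the log above:

```python
import torch

# Illustration only (assumption: the failing op is a view/reshape):
# a target shape containing a zero-sized dimension yields exactly this
# error text, whatever upstream step produced the zero.
x = torch.randn(1474560)
try:
    x.view(2, 0, 1)
except RuntimeError as e:
    print(e)  # shape '[2, 0, 1]' is invalid for input of size 1474560
```

So it looks as though some dimension is being computed as zero inside the scripted model.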

Omitting the `--modified_device` parameter does work, but only at a very slow speed (~40 reads/s).
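
If it helps narrow things down, one thing I can try is disabling nvfuser entirely so that TorchScript falls back to its default executor, rather than only disabling the fallback path as the warning's `PYTORCH_NVFUSER_DISABLE=fallback` hint does. A speculative sketch relying on PyTorch's internal switch (untested here, and the API may differ between versions):

```python
import torch

# Speculative workaround, not verified: turn nvfuser off before the
# modified-base model runs, so TorchScript uses its default executor
# instead of the codegen path that is failing above.
# NOTE: _jit_set_nvfuser_enabled is an internal PyTorch API and may
# change or disappear between releases.
torch._C._jit_set_nvfuser_enabled(False)
```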

Is there anything I am missing in order to move the modified-base prediction from the CPU to the GPU?

Thank you in advance