nanoporetech / bonito

A PyTorch Basecaller for Oxford Nanopore Reads

Home Page: https://nanoporetech.com/

bonito export --format dorado seems to be broken

jorisbalc opened this issue

This is a fine-tuned 4.2.0 sup model, which I export with:

bonito export --format dorado bonito_can_byC_e5_dna_r10_400bps_sup@v4.2.0/

bonito exports the following files:

$ ls 
0.conv.bias.tensor
0.conv.weight.tensor
10.rnn.bias_hh_l0.tensor
10.rnn.bias_ih_l0.tensor
10.rnn.weight_hh_l0.tensor
10.rnn.weight_ih_l0.tensor
11.rnn.bias_hh_l0.tensor
11.rnn.bias_ih_l0.tensor
11.rnn.weight_hh_l0.tensor
11.rnn.weight_ih_l0.tensor
12.linear.bias.tensor
12.linear.weight.tensor
13.linear.weight.tensor
2.conv.bias.tensor
2.conv.weight.tensor
4.conv.bias.tensor
4.conv.weight.tensor
7.rnn.bias_hh_l0.tensor
7.rnn.bias_ih_l0.tensor
7.rnn.weight_hh_l0.tensor
7.rnn.weight_ih_l0.tensor
8.rnn.bias_hh_l0.tensor
8.rnn.bias_ih_l0.tensor
8.rnn.weight_hh_l0.tensor
8.rnn.weight_ih_l0.tensor
9.rnn.bias_hh_l0.tensor
9.rnn.bias_ih_l0.tensor
9.rnn.weight_hh_l0.tensor
9.rnn.weight_ih_l0.tensor
config.toml
losses_5.csv
training.csv
weights_5.tar

However, running the model with dorado gives an error I'm not too familiar with:

v313@v313-GP66-Leopard-11UH:~/working-dir-remora$ dorado basecaller --reference /home/v313/ref-seqs/gm119_ref.fasta --modified-bases-models /home/v313/working-dir-remora/v@1.4/ bonito_can_byC_e5_dna_r10_400bps_sup@v4.2.0/ mod_for_basecalling/ > mod_model_calls_@by1.4sup_byCe5sup.bam

[2023-06-29 18:47:39.282] [info] > Creating basecall pipeline
[2023-06-29 18:47:40.539] [error] open file failed because of errno 2 on fopen: , file path: bonito_can_byC_e5_dna_r10_400bps_sup@v4.2.0/1.conv.weight.tensor
Exception raised from RAIIFile at ../caffe2/serialize/file_adapter.cc:21 (most recent call first):
frame #0: c10::Error::Error(c10::SourceLocation, std::string) + 0x57 (0x7f7bd59cf4d7 in /home/v313/Dorado/bin/../lib/libc10.so)
frame #1: c10::detail::torchCheckFail(char const*, char const*, unsigned int, std::string const&) + 0x64 (0x7f7bd599936b in /home/v313/Dorado/bin/../lib/libc10.so)
frame #2: caffe2::serialize::FileAdapter::RAIIFile::RAIIFile(std::string const&) + 0x124 (0x7f7c27e36a24 in /home/v313/Dorado/bin/../lib/libtorch_cpu.so)
frame #3: caffe2::serialize::FileAdapter::FileAdapter(std::string const&) + 0x2e (0x7f7c27e36a7e in /home/v313/Dorado/bin/../lib/libtorch_cpu.so)
frame #4: caffe2::serialize::PyTorchStreamReader::PyTorchStreamReader(std::string const&) + 0x5a (0x7f7c27e34eda in /home/v313/Dorado/bin/../lib/libtorch_cpu.so)
frame #5: torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::string const&, c10::optional<c10::Device>, std::unordered_map<std::string, std::string, std::hash<std::string>, std::equal_to<std::string>, std::allocator<std::pair<std::string const, std::string> > >&, bool, bool) + 0x2c0 (0x7f7c28f6f850 in /home/v313/Dorado/bin/../lib/libtorch_cpu.so)
frame #6: torch::jit::import_ir_module(std::shared_ptr<torch::jit::CompilationUnit>, std::string const&, c10::optional<c10::Device>, bool) + 0x7f (0x7f7c28f6fbcf in /home/v313/Dorado/bin/../lib/libtorch_cpu.so)
frame #7: torch::jit::load(std::string const&, c10::optional<c10::Device>, bool) + 0xac (0x7f7c28f6fcac in /home/v313/Dorado/bin/../lib/libtorch_cpu.so)
frame #8: torch::serialize::InputArchive::load_from(std::string const&, c10::optional<c10::Device>) + 0x26 (0x7f7c2962fec6 in /home/v313/Dorado/bin/../lib/libtorch_cpu.so)
frame #9: dorado() [0x577e4a]
frame #10: dorado() [0x574f4c]
frame #11: dorado() [0x4c1fd2]
frame #12: dorado() [0x4c3433]
frame #13: dorado() [0x58a763]
frame #14: dorado() [0x5893b6]
frame #15: dorado() [0x4aa0ad]
frame #16: dorado() [0x4aecec]
frame #17: dorado() [0x468d93]
frame #18: __libc_start_main + 0xf3 (0x7f7bceb96083 in /lib/x86_64-linux-gnu/libc.so.6)
frame #19: dorado() [0x46e76f]

Is there any workaround for this?

Hey @jorisbalc

Yes, sorry, the export/import is quite brittle at the moment. One of the changes/improvements in v4.0+ models was the addition of a new layer after each convolution, which shifts the layer numbering in the exported file names relative to what dorado expects. If you run the following in your model directory, the model should load and run successfully in dorado.

mv 2.conv.bias.tensor 1.conv.bias.tensor
mv 2.conv.weight.tensor 1.conv.weight.tensor
mv 4.conv.bias.tensor 2.conv.bias.tensor
mv 4.conv.weight.tensor 2.conv.weight.tensor
mv 7.rnn.bias_hh_l0.tensor 4.rnn.bias_hh_l0.tensor
mv 7.rnn.bias_ih_l0.tensor 4.rnn.bias_ih_l0.tensor
mv 7.rnn.weight_hh_l0.tensor 4.rnn.weight_hh_l0.tensor
mv 7.rnn.weight_ih_l0.tensor 4.rnn.weight_ih_l0.tensor
mv 8.rnn.bias_hh_l0.tensor 5.rnn.bias_hh_l0.tensor
mv 8.rnn.bias_ih_l0.tensor 5.rnn.bias_ih_l0.tensor
mv 8.rnn.weight_hh_l0.tensor 5.rnn.weight_hh_l0.tensor
mv 8.rnn.weight_ih_l0.tensor 5.rnn.weight_ih_l0.tensor
mv 9.rnn.bias_hh_l0.tensor 6.rnn.bias_hh_l0.tensor
mv 9.rnn.bias_ih_l0.tensor 6.rnn.bias_ih_l0.tensor
mv 9.rnn.weight_hh_l0.tensor 6.rnn.weight_hh_l0.tensor
mv 9.rnn.weight_ih_l0.tensor 6.rnn.weight_ih_l0.tensor
mv 10.rnn.bias_hh_l0.tensor 7.rnn.bias_hh_l0.tensor
mv 10.rnn.bias_ih_l0.tensor 7.rnn.bias_ih_l0.tensor
mv 10.rnn.weight_hh_l0.tensor 7.rnn.weight_hh_l0.tensor
mv 10.rnn.weight_ih_l0.tensor 7.rnn.weight_ih_l0.tensor
mv 11.rnn.bias_hh_l0.tensor 8.rnn.bias_hh_l0.tensor
mv 11.rnn.bias_ih_l0.tensor 8.rnn.bias_ih_l0.tensor
mv 11.rnn.weight_hh_l0.tensor 8.rnn.weight_hh_l0.tensor
mv 11.rnn.weight_ih_l0.tensor 8.rnn.weight_ih_l0.tensor
mv 12.linear.bias.tensor 9.linear.bias.tensor
mv 12.linear.weight.tensor 9.linear.weight.tensor
mv 13.linear.weight.tensor 10.linear.weight.tensor
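
If you'd rather not type each rename, the same mapping can be scripted. This is just a minimal bash sketch (not part of bonito or dorado), run from inside the exported model directory; the old-to-new index pairs simply mirror the mv list above:

# rename exported tensors to the layer indices dorado expects
# pairs are old:new, processed in increasing old index so the target name is always free
for p in 2:1 4:2 7:4 8:5 9:6 10:7 11:8 12:9 13:10; do
    old=${p%:*}; new=${p#*:}
    for f in "$old".*.tensor; do
        [ -e "$f" ] || continue       # no tensor file with this index
        suffix=${f#*.}                # e.g. rnn.bias_hh_l0.tensor
        mv -- "$f" "$new.$suffix"
    done
done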

This seems to correct the error. However, the exported model appears to have a 4 kHz sample rate, even though I used the 5 kHz v4.2.0 sup model for fine-tuning. Is there a way to make the model work on 5 kHz data? I'd assume the model should still work correctly on 5 kHz data and that the export function in bonito messes it up somehow?

EDIT: I have used the fine-tuned model with bonito and it works perfectly fine with 5 kHz data; it's just that the export function somehow changes it to 4 kHz.

@jorisbalc just add a [run_info] section to your model config with the sample_rate.

$ grep -A 2 run_info dna_r10.4.1_e8.2_400bps_hac@v4.2.0/config.toml 
[run_info]
sample_rate = 5000
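
If the exported config is missing that section, appending it should be enough; a minimal sketch, assuming the fine-tuned model directory name from earlier in this thread:

# append a [run_info] section with the 5 kHz sample rate to the exported config.toml
printf '\n[run_info]\nsample_rate = 5000\n' >> bonito_can_byC_e5_dna_r10_400bps_sup@v4.2.0/config.toml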

This seems to be working now, thanks!