nanoporetech / bonito

A PyTorch Basecaller for Oxford Nanopore Reads

Home Page: https://nanoporetech.com/

Failed to download training dataset

Francesco-Carlucci opened this issue

Hello,
I was trying to download your dataset with bonito download --training. It correctly downloads the file dna_r9.4.1.hdf5, but then fails in convert.py with the following error:

Traceback (most recent call last):
  File "/root/bonito/venv3/bin/bonito", line 33, in <module>
    sys.exit(load_entry_point('ont-bonito', 'console_scripts', 'bonito')())
  File "/root/bonito/bonito/__init__.py", line 34, in main
    args.func(args)
  File "/root/bonito/bonito/cli/convert.py", line 110, in main
    training, validation = validation_split(reads, args.validation_reads)
  File "/root/bonito/bonito/cli/convert.py", line 76, in validation_split
    reads = np.random.permutation(sorted(reads.items()))
  File "mtrand.pyx", line 4703, in numpy.random.mtrand.RandomState.permutation
ValueError: setting an array element with a sequence. The requested array has an inhomogeneous shape after 2 dimensions. The detected shape was (66149, 2) + inhomogeneous part.

Also, I would like to ask whether any pretrained QuartzNet models of bonito are available. I can see only one toml file with the QuartzNet architecture, configs/dna_r9.4.1@v1.toml, but there is no weights tar file in its folder. I would like to try modifying the architecture, so a pretrained model would be really useful for comparing accuracies.

Many thanks

@Francesco-Carlucci here are the pretrained weights for dna_r9.4.1@v1 & dna_r9.4.1@v2.

Please note both this condition and model architecture have been superseded.

Also, for interest see https://genomebiology.biomedcentral.com/articles/10.1186/s13059-023-02903-2

@iiSeymour
Sorry, does superseded mean that the dataset downloaded by the command bonito download --training is not used anymore?
Thank you very much for the model files. Were they trained on the benchmark you sent me?
If the QuartzNet architecture is not used anymore either, what is bonito's architecture now?

Thanks for the quick answer

commented

Hi @Francesco-Carlucci, could you please tell me your numpy version?

I might have a clue about the error you encountered; please check #355 (comment).
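
For anyone hitting the same traceback, here is a minimal sketch of what the error message itself suggests. np.random.permutation first converts its argument to an array, and a list of (key, value) pairs whose values have different lengths is ragged. NumPy 1.24 turned the earlier deprecation warning for ragged input into a ValueError. Shuffling integer indices instead avoids that conversion. The toy reads dict and the function names shuffled_items and validation_split_sketch are hypothetical, and this is not necessarily the fix adopted in bonito.

```python
import numpy as np

# Hypothetical stand-in for the reads dictionary: the values are arrays
# of different lengths, which is what makes the (key, value) list ragged.
reads = {
    "read_a": np.zeros(3),
    "read_b": np.zeros(5),
    "read_c": np.zeros(4),
}

# On NumPy >= 1.24 the original call raises the ValueError from the traceback:
#   np.random.permutation(sorted(reads.items()))


def shuffled_items(reads, seed=None):
    """Return the (key, value) pairs of `reads` in random order.

    Only an integer index array is permuted, so NumPy never has to build
    an array out of the ragged pairs.
    """
    items = sorted(reads.items(), key=lambda kv: kv[0])  # sort by key only
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(items))
    return [items[i] for i in order]


def validation_split_sketch(reads, num_valid, seed=None):
    """Split `reads` into (training, validation) dicts after shuffling."""
    items = shuffled_items(reads, seed)
    return dict(items[num_valid:]), dict(items[:num_valid])


training, validation = validation_split_sketch(reads, num_valid=1, seed=0)
print(len(training), len(validation))
```

Passing a seed makes the split reproducible; leaving it as None gives a different shuffle on each run.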

I was using numpy 1.24.3.
I've seen your pull request, many thanks for your help.
By the way, I've based my project on the benchmark that @iiSeymour sent me, so I'm not using this repository anymore.
Have a nice day!

commented

Thanks for your feedback!