nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement

Home Page: https://clades.nextstrain.org

How many SARS-CoV-2 sequences can Nextclade handle in an MSA file?

liamxg opened this issue · comments

commented

Dear @nextclade team,

I have more than 5 million SARS-CoV-2 sequences that I need to align. Can Nextclade handle this?

Hi, @liamxg, it depends on what version you want to use.

Nextclade Web (the web version, at https://clades.nextstrain.org) can handle ~1000 sequences at a time, depending on your browser and computer resources (the computation is done inside your browser, on your computer). If you need to use Nextclade Web, we recommend splitting your data into smaller batches and/or subsampling it.
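As an illustration of the batching suggestion above, here is a minimal Python sketch that splits a FASTA file into files of at most 1000 records each. It is not part of Nextclade: the function names `read_fasta` and `split_fasta` and the `batch_NNNNN.fasta` naming are hypothetical, and the default batch size simply mirrors the ~1000 figure mentioned above.

```python
from pathlib import Path
from typing import Iterator, List, Tuple


def read_fasta(path: Path) -> Iterator[Tuple[str, str]]:
    """Yield (header, sequence) pairs from a FASTA file, one record at a time."""
    header, chunks = None, []
    with open(path) as handle:
        for line in handle:
            line = line.rstrip("\n")
            if line.startswith(">"):
                # A new header closes the previous record, if there was one.
                if header is not None:
                    yield header, "".join(chunks)
                header, chunks = line[1:], []
            elif header is not None and line:
                chunks.append(line)
    if header is not None:
        yield header, "".join(chunks)


def split_fasta(path: Path, out_dir: Path, batch_size: int = 1000) -> List[Path]:
    """Write consecutive batches of at most `batch_size` records into out_dir.

    Hypothetical helper: only one record is held in memory at a time,
    so the input file can be much larger than RAM.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written: List[Path] = []
    out = None
    for index, (header, sequence) in enumerate(read_fasta(path)):
        if index % batch_size == 0:
            # Start a new batch file every `batch_size` records.
            if out is not None:
                out.close()
            batch_path = out_dir / f"batch_{index // batch_size:05d}.fasta"
            written.append(batch_path)
            out = open(batch_path, "w")
        out.write(f">{header}\n{sequence}\n")
    if out is not None:
        out.close()
    return written
```

Calling `split_fasta(Path("sequences.fasta"), Path("batches"))` would write `batches/batch_00000.fasta`, `batches/batch_00001.fasta`, and so on, each of which can then be processed separately.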

For large-scale analysis we recommend using Nextclade CLI (command line version; see docs here: https://docs.nextstrain.org/projects/nextclade/en/stable/user/nextclade-cli.html). You can see how we use it internally in:

Feel free to join our discussion forums, where you can discuss your case with other users and with the Nextstrain team: https://discussion.nextstrain.org/

commented

Dear @ivan-aksamentov,

Thanks.

Could you help me out:

nextclade run \
  --input-dataset data/sars-cov-2 \
  --output-all=output/ \
  data/sars-cov-2/sequences.fasta
Error:
0: --input-dataset: path is invalid. Expected a directory path or a zip archive file path, but got: '"data/sars-cov-2"'

Location:
packages_rs/nextclade-cli/src/cli/nextclade_loop.rs:55

Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.

commented

I am using more than 5 million sequences.

What's inside data/sars-cov-2?

commented

Dear @ivan-aksamentov,

Please see below:
[image]

@liamxg If Nextclade cannot find the dataset files, you have probably mixed up your directories. This is not related to Nextclade, so you will have to figure this out yourself, sorry. I'd suggest deleting everything and starting over, paying attention to which paths you give to Nextclade and what those paths actually contain. Make sure you read nextclade --help, nextclade dataset get --help and nextclade run --help.
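One way to check what a path actually contains before passing it to --input-dataset is a small script like the one below. This is only a sketch, not a Nextclade feature: the function name `describe_dataset_path` is hypothetical, and the only thing it takes from Nextclade is the rule stated in the error message above, that the path must be a directory or a zip archive.

```python
import zipfile
from pathlib import Path
from typing import List


def describe_dataset_path(raw: str) -> List[str]:
    """Return a short report on what `raw` points to (hypothetical helper)."""
    path = Path(raw)
    if not path.exists():
        # Show the absolute path to reveal a wrong working directory.
        return [f"{raw}: does not exist (resolved to {path.resolve()})"]
    if path.is_dir():
        entries = sorted(p.name for p in path.iterdir())
        return [f"{raw}: directory with {len(entries)} entries"] + entries
    if zipfile.is_zipfile(path):
        with zipfile.ZipFile(path) as archive:
            names = sorted(archive.namelist())
        return [f"{raw}: zip archive with {len(names)} entries"] + names
    return [f"{raw}: exists but is neither a directory nor a zip archive"]


if __name__ == "__main__":
    # Path taken from the command earlier in this thread.
    for line in describe_dataset_path("data/sars-cov-2"):
        print(line)
```

Run it from the same working directory as the nextclade command. If the report says the path does not exist, the resolved absolute path shows where the shell was actually looking.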

Please open a new issue if you have questions or reports related to Nextclade.

commented

Dear @ivan-aksamentov,
Solved. Thanks.

commented

Dear @ivan-aksamentov,

Is it possible to run more than 5 million sequences at once using Nextclade CLI?