How many SARS-CoV-2 sequences can Nextclade handle in an MSA file?
liamxg opened this issue
Dear @nextclade team,
I have more than 5 million SARS-CoV-2 sequences that I need to align. Can Nextclade handle this?
Hi, @liamxg, it depends on what version you want to use.
Nextclade Web (the web version, at https://clades.nextstrain.org) can handle ~1000 sequences at a time, depending on your browser and computer resources (the computation is done inside your browser, on your computer). If you need to use Nextclade Web, we recommend splitting your data into smaller batches and/or subsampling it.
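The batching suggested above could look roughly like the following. This is only a minimal sketch, not something from the Nextclade project: the function names `iter_fasta_records` and `split_fasta`, the `batch_NNNNN.fasta` file naming, and the default batch size of 1000 (taken from the "~1000 sequences" figure above) are all assumptions made for illustration.

```python
from pathlib import Path
from typing import Iterator, List, TextIO


def iter_fasta_records(handle: TextIO) -> Iterator[str]:
    """Yield each FASTA record (header line plus its sequence lines) as one string."""
    record: List[str] = []
    for line in handle:
        # A new header closes the record collected so far.
        if line.startswith(">") and record:
            yield "".join(record)
            record = []
        # Skip blank lines; make sure every kept line ends with a newline.
        if line.strip():
            record.append(line if line.endswith("\n") else line + "\n")
    if record:
        yield "".join(record)


def split_fasta(input_path, output_dir, batch_size: int = 1000) -> List[Path]:
    """Write the records of input_path into files of at most batch_size records each.

    The output file naming (batch_00000.fasta, ...) is a hypothetical convention.
    """
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    paths: List[Path] = []
    batch_handle = None
    try:
        with open(input_path) as src:
            for i, record in enumerate(iter_fasta_records(src)):
                # Start a new output file every batch_size records.
                if i % batch_size == 0:
                    if batch_handle is not None:
                        batch_handle.close()
                    path = out_dir / f"batch_{i // batch_size:05d}.fasta"
                    batch_handle = open(path, "w")
                    paths.append(path)
                batch_handle.write(record)
    finally:
        if batch_handle is not None:
            batch_handle.close()
    return paths
```

The input is streamed one record at a time, so memory use does not grow with the size of the input file. Each resulting batch file can then be uploaded to Nextclade Web separately.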
For large-scale analysis we recommend using Nextclade CLI (command line version; see docs here: https://docs.nextstrain.org/projects/nextclade/en/stable/user/nextclade-cli.html). You can see how we use it internally in:
- https://github.com/nextstrain/ncov-ingest (fetching from GISAID and GenBank databases, alignment and basic analysis)
- https://github.com/nextstrain/ncov (phylogenetic analysis)
Feel free to join our discussion forums, where you can discuss your case with other users and with the Nextstrain team: https://discussion.nextstrain.org/
Dear @ivan-aksamentov,
Thanks.
Could you help me out:
nextclade run \
  --input-dataset data/sars-cov-2 \
  --output-all=output/ \
  data/sars-cov-2/sequences.fasta
Error:
0: --input-dataset: path is invalid. Expected a directory path or a zip archive file path, but got: '"data/sars-cov-2"'
Location:
packages_rs/nextclade-cli/src/cli/nextclade_loop.rs:55
Backtrace omitted. Run with RUST_BACKTRACE=1 environment variable to display it.
Run with RUST_BACKTRACE=full to include source snippets.
I am using more than 5 million sequences.
What's inside data/sars-cov-2?
Dear @ivan-aksamentov,
@liamxg If Nextclade cannot find the dataset files, you have probably mixed up your directories. This is not related to Nextclade itself, so you will have to figure this out yourself, sorry. I'd suggest deleting everything and starting over, paying attention to which paths you are giving to Nextclade and what these paths actually contain. Make sure you read nextclade --help, nextclade dataset get --help and nextclade run --help.
Please open a new issue if you have questions or reports related to Nextclade.
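One way to follow the advice above about checking what a path actually contains is a small pre-flight check before calling Nextclade. The sketch below only mirrors the wording of the error message ("Expected a directory path or a zip archive file path"). It is not Nextclade's own validation logic, and the function name `describe_dataset_path` is hypothetical.

```python
import zipfile
from pathlib import Path


def describe_dataset_path(dataset_path: str) -> str:
    """Report what a would-be --input-dataset path points to.

    Hypothetical helper: it approximates the "directory or zip archive"
    expectation from the error message and is not Nextclade's own check.
    """
    path = Path(dataset_path)
    if path.is_dir():
        names = sorted(p.name for p in path.iterdir())
        return "directory containing: " + (", ".join(names) or "(nothing)")
    if path.is_file() and zipfile.is_zipfile(path):
        return "zip archive"
    if not path.exists():
        # Show the absolute path, which depends on the current working directory.
        return f"missing (resolved to {path.resolve()})"
    return "exists, but is neither a directory nor a zip archive"


if __name__ == "__main__":
    # Path taken from the command earlier in this thread.
    print(describe_dataset_path("data/sars-cov-2"))
```

Printing the resolved absolute path helps when a relative path such as data/sars-cov-2 is interpreted from a different working directory than intended.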
Dear @ivan-aksamentov,
Solved. Thanks.
Dear @ivan-aksamentov,
Is it possible to run more than 5 million sequences at once using Nextclade CLI?