nextstrain / nextclade

Viral genome alignment, mutation calling, clade assignment, quality checks and phylogenetic placement

Home Page: https://clades.nextstrain.org

Maximum Sequence Limit?

ryhisner opened this issue

What is the maximum number of sequences Nextclade can run without crashing? Anything larger than about 2,200 sequences usually crashes for me. Is there any way to increase the number of sequences it can accommodate, or is there a pretty hard limit? I've found that clicking off of the Nextclade tab makes it less likely to crash and that browsing the genomes while they upload almost guarantees a crash. I'm not sure why either of these things would have this effect, though. Thanks.

There is no hard limit, but Nextclade Web runs all computations on your local computer and uses 32-bit WebAssembly underneath, which is limited to ~3.5 GBytes of memory (RAM). 64-bit WebAssembly exists, but it is not yet supported in any browser, so it is not currently feasible to use it.

Another soft limit is that Auspice cannot render this many nodes on the tree anyway, and the tree JSON representation becomes too large to be manageable.

In the end, it depends on which sequences you analyze (larger genomes and lower-quality sequences require more memory) and on how much memory is available on your computer. Closing unused browser tabs (especially other Nextclade tabs) and other programs might help free up memory (up to the maximum of 3.5 GBytes). In Nextclade's settings, you can reduce the number of parallel threads and turn off extra markers in the sequence views to reduce memory consumption. You can use Chrome's task manager (Menu -> More tools -> Task manager) as well as system memory monitoring tools to keep track of memory and CPU usage.

Additionally, you can split your inputs into chunks and analyze them separately.
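If you want to automate the splitting, here is a minimal sketch in Python. It assumes a plain-text FASTA file in which every record starts with a `>` header line. The function name `split_fasta`, the output file naming (`chunk_001.fasta`, ...) and the default chunk size of 2000 are hypothetical choices for illustration, not part of Nextclade. The chunk size just sits below the ~2200 sequences reported above and may need lowering for larger genomes or lower-quality data.

```python
from pathlib import Path
from typing import List, Optional, TextIO


def split_fasta(input_path: str, output_dir: str, chunk_size: int = 2000) -> List[Path]:
    """Split a FASTA file into files holding at most `chunk_size` records each.

    Hypothetical helper (not part of Nextclade). Lines are copied unchanged,
    so multi-line sequences are preserved. Any text before the first '>'
    header is skipped.
    """
    if chunk_size < 1:
        raise ValueError("chunk_size must be at least 1")
    out_dir = Path(output_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    chunk_paths: List[Path] = []
    out: Optional[TextIO] = None
    n_records = 0
    try:
        with open(input_path) as src:
            for line in src:
                if line.startswith(">"):
                    # Start a new chunk file every `chunk_size` records.
                    if n_records % chunk_size == 0:
                        if out is not None:
                            out.close()
                        path = out_dir / f"chunk_{len(chunk_paths) + 1:03d}.fasta"
                        out = open(path, "w")
                        chunk_paths.append(path)
                    n_records += 1
                # Write header and sequence lines into the current chunk.
                if out is not None:
                    out.write(line)
    finally:
        if out is not None:
            out.close()
    return chunk_paths
```

For example, `split_fasta("sequences.fasta", "chunks")` (file names hypothetical) would write `chunks/chunk_001.fasta`, `chunks/chunk_002.fasta` and so on, which can then be loaded into Nextclade Web one at a time. Copying lines as they are read, rather than loading the whole file first, keeps the script's own memory use small even for very large inputs.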

If you need large-scale analysis, our recommended solution is to run Nextclade CLI, which can analyze a virtually unlimited number of sequences (especially if the Auspice tree JSON is not requested in the outputs). It uses less memory and is much faster. It is not as convenient, though: it does not have any visualization (although you can open the TSV output file in Excel), and it requires some command-line skills.
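As an alternative to Excel, the TSV results can also be summarized with a short script. This is especially handy when inputs were split into chunks, because it combines several result files into one summary. The sketch below assumes tab-separated files with a header row that contains the columns `clade` and `qc.overallStatus`. These column names are an assumption and should be checked against the header of your own output. The function name `summarize_nextclade_tsv` is hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, Tuple


def summarize_nextclade_tsv(paths: Iterable[str]) -> Tuple[Counter, Counter]:
    """Count clades and QC statuses across one or more TSV result files.

    Hypothetical helper. Assumes the header contains "clade" and
    "qc.overallStatus"; empty or missing values are counted as "(none)".
    """
    clades: Counter = Counter()
    qc_status: Counter = Counter()
    for path in paths:
        with open(path, newline="") as handle:
            reader = csv.DictReader(handle, delimiter="\t")
            for row in reader:
                clades[row.get("clade") or "(none)"] += 1
                qc_status[row.get("qc.overallStatus") or "(none)"] += 1
    return clades, qc_status
```

Passing the list of per-chunk result files gives one combined count, for example `clades.most_common(10)` for the ten most frequent clades. Only the two summary columns are tallied, so memory use stays small regardless of how many rows the files contain.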

Thanks. I always have a million tabs open, so maybe I need to change that. Despite reading about 200 pages on the topic, I have zero command-line skills, so using the CLI isn't really an option.