nextstrain / ncov

Nextstrain build for novel coronavirus SARS-CoV-2

Home Page: https://nextstrain.org/ncov

/bin/bash: nextclade2: command not found

llk578496 opened this issue

Current Behavior
Hello! We would like to use Nextstrain for phylogenetic analysis of SARS-CoV-2 genomes. We have been following the SARS-CoV-2 Workflow tutorial and have completed the "Setup and installation" section. However, when we followed the instructions in "Run using example data" to do a test run with the provided example data, we got the error below:

(base) gilman_siu2@gilmansiu2-Z490-VISION-D:/mnt/data6/COVID/phylogenetic_trees/20230213_latest_local_paper_2022_whole_year/nextstrain-cli-v8.2.0/example-data/ncov$ nextstrain build . --configfile ncov-tutorial/example-data.yaml
Building DAG of jobs...
Using shell: /bin/bash
Provided cores: 20
Rules claiming more threads will be scaled down.
Job counts:
count jobs
1 adjust_metadata_regions
1 align
1 all
1 ancestral
1 annotate_metadata_with_index
1 build_align
1 build_description
1 calculate_epiweeks
1 clade_files
1 clades
1 colors
1 combine_samples
1 diagnostic
1 distances
1 emerging_lineages
1 export
1 filter
1 finalize
1 include_hcov19_prefix
1 index
1 join_metadata_and_nextclade_qc
1 logistic_growth
1 mask
1 mutational_fitness
1 prepare_nextclade
1 recency
1 refine
1 sanitize_metadata
1 subsample
1 tip_frequencies
1 traits
1 translate
1 tree
33

[Thu May 30 04:46:15 2024]
rule clade_files:
input: defaults/clades.tsv
output: results/default-build/clades.tsv
jobid: 22
benchmark: benchmarks/clade_files_default-build.txt
wildcards: build_name=default-build

    python3 scripts/rename_clades.py --input-clade-files defaults/clades.tsv             --name-mapping defaults/clade_display_names.yml             --output-clades results/default-build/clades.tsv

[Thu May 30 04:46:15 2024]
Job 27:
Downloading reference files for nextclade (used for alignment and qc).

    nextclade2 --version
    nextclade2 dataset get --name sars-cov-2 --output-zip data/sars-cov-2-nextclade-defaults.zip

[Thu May 30 04:46:15 2024]
rule sanitize_metadata:
input: data.nextstrain.org/files/ncov/open/reference/metadata.tsv.xz
output: results/sanitized_metadata_reference_data.tsv.xz
log: logs/sanitize_metadata_reference_data.txt
jobid: 31
benchmark: benchmarks/sanitize_metadata_reference_data.txt
wildcards: origin=reference_data
resources: mem_mb=2000

/bin/bash: nextclade2: command not found

    python3 scripts/sanitize_metadata.py             --metadata data.nextstrain.org/files/ncov/open/reference/metadata.tsv.xz             --metadata-id-columns strain name 'Virus name'             --database-id-columns 'Accession ID' gisaid_epi_isl genbank_accession             --parse-location-field Location             --rename-fields 'Virus name=strain' Type=type 'Accession ID=gisaid_epi_isl' 'Collection date=date' 'Additional location information=additional_location_information' 'Sequence length=length' Host=host 'Patient age=patient_age' Gender=sex Clade=GISAID_clade 'Pango lineage=pango_lineage' pangolin_lineage=pango_lineage Lineage=pango_lineage 'Pangolin version=pangolin_version' Variant=variant 'AA Substitutions=aaSubstitutions' 'Submission date=date_submitted' 'Is reference?=is_reference' 'Is complete?=is_complete' 'Is high coverage?=is_high_coverage' 'Is low coverage?=is_low_coverage' N-Content=n_content GC-Content=gc_content             --strip-prefixes hCoV-19/ SARS-CoV-2/                          --output results/sanitized_metadata_reference_data.tsv.xz 2>&1 | tee logs/sanitize_metadata_reference_data.txt

[Thu May 30 04:46:15 2024]
Error in rule prepare_nextclade:
jobid: 27
output: data/sars-cov-2-nextclade-defaults.zip
shell:

    nextclade2 --version
    nextclade2 dataset get --name sars-cov-2 --output-zip data/sars-cov-2-nextclade-defaults.zip
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

[Thu May 30 04:46:15 2024]
Finished job 22.
1 of 33 steps (3%) done
Downloading from remote: data.nextstrain.org/files/ncov/open/reference/metadata.tsv.xz
Finished download.

[Thu May 30 04:46:16 2024]
Job 30:
Aligning sequences to defaults/reference_seq.fasta
- gaps relative to reference are considered real

    python3 scripts/sanitize_sequences.py             --sequences data.nextstrain.org/files/ncov/open/reference/sequences.fasta.xz             --strip-prefixes hCoV-19/ SARS-CoV-2/             --output /dev/stdout 2> logs/sanitize_sequences_reference_data.txt             | nextalign run             --jobs=8             --reference defaults/reference_seq.fasta             --genemap defaults/annotation.gff             --output-translations results/translations/seqs_reference_data.gene.{gene}.fasta             --output-fasta results/aligned_reference_data.fasta             --output-insertions results/insertions_reference_data.tsv > logs/align_reference_data.txt 2>&1;
    xz -2 -T 8 results/aligned_reference_data.fasta;
    xz -2 -T 8 results/translations/seqs_reference_data.gene.*.fasta

Downloading from remote: data.nextstrain.org/files/ncov/open/reference/sequences.fasta.xz
[Thu May 30 04:46:17 2024]
Finished job 31.
2 of 33 steps (6%) done
Finished download.

[Thu May 30 04:46:17 2024]
Job 18: Templating build description for Auspice

Job counts:
count jobs
1 build_description
1
[Thu May 30 04:46:18 2024]
Finished job 18.
3 of 33 steps (9%) done
[Thu May 30 04:46:18 2024]
Error in rule align:
jobid: 30
output: results/aligned_reference_data.fasta.xz, results/insertions_reference_data.tsv, results/translations/seqs_reference_data.gene.ORF1a.fasta.xz, results/translations/seqs_reference_data.gene.ORF1b.fasta.xz, results/translations/seqs_reference_data.gene.S.fasta.xz, results/translations/seqs_reference_data.gene.ORF3a.fasta.xz, results/translations/seqs_reference_data.gene.E.fasta.xz, results/translations/seqs_reference_data.gene.M.fasta.xz, results/translations/seqs_reference_data.gene.ORF6.fasta.xz, results/translations/seqs_reference_data.gene.ORF7a.fasta.xz, results/translations/seqs_reference_data.gene.ORF7b.fasta.xz, results/translations/seqs_reference_data.gene.ORF8.fasta.xz, results/translations/seqs_reference_data.gene.N.fasta.xz, results/translations/seqs_reference_data.gene.ORF9b.fasta.xz
log: logs/align_reference_data.txt (check log file(s) for error message)
shell:

    python3 scripts/sanitize_sequences.py             --sequences data.nextstrain.org/files/ncov/open/reference/sequences.fasta.xz             --strip-prefixes hCoV-19/ SARS-CoV-2/             --output /dev/stdout 2> logs/sanitize_sequences_reference_data.txt             | nextalign run             --jobs=8             --reference defaults/reference_seq.fasta             --genemap defaults/annotation.gff             --output-translations results/translations/seqs_reference_data.gene.{gene}.fasta             --output-fasta results/aligned_reference_data.fasta             --output-insertions results/insertions_reference_data.tsv > logs/align_reference_data.txt 2>&1;
    xz -2 -T 8 results/aligned_reference_data.fasta;
    xz -2 -T 8 results/translations/seqs_reference_data.gene.*.fasta
    
    (one of the commands exited with non-zero exit code; note that snakemake uses bash strict mode!)

Removing output files of failed job align since they might be corrupted:
data.nextstrain.org/files/ncov/open/reference/sequences.fasta.xz
Shutting down, this might take some time.
Exiting because a job execution failed. Look above for error message
Complete log: /nextstrain/build/.snakemake/log/2024-05-30T044613.535304.snakemake.log

Expected behavior
The flow should complete with no errors for the example data.

How to reproduce
Steps to reproduce the current behavior:

Possible solution

Your environment: if browsing Nextstrain online

  • Operating system: -
  • Browser:-

Your environment: if running Nextstrain locally

  • Operating system: 22.04.4 LTS
  • Browser: Chrome
  • Version (e.g. auspice 2.7.0): Nextstrain-cli 8.4.0

Additional context

Thanks a lot.

Best regards,
Eddie

Hello @llk578496,

Looks like the runtime you are using for the build does not include the nextclade2 command.

Could you run

nextstrain version --verbose

and attach the output so we can check which runtime you are using?

Hello @joverlee521,

Please find the output below.

(base) gilman_siu2@gilmansiu2-Z490-VISION-D:/mnt/data6/COVID/phylogenetic_trees/20230213_latest_local_paper_2022_whole_year/nextstrain-cli-v8.2.0/example-data/ncov$ nextstrain version --verbose
nextstrain.cli 8.4.0

Python
/home/gilman_siu2/.nextstrain/cli-standalone/nextstrain
3.10.9 (main, Dec 21 2022, 04:02:04) [Clang 14.0.3 ]

Runners
docker (default)
nextstrain/base:build-20220523T233129Z (d34d7eab0283, 2022-05-24 07:45:58 +0800 HKT)
augur 15.0.2
auspice v2.37.1
fauna d7e8eb2
sacra not present

conda
nextstrain-base unknown

singularity
docker://nextstrain/base (not present)

ambient
unknown

aws-batch
unknown
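
The line to look at in this output is the image tag listed under the docker runner. As a rough sketch, it can be pulled out with a small bash helper. The function names `runtime_image` and `image_build_date` are hypothetical, and the date parsing assumes the tag has the `build-<YYYYMMDD>T<HHMMSS>Z` form seen in this thread:

```shell
# Hypothetical helper: read the output of `nextstrain version --verbose`
# on stdin and print the first nextstrain/base image tag it contains.
runtime_image() {
    awk '$1 ~ /^nextstrain\/base:/ { print $1; exit }'
}

# Hypothetical helper: print the YYYYMMDD part of an image tag.
# Assumes the tag looks like nextstrain/base:build-20220523T233129Z.
image_build_date() {
    local stamp="${1#*build-}"     # drop everything up to and including "build-"
    printf '%s\n' "${stamp%%T*}"   # keep only the date before the "T"
}
```

With the Nextstrain CLI installed, `nextstrain version --verbose | runtime_image` should print the tag of the image the docker runner uses. A build date that is years old suggests the image may predate tools the current workflow expects.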

Thanks @llk578496! I see that the Docker image you are currently using (nextstrain/base:build-20220523T233129Z) is an older version that does not include the nextclade2 command.

You can update your runtime to the latest available version by running:

nextstrain update

It works now! Thank you very much!
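
For anyone who hits the same error, the fix from this thread can be sketched as one bash function. The name `update_and_rebuild` is hypothetical. The three commands are the ones quoted above, and the config path assumes you are in the ncov directory used in the tutorial:

```shell
# Hypothetical wrapper around the steps from this thread.
# Each step runs only if the previous one succeeded.
update_and_rebuild() {
    # Pull the latest runtime image (the old one lacked nextclade2).
    nextstrain update &&
    # Show which runtime version is now installed.
    nextstrain version --verbose &&
    # Retry the example build from the tutorial.
    nextstrain build . --configfile ncov-tutorial/example-data.yaml
}
```

Call `update_and_rebuild` from the ncov directory. If the update step fails, the build is not attempted.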