[v3] error attaching nodes to tree

jameshadfield opened this issue a year ago

@ivan-aksamentov I'm sure you're aware of these so feel free to close if it's on your radar to fix

Nextclade (v3, built from 3653a04) raises the following error when run on HBV data, including the provided test sequences:

./target/release/nextclade run -j 1 --input-dataset test_datasets/hbv/files \
  --output-all <dir> --output-basename v3 \
  test_datasets/hbv/files/sequences.fasta

Error: 
   0: When attaching the new node for query sequence 'OP153999.1 |Hepatitis B virus isolate HBV_OBI_UFD1193, complete genome' to the tree
   1: Parent node is expected, but not found. This is an internal error. Please report it to developers

Location:
   packages_rs/nextclade/src/tree/tree_builder.rs:191

You can see this in the web UI too.

I wondered whether this was due to using a dummy tree.json in that (test) dataset. I created a proper tree (will share in Slack) and swapped it into the dataset. This resulted in a different error:

Error: 
   0: When attaching the new node for query sequence 'OP255998.1 |Hepatitis B virus isolate HBV_OBI_SPb274, complete genome' to the tree
   1: When splitting mutations between query sequence and the child node 'NODE_0001450'
   2: When splitting private nucleotide substitutions
   3: Found mutations with the same position, but different reference letters: C2740A and T2740C. This is an internal error. Please report it to developers

Location:
   packages_rs/nextclade/src/tree/split_muts.rs:118

In both cases the alignments, translations and metadata are written out before the program exits with code 1.
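For anyone checking their own mutation lists against the second error, the condition it reports (two substitutions at the same position with different reference letters, here C2740A and T2740C) can be sketched in a few lines of Python. This is only an illustration of the check, not the code in split_muts.rs; the function names and the extra G100A entry are made up for the example.

```python
import re
from collections import defaultdict

# A substitution is written as <reference letter><1-based position><query letter>,
# e.g. "C2740A". Function names below are hypothetical, not Nextclade's.
SUBSTITUTION = re.compile(r"^([A-Za-z-])(\d+)([A-Za-z-])$")


def parse_substitution(text):
    """Split a string such as 'C2740A' into (reference letter, position, query letter)."""
    match = SUBSTITUTION.match(text)
    if match is None:
        raise ValueError(f"not a substitution: {text!r}")
    ref, pos, alt = match.groups()
    return ref.upper(), int(pos), alt.upper()


def conflicting_reference_letters(substitutions):
    """Return {position: sorted reference letters} for positions seen with more than one reference letter."""
    refs_by_pos = defaultdict(set)
    for text in substitutions:
        ref, pos, _alt = parse_substitution(text)
        refs_by_pos[pos].add(ref)
    return {pos: sorted(refs) for pos, refs in refs_by_pos.items() if len(refs) > 1}


# The two substitutions from the error message, plus one made-up entry (G100A).
print(conflicting_reference_letters(["C2740A", "T2740C", "G100A"]))
# prints {2740: ['C', 'T']}
```

An empty result means every position has a single reference letter across the list.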

Ivan Aksamentov · Answer 1 · Thu Jul 20 2023 13:59:32 GMT+0800 (China Standard Time)

@jameshadfield Thanks! Not aware. So this is very valuable.

The first error is reproducible. We will check it with Richard.

Regarding the second error, could you please share the tree (or, better, the full dataset) and the sequence in question, so that we can reproduce it and trace the execution?

P.S. This may or may not be related, and may or may not be useful for your work on HBV. Back when I was working on the genome annotation branch, I modified the genemap.gff of this dataset (test_datasets/hbv/files/genemap.gff#L6-L16) so that it has proper gene entries as separate lines. In the original they don't exist, and genes are just marked as "gene" attributes on CDSes, which is against the GFF3 spec. I haven't changed boundaries or anything else (at least I did not mean to). It is important that all three (reference sequence, reference tree and gene map) correspond to each other precisely. Any discrepancies can cause errors or just random pink elephants.
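As a rough way to spot the gene map layout described above (CDS rows that carry a "gene" attribute but have no separate gene line to hang from), something like the following Python sketch could be used. It assumes the usual GFF3 layout of nine tab-separated columns with ID and Parent attributes; it is not how Nextclade reads gene maps, and the names geneA, gene-geneA and ref are made up for the example.

```python
def parse_attributes(column):
    """Split a GFF3 attributes column ('key=value;key=value') into a dict."""
    attributes = {}
    for pair in column.strip().split(";"):
        if "=" in pair:
            key, value = pair.split("=", 1)
            attributes[key.strip()] = value.strip()
    return attributes


def cds_without_gene_parent(gff_text):
    """Return the `gene` values of CDS rows whose Parent does not resolve to any declared ID."""
    rows = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        columns = line.split("\t")
        if len(columns) != 9:
            continue  # not a feature row
        rows.append((columns[2], parse_attributes(columns[8])))

    declared_ids = {attrs["ID"] for _type, attrs in rows if "ID" in attrs}

    orphans = []
    for feature_type, attrs in rows:
        if feature_type != "CDS" or "gene" not in attrs:
            continue
        parents = [p for p in attrs.get("Parent", "").split(",") if p]
        if not any(parent in declared_ids for parent in parents):
            orphans.append(attrs["gene"])
    return orphans


# Made-up rows: a CDS tagged only with gene=..., then the same CDS under its own gene line.
flat = "ref\t.\tCDS\t1\t90\t.\t+\t0\tgene=geneA\n"
nested = (
    "ref\t.\tgene\t1\t90\t.\t+\t.\tID=gene-geneA;Name=geneA\n"
    "ref\t.\tCDS\t1\t90\t.\t+\t0\tParent=gene-geneA;gene=geneA\n"
)
print(cds_without_gene_parent(flat))
print(cds_without_gene_parent(nested))
```

The first call reports geneA as lacking a parent feature; the second reports nothing. It only checks the parent links, not whether coordinates match the reference sequence or the tree.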

Ivan Aksamentov · Answer 2 · Fri Jul 21 2023 05:42:26 GMT+0800 (China Standard Time)

I merged #1208, which should solve the first error.

Ivan Aksamentov · Answer 3 · Fri Jul 21 2023 16:18:26 GMT+0800 (China Standard Time)

The second part should be addressed in #1211