[v3] error attaching nodes to tree
jameshadfield opened this issue · comments
@ivan-aksamentov I'm sure you're aware of these so feel free to close if it's on your radar to fix
Nextclade (v3, built from 3653a04), when run on HBV data, including the provided test sequences, raises the following error:
./target/release/nextclade run -j 1 --input-dataset test_datasets/hbv/files \
--output-all <dir> --output-basename v3 \
test_datasets/hbv/files/sequences.fasta
Error:
0: When attaching the new node for query sequence 'OP153999.1 |Hepatitis B virus isolate HBV_OBI_UFD1193, complete genome' to the tree
1: Parent node is expected, but not found. This is an internal error. Please report it to developers
Location:
packages_rs/nextclade/src/tree/tree_builder.rs:191
You can see this in the web UI too
I wondered whether this was due to using a dummy tree.json in that (test) dataset. I created a proper tree (will share in slack) and swapped the dataset to that. This resulted in a different error:
Error:
0: When attaching the new node for query sequence 'OP255998.1 |Hepatitis B virus isolate HBV_OBI_SPb274, complete genome' to the tree
1: When splitting mutations between query sequence and the child node 'NODE_0001450'
2: When splitting private nucleotide substitutions
3: Found mutations with the same position, but different reference letters: C2740A and T2740C. This is an internal error. Please report it to developers
Location:
packages_rs/nextclade/src/tree/split_muts.rs:118
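For anyone trying to narrow this down from the outside, here is a minimal sketch of the kind of consistency check the message describes: group substitutions by position and flag positions where the letter before the position differs. It is not Nextclade's code. The names `parse_sub` and `find_ref_conflicts` are made up for illustration, and the notation is assumed to be `<ref letter><position><query letter>`, as in `C2740A`.

```python
import re
from collections import defaultdict

# Assumed notation: reference letter, 1-based position, query letter (e.g. "C2740A").
SUB_RE = re.compile(r"^([A-Z-])(\d+)([A-Z-])$")


def parse_sub(text):
    """Split a substitution string into (ref, pos, qry). Hypothetical helper."""
    match = SUB_RE.match(text)
    if match is None:
        raise ValueError(f"not a substitution: {text!r}")
    ref, pos, qry = match.groups()
    return ref, int(pos), qry


def find_ref_conflicts(subs):
    """Return {position: sorted reference letters} for positions that are
    reported with more than one distinct reference letter."""
    refs_by_pos = defaultdict(set)
    for text in subs:
        ref, pos, _qry = parse_sub(text)
        refs_by_pos[pos].add(ref)
    return {pos: sorted(refs) for pos, refs in refs_by_pos.items() if len(refs) > 1}


# The two mutations from the error above, plus one unrelated, made-up substitution.
print(find_ref_conflicts(["C2740A", "T2740C", "G100A"]))  # {2740: ['C', 'T']}
```

Running something like this over the private substitutions of the query and of the child node separately may help show which side carries the unexpected reference letter.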
In both cases the alignments, translations and metadata are written out before the program exits with code 1.
@jameshadfield Thanks! Not aware. So this is very valuable.
The first error is reproducible. We will check it with Richard.
Regarding the second error, could you please share the tree (or better full dataset) and the sequence in question, so that we can reproduce and trace the execution?
P.S. This may or may not be related, and may or may not be useful for your work on HBV. Back when I was working on the genome annotation branch, I modified the genemap.gff of this dataset (test_datasets/hbv/files/genemap.gff#L6-L16) so that it has proper gene entries as separate lines. In the original they don't exist, and genes are just marked as "gene" attributes on CDSes, which is against the GFF3 spec. I haven't changed boundaries or anything else (at least I did not mean to). It is important that all three (reference sequence, reference tree and gene map) correspond to each other precisely. Any discrepancies can cause errors or just random pink elephants.
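To make the gene map point concrete, here is a rough sketch that scans GFF3 text and lists CDS rows with no matching `gene` feature line. It is an illustration only, not part of Nextclade. It assumes each CDS points directly at its gene through `Parent` (real files often have an mRNA level in between), and the function names and the example rows are invented.

```python
def parse_attrs(field):
    """Parse a GFF3 attribute column ("key=value;key=value") into a dict."""
    attrs = {}
    for item in field.strip().split(";"):
        if "=" in item:
            key, value = item.split("=", 1)
            attrs[key.strip()] = value.strip()
    return attrs


def cds_without_gene_line(gff_text):
    """Return the IDs of CDS rows whose Parent is not the ID of any gene row.

    Assumption: CDS -> gene is a direct Parent link (no mRNA level).
    """
    gene_ids = set()
    cds_rows = []
    for line in gff_text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines, directives and comments
        cols = line.split("\t")
        if len(cols) != 9:
            continue  # not a well-formed feature row
        attrs = parse_attrs(cols[8])
        if cols[2] == "gene" and "ID" in attrs:
            gene_ids.add(attrs["ID"])
        elif cols[2] == "CDS":
            cds_rows.append(attrs)
    return [
        attrs.get("ID") or attrs.get("gene")
        for attrs in cds_rows
        if attrs.get("Parent") not in gene_ids
    ]


# Invented example rows: gene A has its own line, gene B exists only as a
# "gene" attribute on a CDS.
EXAMPLE = "\n".join([
    "##gff-version 3",
    "ref\t.\tgene\t1\t90\t.\t+\t.\tID=gene-A;Name=A",
    "ref\t.\tCDS\t1\t90\t.\t+\t0\tID=cds-A;Parent=gene-A;Name=A",
    "ref\t.\tCDS\t100\t190\t.\t+\t0\tID=cds-B;gene=B",
])
print(cds_without_gene_line(EXAMPLE))  # ['cds-B']
```

A check like this only covers the gene map layout. It says nothing about whether the coordinates agree with the reference sequence or the tree.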
I merged #1208, which should solve the first error.
The second error should be addressed in #1211.