Not detecting mutations which can be seen in IGV

Question

mbdabrowska1 opened this issue 2 months ago · comments

Hi, I have an issue with ClusterV sometimes not detecting mutations that can be seen in IGV and are present at a relatively high allele frequency. In the following examples you can see that the mutation is clearly visible in IGV, but in the ClusterV report it doesn't seem to be called:

BARCODE19

RT:V106I mutation expected (GTA -> ATA at nucleotide 2411, NC_001802.1 reference)

And the corresponding report:

The coverage around the region isn't very high. Could this be causing the issue? I feel like it should still be seen in consensus unless I'm misunderstanding how Flye assembles the fragment.

BARCODE25

RT:E138A mutation expected (GAG -> GCG mutation at nucleotide 2508, NC_001802.1 reference)

Corresponding report:

And coverage:

BARCODE43 - separate run

RT:V106I mutation expected (GTA -> ATA at nucleotide 2411, NC_001802.1 reference):

Report:

Coverage:

Any help with this would be greatly appreciated! Please let me know if you require the original files as I'm happy to share those via email.

Junhao · Answer 1 · Tue May 14 2024 19:58:35 GMT+0800 (China Standard Time)

Hi,
Variants that are missing from the output despite high depth can have several causes.

The cause may be (1) a missed call by the variant caller: the Clair-Ensemble model in ClusterV was trained on Guppy5 data; or (2) the reads carrying the variant were filtered out: ClusterV filters reads with large indels out of the original BAM, and this step may have removed the reads carrying the variants you mention. The filtered file is [YOUR INPUT FILE NAME]_f.bam.

For issue (2), adjusting the filtering setting with --indel_l may solve the problem.
For issue (1), we have tested ClusterV extensively to avoid this situation; however, with ONT data from a different chemistry or a different basecaller, the problem may still occur. In that case, we need time and effort to evaluate and further adjust our variant calling model.
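
To get a feel for whether cause (2) is plausible before changing --indel_l, one option is to look at the CIGAR strings (column 6 of the SAM records) of the reads that cover the variant position in the original BAM and see how long their largest indel is. Below is a minimal sketch of that check. It is not ClusterV's actual filtering code: the function names, the example CIGAR strings, the threshold value, and the "longer than the threshold" rule are all assumptions made for illustration.

```python
import re

# One CIGAR operation: a length followed by an operation code (SAM spec).
CIGAR_OP = re.compile(r"(\d+)([MIDNSHP=X])")


def max_indel_length(cigar):
    """Return the length of the longest insertion (I) or deletion (D) in a CIGAR string."""
    lengths = [int(n) for n, op in CIGAR_OP.findall(cigar) if op in "ID"]
    return max(lengths, default=0)


def would_be_filtered(cigar, indel_threshold):
    """Hypothetical rule: drop a read whose longest indel exceeds the threshold.

    Whether ClusterV uses '>' or '>=' is not stated in this thread; '>' is an assumption.
    """
    return max_indel_length(cigar) > indel_threshold


# Made-up CIGAR strings standing in for reads that cover the variant position.
example_reads = {
    "read_a": "1200M3D800M",
    "read_b": "500M60D1400M",
    "read_c": "300M75I900M2D40M",
}

threshold = 50  # illustrative value only, not a documented ClusterV default

for name, cigar in example_reads.items():
    print(name, max_indel_length(cigar), would_be_filtered(cigar, threshold))
```

If most of the reads that carry the expected variant have a longest indel above the threshold in use, relaxing --indel_l is worth trying first; if they do not, cause (1) becomes the more likely explanation.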

If adjusting the filtering does not solve the problem, could you please share your files with me so I can test further on my side? You can send them to my email, jhsu@connect.hku.hk, if needed.

Regards,
JH