igvteam / igv-reports

Python application to generate self-contained pages embedding IGV visualizations, with no dependency on original input files.

Large structural variants not being fully captured in report tracks

zbonnstetter opened this issue

I noticed that the flanking range has only been centered around variant Start positions, and not variant End positions as well. This leads to particularly large variants getting cut off and not fully captured, with no way to remedy it outside of setting the flanking range unnecessarily high for every variant in a report, or manually examining every variant and regenerating reports for ones that have not been fully captured in the tracks.

Here is an example of a deletion that, at the default 1 kb flanking range, is barely captured at all before it becomes unintelligible because it extends beyond the flanking range:
[Image: Screenshot 2023-12-20 at 10 07 57 AM]

I would love to see (or if this functionality exists and I have simply missed it, know about) a way to ensure that every variant, regardless of size, is fully captured by default and that the flanking range applies to both the Start and End of the variant instead of solely the start.

Could you clarify what you mean by "End" position with respect to the VCF record? AFAIK this has not been standardized for SVs, although there are some conventions out there. An example VCF record line would help.

Technically, in a VCF record, only the start position is recorded. The end can sometimes be inferred from the alt allele, and sometimes from info tags.

Thank you for your swift response; sure, I've attached a truncated example VCF record below. As you mentioned, the end position is recorded in the info tags in this instance. If you have any advice on potentially leveraging the presence of an END info tag it would be much appreciated.

chr2 206135424 pbsv.DEL <Deleted_Sequence> C . PASS SVTYPE=DEL;END=206137035;SVLEN=-1611
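
For illustration, here is a minimal sketch of how an END info tag like the one above could be used to build a display region with flanking applied on both sides of the variant. This is not the igv-reports implementation. The function names `parse_info` and `flanked_region` are hypothetical, and the sketch assumes a whitespace-delimited record with INFO in the eighth column, as in the truncated example.

```python
def parse_info(info):
    """Parse a VCF INFO string such as 'SVTYPE=DEL;END=123' into a dict."""
    fields = {}
    for item in info.split(";"):
        if not item:
            continue
        key, _, value = item.partition("=")
        fields[key] = value  # flag-style tags (no '=') map to an empty string
    return fields


def flanked_region(vcf_line, flanking=1000):
    """Return (chrom, start, end) covering the whole variant plus flanking.

    Hypothetical helper: falls back to POS when no END tag is present.
    """
    cols = vcf_line.split()
    chrom, pos, info = cols[0], int(cols[1]), cols[7]
    end = int(parse_info(info).get("END", pos))
    start = max(1, pos - flanking)  # do not run off the start of the contig
    return chrom, start, end + flanking


record = (
    "chr2 206135424 pbsv.DEL <Deleted_Sequence> C . PASS "
    "SVTYPE=DEL;END=206137035;SVLEN=-1611"
)
print(flanked_region(record))  # ('chr2', 206134424, 206138035)
```

With the default 1 kb flanking, the region spans the full 1611 bp deletion plus 1 kb on each side, rather than only 1 kb around the start position.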

I'm looking at this now. A caution about "large" structural variants: if they are too large, the resulting report will not be loadable in a web browser, because the sequence will swell the size of the HTML file beyond the loadable limit. It's impossible to say precisely, or even approximately, what the limit is, but it's not infinite. A single variant of 1 MB is likely to translate to 100s of kb in file size.

I think we need to set a limit, and switch to a two-locus view if the SV length exceeds that limit.

This is fixed with release 1.10.0. Also, I have added a new parameter, maxlen, which is the maximum length of a variant to display in a single view. This defaults to 10000 (10 kb). If a variant exceeds this length, it will be displayed in split-screen view. This is important to keep the size of the resulting HTML reasonable.
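
As a rough sketch of the behavior described above (not the actual 1.10.0 code), the choice between a single view and a split-screen view could look like the following. The function name `view_loci` and the `chrom:start-end` locus strings are assumptions for illustration, as is the exact comparison used at the threshold. The defaults mirror the 1 kb flanking and 10 kb maxlen mentioned in this thread.

```python
def view_loci(chrom, start, end, flanking=1000, maxlen=10000):
    """Return one locus string for short variants, two for long ones.

    Hypothetical helper: a variant longer than maxlen is shown as two
    separate windows, one around each breakpoint, so the sequence in
    between does not have to be embedded in the report.
    """
    left = max(1, start - flanking)
    if end - start <= maxlen:
        # Short enough: one window covering the whole variant plus flanking.
        return [f"{chrom}:{left}-{end + flanking}"]
    # Too long: one window per breakpoint.
    return [
        f"{chrom}:{left}-{start + flanking}",
        f"{chrom}:{max(1, end - flanking)}-{end + flanking}",
    ]


print(view_loci("chr2", 206135424, 206137035))
# ['chr2:206134424-206138035']
print(view_loci("chr1", 100000, 150000))
# ['chr1:99000-101000', 'chr1:149000-151000']
```

The 1611 bp deletion from the example record stays in a single view, while a 50 kb variant is split into two 2 kb windows, which keeps the amount of embedded sequence bounded regardless of variant length.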