nf-core / raredisease

Call and score variants from WGS/WES of rare disease patients.

Home Page: https://nf-co.re/raredisease

Filter reference segments from CNV caller output

Jakob37 opened this issue

Description of the bug

In addition to the duplication and deletion calls, the CNV caller output contains the segments in between them. For these segments the genotype is set to "0/0" and the ALT column is "." rather than <DEL> or <DUP>.

I suspect this to be an error - we don't want to continue processing the ranges between the actual call ranges. They should probably be filtered out before being merged with the outputs of the other SV-callers.

Example output is shown below. In short, my guess is that we want to remove all records where the ALT column is "." rather than <DEL> or <DUP>.

#CHROM  POS     ID      REF     ALT     QUAL    FILTER  INFO    FORMAT  giab_sample
chr1    792501  CNV_chr1_792501_1632500 N       .       3076.53 .       END=1632500     GT:CN:NP:QA:QS:QSE:QSS  0/0:2:825:9:3077:12:123
chr1    1632501 CNV_chr1_1632501_1635500        N       <DEL>   29.40   .       END=1635500     GT:CN:NP:QA:QS:QSE:QSS  0/1:1:3:11:29:16:12
chr1    1635501 CNV_chr1_1635501_1709500        N       .       1829.22 .       END=1709500     GT:CN:NP:QA:QS:QSE:QSS  0/0:2:72:4:1829:3:16
chr1    1710501 CNV_chr1_1710501_1712500        N       <DEL>   18.93   .       END=1712500     GT:CN:NP:QA:QS:QSE:QSS  0/1:1:2:9:19:37:3
chr1    1712501 CNV_chr1_1712501_1714500        N       <DEL>   422.43  .       END=1714500     GT:CN:NP:QA:QS:QSE:QSS  1/1:0:2:221:422:239:185
chr1    1714501 CNV_chr1_1714501_2121500        N       .       3076.53 .       END=2121500     GT:CN:NP:QA:QS:QSE:QSS  0/0:2:402:2:3077:26:34
chr1    2121501 CNV_chr1_2121501_2124500        N       <DEL>   27.26   .       END=2124500     GT:CN:NP:QA:QS:QSE:QSS  0/1:1:3:21:27:22:26
chr1    2124501 CNV_chr1_2124501_2651500        N       .       3076.53 .       END=2651500     GT:CN:NP:QA:QS:QSE:QSS  0/0:2:514:63:3077:53:22
chr1    2656501 CNV_chr1_2656501_2672500        N       <DEL>   94.12   .       END=2672500     GT:CN:NP:QA:QS:QSE:QSS  0/1:1:4:19:94:18:53
chr1    2674501 CNV_chr1_2674501_2675500        N       <DUP>   2.49    .       END=2675500     GT:CN:NP:QA:QS:QSE:QSS  ./.:3:1:2:2:2:2
chr1    2677501 CNV_chr1_2677501_2683500        N       .       32.24   .       END=2683500     GT:CN:NP:QA:QS:QSE:QSS  0/0:2:3:10:32:18:8
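
The proposed filter could be sketched roughly as follows. This is only an illustration, not the pipeline's implementation: it assumes tab-separated VCF text and that an ALT of "." reliably marks a reference segment. The function name drop_reference_segments is hypothetical, and the example rows are a subset of the output above.

```python
def drop_reference_segments(vcf_lines):
    """Keep header lines and records whose ALT column is not "."."""
    kept = []
    for line in vcf_lines:
        if not line.strip():
            continue  # skip blank lines
        if line.startswith("#"):
            kept.append(line)  # header and meta lines pass through unchanged
            continue
        fields = line.rstrip("\n").split("\t")
        # ALT is the fifth VCF column; "." means no alternate allele was called
        # (assumption: such records are the in-between reference segments)
        if len(fields) > 4 and fields[4] != ".":
            kept.append(line)
    return kept


FORMAT = "GT:CN:NP:QA:QS:QSE:QSS"
example = [
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tgiab_sample",
    f"chr1\t792501\tCNV_chr1_792501_1632500\tN\t.\t3076.53\t.\tEND=1632500\t{FORMAT}\t0/0:2:825:9:3077:12:123",
    f"chr1\t1632501\tCNV_chr1_1632501_1635500\tN\t<DEL>\t29.40\t.\tEND=1635500\t{FORMAT}\t0/1:1:3:11:29:16:12",
    f"chr1\t2674501\tCNV_chr1_2674501_2675500\tN\t<DUP>\t2.49\t.\tEND=2675500\t{FORMAT}\t./.:3:1:2:2:2:2",
    f"chr1\t2677501\tCNV_chr1_2677501_2683500\tN\t.\t32.24\t.\tEND=2683500\t{FORMAT}\t0/0:2:3:10:32:18:8",
]

filtered = drop_reference_segments(example)
for line in filtered:
    print(line)
```

Filtering on ALT rather than on the "0/0" genotype keeps records such as the <DUP> call above, whose genotype is "./.".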

Command used and terminal output

No response

Relevant files

No response

System information

No response

Yeah, this looks wrong. I'll look into it.

Is this from GATK or CNVnator?

This is from GATK

I can try putting together a PR for this issue and the other remaining GATK issues (#442 and #444), unless someone else is already looking into it!

Sounds great @Jakob37 ❤️
@ramprasadn might be working on #442, so perhaps start on #444 until we can confirm with him