SURVIVOR simSV - number of SVs doesn't correspond to params file
ethering opened this issue
Hi,
I'm running SURVIVOR v1.0.7 and I've noticed that the number of SV events generated by SURVIVOR simSV (in both the .bed and .vcf files) does not correspond to the .param file and differs depending on the value of option 3 (0 or 1).
Here's what I see:
$ SURVIVOR simSV test.param
Output:
PARAMETER FILE: DO JUST MODIFY THE VALUES AND KEEP THE SPACES!
DUPLICATION_minimum_length: 100
DUPLICATION_maximum_length: 10000
DUPLICATION_number: 3
INDEL_minimum_length: 20
INDEL_maximum_length: 500
INDEL_number: 1
TRANSLOCATION_minimum_length: 1000
TRANSLOCATION_maximum_length: 3000
TRANSLOCATION_number: 2
INVERSION_minimum_length: 600
INVERSION_maximum_length: 800
INVERSION_number: 4
INV_del_minimum_length: 600
INV_del_maximum_length: 800
INV_del_number: 2
INV_dup_minimum_length: 600
INV_dup_maximum_length: 800
INV_dup_number: 2
Then using SURVIVOR simSV to generate SVs:
Using option 3 = 1, I see the correct number of everything, except that there are zero DUPs (I presume that, for inversions, INV_del_number + INV_dup_number = INVERSION_number). Also, the DUP value is always zero in the true positives and false negatives sections of SURVIVOR eval.
$ SURVIVOR simSV reference.fasta test.param 0.1 1 test1_sv
$ cat test1_sv.bed
Mt 1098 Mt 1819 INV
Mt 17423 Mt 18216 INV
Chr2 800538 Chr2 800924 INS
Mt 51828 Chr3 1034461 TRA
Mt 54161 Chr3 1036794 TRA
Chr1 1406312 Chr1 1407023 INV
Chr1 2541684 Chr1 2542421 INV
Chr1 1740043 Chr3 3282514 TRA
Chr1 1741044 Chr3 3283515 TRA
Using option 3 = 0, I see the following (ordered by SV-type for ease):
5 Duplication events, not 3
5 INDELS (1 INS and 4 DEL), not 1
8 INVERSIONS, not 4
$ SURVIVOR simSV reference.fasta test.param 0.1 0 test0_sv
$ cat test0_sv.bed
Chr3 1671702 Chr3 1679825 DUP
Chr3 3600129 Chr3 3604236 DUP
Chr3 725731 Chr3 727808 DUP
Chr2 281472 Chr2 282151 DUP
Mt 55970 Mt 56657 DUP
Mt 43737 Mt 43991 INS
Chr2 2719697 Chr2 2719765 DEL
Chr2 2720309 Chr2 2720377 DEL
Chr2 1496557 Chr2 1496622 DEL
Chr2 1497150 Chr2 1497215 DEL
Chr2 721379 Chr3 1055120 TRA
Chr2 722729 Chr3 1056470 TRA
Chr2 4982397 Mt 21418 TRA
Chr2 4985041 Mt 24062 TRA
Mt 36164 Mt 36770 INV
Chr1 3880402 Chr1 3881102 INV
Chr3 3485167 Chr3 3485931 INV
Chr2 353814 Chr2 354459 INV
Chr2 2719765 Chr2 2720309 INV
Chr2 1496622 Chr2 1497150 INV
Mt 55970 Mt 56657 INV
Chr2 281472 Chr2 282151 INV
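For anyone who wants to reproduce the tallies above, here is a minimal Python sketch that counts lines per SV type in the test0_sv.bed listing. It assumes the SV type is the fifth whitespace-separated column, as in the output shown; the function name `count_sv_types` is only illustrative and is not part of SURVIVOR.

```python
from collections import Counter

# Contents of test0_sv.bed as listed above (option 3 = 0).
TEST0_BED = """\
Chr3 1671702 Chr3 1679825 DUP
Chr3 3600129 Chr3 3604236 DUP
Chr3 725731 Chr3 727808 DUP
Chr2 281472 Chr2 282151 DUP
Mt 55970 Mt 56657 DUP
Mt 43737 Mt 43991 INS
Chr2 2719697 Chr2 2719765 DEL
Chr2 2720309 Chr2 2720377 DEL
Chr2 1496557 Chr2 1496622 DEL
Chr2 1497150 Chr2 1497215 DEL
Chr2 721379 Chr3 1055120 TRA
Chr2 722729 Chr3 1056470 TRA
Chr2 4982397 Mt 21418 TRA
Chr2 4985041 Mt 24062 TRA
Mt 36164 Mt 36770 INV
Chr1 3880402 Chr1 3881102 INV
Chr3 3485167 Chr3 3485931 INV
Chr2 353814 Chr2 354459 INV
Chr2 2719765 Chr2 2720309 INV
Chr2 1496622 Chr2 1497150 INV
Mt 55970 Mt 56657 INV
Chr2 281472 Chr2 282151 INV
"""


def count_sv_types(bed_text):
    """Count lines per SV type, assuming the type is the fifth column."""
    counts = Counter()
    for line in bed_text.splitlines():
        fields = line.split()
        if len(fields) >= 5:  # skip blank or malformed lines
            counts[fields[4]] += 1
    return counts


counts = count_sv_types(TEST0_BED)
for sv_type, n in sorted(counts.items()):
    print(sv_type, n)
```

To run it on a real file, pass `open("test0_sv.bed").read()` instead of the embedded text. These are raw line counts, not necessarily independent events: some intervals are listed under two types (for example, Mt 55970 to 56657 appears as both DUP and INV).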
Can you comment on this? I've never really understood why SURVIVOR generates different data depending on what the downstream use of it will be (SVs in reference, or SVs in reads). But what is obvious here is that it appears to be generating a different number of SVs than requested in the params file.
Cheers,
Graham