fritzsedlazeck / SURVIVOR

Toolset for SV simulation, comparison and filtering

Merging UKB SV Files

GHawkes93 opened this issue

Hi,

In the recent release of 500,000 genomes, UKB has provided SV calls, but only as bgzipped, sample-level VCF files.

I've tried merging these files in groups to create a pVCF, after unzipping each VCF (SURVIVOR doesn't seem to take .gz files?), but the merged files grow so large that I can't then merge the groups together (I get a "Killed" error). I also tried using bcftools to trim the VCF files so that only the genotype is kept in the FORMAT field, but then the merge behaved oddly: when merging two files with 9,000 people each, I got only 2 individuals in the output.
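For anyone reproducing the unzipping step, here is a minimal sketch of how it could be done. It assumes that Python's standard `gzip` module can read the bgzipped files (bgzip output is a series of concatenated gzip blocks) and that the merge step is given a plain-text file listing one VCF path per line. The function name `decompress_vcfs` and all paths are hypothetical, not part of SURVIVOR or the UKB tooling.

```python
import gzip
import shutil
from pathlib import Path


def decompress_vcfs(gz_paths, out_dir, list_file):
    """Decompress each .vcf.gz into out_dir and write the plain paths, one per line.

    Hypothetical helper for illustration only.
    """
    out_dir = Path(out_dir)
    out_dir.mkdir(parents=True, exist_ok=True)
    plain_paths = []
    for gz_path in map(Path, gz_paths):
        # "sample.vcf.gz" -> "sample.vcf"
        plain = out_dir / gz_path.with_suffix("").name
        with gzip.open(gz_path, "rb") as src, open(plain, "wb") as dst:
            # Copy in chunks so a large VCF is never held in memory at once
            shutil.copyfileobj(src, dst)
        plain_paths.append(plain)
    # One path per line, in the same order as the input
    Path(list_file).write_text("".join(f"{p}\n" for p in plain_paths))
    return plain_paths
```

This only covers decompression and building the file list; it does nothing about the size of the merged output.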

Do you have any suggestions for how I could perform this analysis?

Cheers,
Gareth

I should add: I'm using a 72-core machine. Each group file (approx. 9k people) is ~270 GB and contains ~0.5M SVs.