Merging VCFs from many individuals in a population

Question

Merging VCFs from many individuals in a population

liweirfI opened this issue 4 months ago · comments

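Hello! Working from a graph pangenome, I ran vg call and obtained VCF files for more than 2,000 individuals, which now need to be merged. With SURVIVOR, I use ./SURVIVOR merge sample_files 1000 2 1 1 0 30 sample_merged.vcf. For PanPop, is it enough to merge with the ./subworkflows/mergeSV3_pop.py script, without additionally running ./bin/PART_run.pl?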

StarSkyZheng · Answer 1 · Tue May 14 2024 21:54:43 GMT+0800 (China Standard Time)

subworkflows/mergeSV3_pop.py is a Snakemake script and cannot be run directly.
You have to use PART_run.pl.

liweirfI · Answer 2 · Wed May 15 2024 10:58:36 GMT+0800 (China Standard Time)

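Thank you very much for your reply!
Population SV merging workflow: (1) run vg call to obtain a VCF for each individual, keep only SVs longer than 30 bp (to reduce the compute load on the server and focus on SVs), and merge the per-individual VCFs into a population VCF with bcftools merge -m none; (2) following the run procedure in https://doi.org/10.24433/CO.1577027.v1, process the merged VCF with the PART_run.pl script; (3) use ./scripts/vcf_split_snp_indel_sv.pl to extract the SV VCF, then impute missing SV genotypes with Beagle.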
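The length filter in step (1) can be sketched in a few lines of plain Python. This is only an illustration, not the poster's actual command: the function names `max_allele_length_change` and `filter_sv_records` are hypothetical, and it assumes that "SV length" means the largest length difference between REF and any explicit-sequence ALT allele, so balanced events such as inversions would not pass this filter.

```python
def max_allele_length_change(ref, alts):
    """Largest absolute length difference between REF and any sequence ALT allele.

    Symbolic alleles (<DEL>, ...), the spanning-deletion allele "*" and the
    missing allele "." carry no sequence, so they are skipped (assumption).
    """
    changes = [
        abs(len(alt) - len(ref))
        for alt in alts
        if alt not in ("*", ".") and not alt.startswith("<")
    ]
    return max(changes, default=0)


def filter_sv_records(vcf_lines, min_len=30):
    """Yield header lines unchanged and only records whose length change exceeds min_len."""
    for line in vcf_lines:
        if line.startswith("#"):
            yield line
            continue
        fields = line.rstrip("\n").split("\t")
        ref, alts = fields[3], fields[4].split(",")
        if max_allele_length_change(ref, alts) > min_len:
            yield line
```

Applied to an open per-individual VCF (for example `filter_sv_records(open(path))`, with `path` chosen by the user), the generator streams the file line by line, so memory use stays flat even for large inputs.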
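Open questions: (1) For a graph pangenome, I pass the backbone genome (the reference genome) to the --ref_fasta_file option of PART_run.pl. Is that acceptable? (2) Is it feasible to impute the missing SV genotypes with Beagle? The GAM files have already been deleted (they were too large), so the Fill_DP.pl script cannot be used to fill in missing values. I previously merged the population SVs with SURVIVOR and found that the population-level total genotyping rate was low (0.38). With the current workflow, the number of merged SVs is about the same for both tools, and the PanPop genotyping rate is 0.62, a 63% improvement over SURVIVOR. However, PanPop produces more multiallelic SVs, over 5% more than SURVIVOR. Most of the excess is triallelic sites, nearly all of which contain a missing allele "*". (3) Are there any other problems with this population SV merging workflow? Please point them out.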
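The figures quoted above (genotyping rate, share of multiallelic sites, sites carrying "*") can be recomputed directly from a merged VCF. The following is a minimal sketch under stated assumptions, not how PanPop or SURVIVOR report these numbers: `summarize_genotypes` and `relative_gain` are hypothetical names, GT is assumed to be the first FORMAT key (as the VCF specification requires when GT is present), and a genotype counts as called only if none of its alleles is ".".

```python
def summarize_genotypes(vcf_lines):
    """Count sites, multiallelic sites, '*'-containing sites and the genotyping rate."""
    sites = multiallelic = star_sites = called = total = 0
    for line in vcf_lines:
        if line.startswith("#") or not line.strip():
            continue
        fields = line.rstrip("\n").split("\t")
        alts = fields[4].split(",")
        sites += 1
        if len(alts) > 1:
            multiallelic += 1
            if "*" in alts:
                star_sites += 1
        # Sample columns start at index 9; GT is assumed to be the first subfield.
        for sample in fields[9:]:
            genotype = sample.split(":")[0]
            total += 1
            if "." not in genotype:
                called += 1
    return {
        "sites": sites,
        "multiallelic_sites": multiallelic,
        "star_allele_sites": star_sites,
        "genotyping_rate": called / total if total else 0.0,
    }


def relative_gain(new_rate, old_rate):
    """Relative improvement of new_rate over old_rate, as a fraction."""
    return (new_rate - old_rate) / old_rate
```

With the rates from the post, `relative_gain(0.62, 0.38)` is about 0.63, which matches the quoted 63% improvement.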

StarSkyZheng · Answer 3 · Wed May 15 2024 16:33:13 GMT+0800 (China Standard Time)

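The only requirement is that every chromosome in the VCF is also present in the FASTA.
Imputing with Beagle should be feasible in theory.
If there are many multiallelic SVs, you can run PART_run.pl several more times. In theory, the number of multiallelic SVs decreases with each run.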
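The first condition (every chromosome in the VCF must exist in the FASTA) is easy to check before launching PART_run.pl. Below is a minimal sketch, assuming plain-text (uncompressed) inputs and that a FASTA sequence name is the first whitespace-delimited token after ">". The function names are hypothetical and are not part of PanPop.

```python
def fasta_sequence_names(fasta_lines):
    """Collect sequence names from FASTA header lines (first token after '>')."""
    names = set()
    for line in fasta_lines:
        if line.startswith(">"):
            tokens = line[1:].split()
            if tokens:
                names.add(tokens[0])
    return names


def missing_chromosomes(vcf_lines, fasta_lines):
    """Return the sorted CHROM values used in the VCF that the FASTA does not contain."""
    known = fasta_sequence_names(fasta_lines)
    seen = {
        line.split("\t", 1)[0]
        for line in vcf_lines
        if line.strip() and not line.startswith("#")
    }
    return sorted(seen - known)
```

An empty result means the reference passes this check. Any names returned would have to be added to the FASTA or dropped from the VCF first.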

github-actions · Answer 4 · Sat Jun 15 2024 10:09:43 GMT+0800 (China Standard Time)

This issue is stale because it has been open for 30 days with no activity.

github-actions · Answer 5 · Sun Jun 30 2024 10:21:45 GMT+0800 (China Standard Time)

This issue was closed because it has been inactive for 14 days since being marked as stale.