yaacoo / VCF-indel-converter

Insertion/deletion notation converter from "-" notation to proper VCF (including leading base).

Repository from GitHub: https://github.com/yaacoo/VCF-indel-converter

Converts VCF indel notation from (A -> "-") to (GA -> G), adding the leading reference base that proper VCF notation requires.

For more information, please contact me: or.yaacov@mail.huji.ac.il

Optional: a more advanced script in this repo, dbSNPmatchindels.r, matches the indels with dbSNP.

Dependencies:

R (3+), Bioconductor (packages: BSgenome, BSgenome.Hsapiens.UCSC.hg19, Biostrings)

Takes a 5-column TSV file (chr, pos, name, ref, alt):

chr1 20996757 NULL T -
chr1 20996257 NULL TT -
chr1 20996457 NULL - TT
chr1 20996457 NULL - T
chr1 20996457 NULL A G

Converts to:

chr1 20996756 NULL AT A
chr1 20996256 NULL GTT G
chr1 20996456 NULL A ATT
chr1 20996456 NULL A AT
chr1 20996457 NULL A G
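
The conversion shown above can be sketched in a few lines. The repository's own script is written in R and looks up the leading base with BSgenome; the Python sketch below is not that script. It assumes a hypothetical `fetch_base` lookup backed by a small hard-coded dictionary (`TOY_REFERENCE`) whose bases are copied from the example output, and it infers from the example rows that both deletions and insertions are anchored at pos - 1. All function names are illustrative.

```python
# Toy reference: (chromosome, 1-based position) -> base.
# Hypothetical stand-in for a real genome lookup (the R script uses BSgenome);
# the bases below are copied from the example output, not from hg19 itself.
TOY_REFERENCE = {
    ("chr1", 20996756): "A",
    ("chr1", 20996256): "G",
    ("chr1", 20996456): "A",
}


def fetch_base(chrom, pos):
    """Return the reference base at a 1-based position (toy lookup)."""
    return TOY_REFERENCE[(chrom, pos)]


def convert_row(chrom, pos, name, ref, alt, fetch=fetch_base):
    """Convert one record from "-" notation to VCF notation with a leading base."""
    if ref != "-" and alt != "-":
        # Not an indel in "-" notation (e.g. a SNP): leave unchanged.
        return (chrom, pos, name, ref, alt)
    if ref == "-" and alt == "-":
        raise ValueError("both alleles are '-'; cannot convert")
    # Assumption inferred from the example rows: shift the position back by
    # one and prepend the reference base found there to both alleles.
    new_pos = pos - 1
    lead = fetch(chrom, new_pos)
    new_ref = lead + ("" if ref == "-" else ref)
    new_alt = lead + ("" if alt == "-" else alt)
    return (chrom, new_pos, name, new_ref, new_alt)


def convert_lines(lines, fetch=fetch_base):
    """Convert whitespace-separated 5-column records to tab-separated output."""
    out = []
    for line in lines:
        if not line.strip():
            continue  # skip blank lines
        chrom, pos, name, ref, alt = line.split()
        row = convert_row(chrom, int(pos), name, ref, alt, fetch)
        out.append("\t".join(str(field) for field in row))
    return out
```

Calling `convert_lines` on the five input rows above, with this toy reference, yields the five converted rows listed in the example. With a real genome, `fetch` would be replaced by a function that queries the reference sequence.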

Limitation: The script assumes "true" alternative and reference alleles (proper VCF), and not major/minor as in some files generated by PLINK, since in the case of major/minor it is usually impossible to distinguish a deletion from an insertion.

Languages

Jupyter Notebook 75.3%, R 24.7%