ChaissonLab / danbing-tk

Toolkit for VNTR genotyping and repeat-pangenome graph construction

Boundary-expanded VNTRs

dbeyter opened this issue

Dear ChaissonLab,

I was curious whether the boundary-expanded VNTRs described on page 21 of your preprint https://www.biorxiv.org/content/10.1101/2020.08.13.249839v3.full.pdf are available, or could be made available.

If not, could you point me to how to perform this expansion on an assembly or on hg38?

Thank you.
Doruk

Hi @dbeyter,

If you would like to identify the VNTR boundaries in your own assemblies, you can run the danbing-tk build with the additional option --until JointTRAnnotation when invoking snakemake. This skips the steps that generate the RPGG (repeat-pangenome graph). Let me know if any of the documentation is unclear.

Thanks,
-Tony

Hi Mark,
It would be nice to have the intervals on hg38 initially, but I would also be happy with any intervals you can provide on assemblies that are not under embargo. Incorporating our own assemblies would be interesting, but I would like to start with the available intervals as an initial experiment!

Hi Tony,
Thank you for pointing out how to do this. I will look into it. Your approach seems to use intervals derived from the assemblies, but is there a default set of intervals you provide?

Best,
Doruk

Hi Doruk,

I've now added the assemblies that we can release at the moment, along with their VNTR coordinates, under v1.0. Hope this helps!

-Tony

Hi Tony,

Thank you so much for adding the assemblies and their VNTR coordinates. I have already seen one case where a VNTR interval that Tandem Repeats Finder split into two regions is nicely merged into one in your tr.good.bed file.
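As a rough illustration of the kind of merge mentioned above, the sketch below joins intervals on the same chromosome when they overlap or are separated by at most a small gap. This is not danbing-tk's boundary-expansion procedure; the `merge_intervals` helper, its `max_gap` threshold, and the example coordinates are all hypothetical.

```python
from typing import List, Tuple

# A BED-style interval: (chrom, start, end), 0-based half-open coordinates.
Interval = Tuple[str, int, int]


def merge_intervals(intervals: List[Interval], max_gap: int = 0) -> List[Interval]:
    """Merge intervals on the same chromosome that overlap or are separated
    by at most `max_gap` bp. `max_gap` is a hypothetical illustration
    parameter, not a danbing-tk setting."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start - merged[-1][2] <= max_gap:
            # Close enough to the previous interval: extend it instead of
            # starting a new one.
            prev_chrom, prev_start, prev_end = merged[-1]
            merged[-1] = (prev_chrom, prev_start, max(prev_end, end))
        else:
            merged.append((chrom, start, end))
    return merged


if __name__ == "__main__":
    # Hypothetical coordinates: one VNTR reported by TRF as two nearby pieces.
    trf_calls = [("chr1", 1000, 1180), ("chr1", 1195, 1400), ("chr2", 500, 620)]
    print(merge_intervals(trf_calls, max_gap=50))
    # → [('chr1', 1000, 1400), ('chr2', 500, 620)]
```

A purely gap-based rule like this can also merge genuinely distinct neighboring repeats, which is one reason the preprint's annotation approach may differ from it.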

A few questions about the data: my understanding is that the tr.good.bed file is the result of step-by-step filtering of the 84,411 loci down to 73,582 and then to 32,138.

Question 1) Could we also access the 84,411 and 73,582 sets of loci as two separate BED files?
Question 2) Can we run the danbing-tk build with the option --until JointTRAnnotation, using the hg38.fa reference genome as the assembly, to generate its set of expanded VNTRs?

Best,
Doruk

Hi Doruk,

Thanks for asking; glad to know the initial sets could be useful to others. I've now included the two sets under v1.0 as well.
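With the larger sets available, one way to sanity-check the step-by-step filtering is to confirm that each smaller set is contained in the larger one. This is a minimal sketch that assumes plain BED files whose first three columns are chrom, start, and end, and that all three files use the same coordinate system. The filenames in the usage comment are placeholders, not the actual names in the v1.0 release. Loci are compared by exact coordinates, so a locus whose boundaries changed between filtering steps would be reported as missing.

```python
from typing import Iterable, List, Set, Tuple

# (chrom, start, end) taken from the first three BED columns.
Locus = Tuple[str, int, int]


def read_bed_loci(lines: Iterable[str]) -> Set[Locus]:
    """Collect (chrom, start, end) from BED lines, skipping blank, '#',
    'track' and 'browser' lines. Columns after the third are ignored."""
    loci: Set[Locus] = set()
    for line in lines:
        if not line.strip() or line.startswith(("#", "track", "browser")):
            continue
        fields = line.split()
        loci.add((fields[0], int(fields[1]), int(fields[2])))
    return loci


def missing_from(smaller: Set[Locus], larger: Set[Locus]) -> Set[Locus]:
    """Loci in `smaller` with no exact-coordinate match in `larger`.
    An empty result means `smaller` is nested in `larger`."""
    return smaller - larger


if __name__ == "__main__":
    import sys

    # Usage (placeholder filenames, largest set first):
    #   python check_nested.py all_loci.bed filtered_loci.bed tr.good.bed
    paths: List[str] = sys.argv[1:]
    sets: List[Set[Locus]] = []
    for path in paths:
        with open(path) as handle:
            sets.append(read_bed_loci(handle))
    # Compare each set against the one before it in the filtering order.
    for i in range(1, len(sets)):
        missing = missing_from(sets[i], sets[i - 1])
        print(f"{paths[i]}: {len(sets[i])} loci, {len(missing)} not found in {paths[i - 1]}")
```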

And yes, you can annotate hg38 by running the pipeline/RefGraph.snakefile pipeline with the option you mentioned.
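For reference, here is a small Python sketch that assembles such a snakemake invocation. It assumes snakemake is installed and the command is run from the danbing-tk repository root. It also assumes the pipeline takes its inputs (for example, pointing the assembly at hg38.fa) from a config file; that configuration is not modeled here, and the config filename and core count are placeholders, so please check the danbing-tk documentation for the actual setup. `--snakefile`, `--cores`, `--configfile`, `--dry-run`, and `--until` are standard snakemake options. `build_snakemake_cmd` is a hypothetical helper.

```python
import shlex
from typing import List, Optional


def build_snakemake_cmd(
    snakefile: str = "pipeline/RefGraph.snakefile",
    until: str = "JointTRAnnotation",
    cores: int = 8,
    configfile: Optional[str] = None,
    dry_run: bool = False,
) -> List[str]:
    """Assemble a snakemake command that stops after the `until` rule,
    skipping later (RPGG-building) steps. `cores` and `configfile` are
    placeholders for your own setup."""
    cmd = ["snakemake", "--snakefile", snakefile, "--cores", str(cores)]
    if configfile is not None:
        cmd += ["--configfile", configfile]
    if dry_run:
        # Only list the jobs that would run, without executing them.
        cmd.append("--dry-run")
    # Keep --until last so its target list cannot absorb other arguments.
    cmd += ["--until", until]
    return cmd


if __name__ == "__main__":
    # "config.json" is a hypothetical filename.
    print(shlex.join(build_snakemake_cmd(configfile="config.json", dry_run=True)))
    # → snakemake --snakefile pipeline/RefGraph.snakefile --cores 8 --configfile config.json --dry-run --until JointTRAnnotation
```

Running the printed command with `--dry-run` first is a cheap way to confirm that the JointTRAnnotation step is reached before committing compute to the real run.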

Let me know if you have any problems running the pipeline. Thanks!

Best,
-Tony

Hi Tony,

Thank you for including the unfiltered TR coordinates, and answering my questions!

Of course -- will do so!

Best,
Doruk