DLBPointon / rapid-pretext-script

A re-write of the original rapid-pretext script.


Rapid-Pretext


Originally written by Alan Tracey and re-written by Damon-Lee B Pointon.

This script takes an AGP file generated by Pretext (containing the positions of scaffolds and where they have been moved to) and cross-references it with a TPF file (containing information on the originally assembled genome, such as the length of each component).

The output is a series of TPF files, based on the original TPF and modified according to the AGP information.
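As a rough illustration of the AGP input described above, the sketch below parses one tab-separated AGP line into a record. It relies only on the standard nine AGP columns; the names `AgpRecord` and `parse_agp_line` are hypothetical, and treating any columns beyond the ninth as extra tags is an assumption, not a description of what this script does.

```python
from typing import NamedTuple, Optional


class AgpRecord(NamedTuple):
    object_id: str        # column 1: the scaffold/chromosome being described
    object_start: int     # column 2: start on the object (1-based)
    object_end: int       # column 3: end on the object (inclusive)
    part_number: int      # column 4: line count within the object
    component_type: str   # column 5: e.g. "W" for sequence, "N"/"U" for gaps
    fields: tuple         # columns 6-9: meaning depends on component_type
    extras: tuple         # anything after column 9 (assumed to be tags)


def parse_agp_line(line: str) -> Optional[AgpRecord]:
    """Parse one AGP line; return None for blank lines and '#' comments."""
    if not line.strip() or line.startswith("#"):
        return None
    columns = line.rstrip("\n").split("\t")
    if len(columns) < 9:
        raise ValueError(f"expected at least 9 columns, got {len(columns)}")
    return AgpRecord(
        object_id=columns[0],
        object_start=int(columns[1]),
        object_end=int(columns[2]),
        part_number=int(columns[3]),
        component_type=columns[4],
        fields=tuple(columns[5:9]),
        extras=tuple(columns[9:]),
    )
```

Columns 6 to 9 are kept as raw strings because their meaning differs between sequence lines and gap lines.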

Due to the nature of Pretext, this mapping is not exact: some fuzzy (approximate) matching is required to map TPF component coordinates (bp) to AGP component coordinates (bp).
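One way such approximate matching could look is sketched below: an AGP coordinate is snapped to the nearest TPF component boundary, provided it falls within a tolerance. This is only an illustration of the idea, not the script's actual logic; the function name `snap_to_boundary` and the `tolerance` parameter are hypothetical.

```python
from bisect import bisect_left


def snap_to_boundary(agp_pos, tpf_boundaries, tolerance):
    """Return the TPF boundary closest to agp_pos, or None if none is
    within `tolerance` bp. `tpf_boundaries` must be sorted ascending."""
    if not tpf_boundaries:
        return None
    # Only the neighbours either side of the insertion point can be nearest.
    index = bisect_left(tpf_boundaries, agp_pos)
    candidates = tpf_boundaries[max(index - 1, 0): index + 1]
    nearest = min(candidates, key=lambda boundary: abs(boundary - agp_pos))
    if abs(nearest - agp_pos) <= tolerance:
        return nearest
    return None
```

A suitable tolerance would depend on the resolution of the Pretext map, so it is left as a parameter rather than fixed here.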


The original script needs updating for the following:

- The naming scheme needs updating: only "painted" scaffolds should be named as chromosomal units.

- With the increase in the number of polyploid organisms coming through the pipeline, it would be useful to output each named haplotype to a separate file.

TODO:

  • Sort based on the original AGP.
  • Split the DataFrame into per-haplotype DataFrames and output each to TSV.
  • Generate a single-column CSV of chromosomal components.
  • Ensure the sex chromosome uses the scaffold name.
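The haplotype split and the single-column CSV from the list above could be sketched as follows. To stay self-contained it uses only the standard library and plain dictionaries rather than the DataFrame the TODO mentions, and the column names `scaffold`, `haplotype` and `painted` are hypothetical placeholders, not the script's real schema.

```python
import csv
import io
from collections import defaultdict


def split_by_haplotype(rows, hap_key="haplotype"):
    """Group row dictionaries by their haplotype label."""
    groups = defaultdict(list)
    for row in rows:
        groups[row[hap_key]].append(row)
    return dict(groups)


def rows_to_tsv(rows, fieldnames):
    """Render rows as tab-separated text with a header line."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter="\t", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


def chromosome_column(rows, name_key="scaffold", painted_key="painted"):
    """Single-column CSV text listing each painted scaffold name once,
    in first-seen order (only painted scaffolds count as chromosomal)."""
    seen = []
    for row in rows:
        if row[painted_key] and row[name_key] not in seen:
            seen.append(row[name_key])
    return "".join(name + "\n" for name in seen)
```

Each value returned by `split_by_haplotype` can be passed to `rows_to_tsv` and written to its own file, for example one file per haplotype label.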


