ChromMatch

A tool for assigning chromosome labels based on a reference genome. This method is intended to be more sensitive than whole-genome alignment.

Dependencies

Python 3
gffread
minimap2
RagTag
numpy
pysam
networkx
gffutils

Installation

No installation is needed. Install the dependencies listed above, then run python3 chrom_match.py.

Usage

Suppose there is a "target" genome assembly with 12 chromosome-scale sequences, and the goal is to assign chromosome names based on a related reference genome that also has 12 chromosomes.

python3 chrom_match.py target.fa reference.fa reference.genes.gff3

If the target and reference assemblies contain additional sequences, such as unplaced contigs or scaffolds, supply the sequences to be matched with -t and -r.

In all cases, the number of target and reference sequences must be the same in order to be matched.

Pipeline

1. Write reference transcripts to a FASTA file
2. Align these transcripts to the target assembly with minimap2
3. Process these alignments to build a bipartite graph
   - For each gene, only consider its longest (representative) transcript
   - Only consider representative transcripts if they align with mapq > 10 and coverage >= 85%
   - Target sequences make one set of nodes
   - Reference sequences make the other set of nodes
   - The weight of the edge between a target sequence and a reference sequence is decremented by one for each representative transcript from that reference sequence that aligns to that target sequence
4. Compute a minimum weight full matching for the graph
5. Output the matching solution in AGP format
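
The graph-building and matching steps above can be illustrated with a minimal, standard-library-only sketch. It is not the code in chrom_match.py. The record layout, the sequence names, and the helper functions are hypothetical. Each transcript is assumed to have a single alignment, and a missing edge is treated as weight 0. The matching is solved by brute force over permutations, which is only practical for a handful of sequences. For real inputs, networkx provides bipartite.minimum_weight_full_matching.

```python
from collections import defaultdict
from itertools import permutations

# Hypothetical, simplified alignment records (not a real file format):
# (gene, transcript, transcript_length, reference_seq, target_seq, mapq, coverage)
ALIGNMENTS = [
    ("g1", "g1.t1", 1500, "chr1", "scaf_b", 60, 0.98),
    ("g1", "g1.t2", 900, "chr1", "scaf_a", 60, 0.99),   # shorter isoform, ignored
    ("g2", "g2.t1", 1200, "chr1", "scaf_b", 60, 0.91),
    ("g3", "g3.t1", 800, "chr2", "scaf_a", 45, 0.88),
    ("g4", "g4.t1", 700, "chr2", "scaf_b", 5, 0.99),    # fails mapq > 10
    ("g5", "g5.t1", 2000, "chr2", "scaf_a", 60, 0.50),  # fails coverage >= 85%
    ("g6", "g6.t1", 1000, "chr3", "scaf_c", 30, 0.85),
]


def representative_transcripts(alignments):
    """Return the ID of the longest transcript of each gene."""
    longest = {}
    for gene, tx, length, *_ in alignments:
        if gene not in longest or length > longest[gene][1]:
            longest[gene] = (tx, length)
    return {tx for tx, _ in longest.values()}


def edge_weights(alignments, min_mapq=10, min_cov=0.85):
    """Decrement the (reference, target) edge weight once per passing transcript."""
    keep = representative_transcripts(alignments)
    weights = defaultdict(int)
    for _gene, tx, _length, ref, tgt, mapq, cov in alignments:
        if tx in keep and mapq > min_mapq and cov >= min_cov:
            weights[(ref, tgt)] -= 1
    return dict(weights)


def min_weight_full_matching(refs, tgts, weights):
    """Brute-force minimum weight full matching; missing edges count as 0."""
    if len(refs) != len(tgts):
        raise ValueError("target and reference sequence counts must be equal")
    best, best_cost = None, None
    for perm in permutations(tgts):
        cost = sum(weights.get((r, t), 0) for r, t in zip(refs, perm))
        if best_cost is None or cost < best_cost:
            best, best_cost = dict(zip(refs, perm)), cost
    return best, best_cost


weights = edge_weights(ALIGNMENTS)
matching, cost = min_weight_full_matching(
    ["chr1", "chr2", "chr3"], ["scaf_a", "scaf_b", "scaf_c"], weights
)
for ref, tgt in sorted(matching.items()):
    print(ref, tgt)
```

Because every shared transcript lowers the edge weight by one, the minimum weight full matching is the pairing supported by the largest total number of shared transcripts. This is why the weights are decremented rather than incremented.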